Every time a nvme bus/port is scanned and a new device is detected,
we want to call device_probe() as it will give us a chance to run
additional post-processings for some purposes.
In particular, support for creating partitions on a device will be added.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Add a function to disable the NVMe controller. This will be used
to let the driver for the NVMe storage integrated on Apple SoCs
shutdown the NVMe controller such we can shutdown the NVMe
IOP controller in a clean way afterwards before handing control
to the OS.
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Tested on: Macbook Air M1
Tested-by: Simon Glass <sjg@chromium.org>
The NVMe storage controller integrated on Apple SoCs deviates
from the NVMe standard in two aspects. It uses a "linear"
submission queue and it integrates an NVMMU that needs to be
programmed for each NVMe command. Introduce driver ops such
that we can set up the linear submission queue and program the
NVMMU in the driver for this strange beast.
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Tested on: Macbook Air M1
Tested-by: Simon Glass <sjg@chromium.org>
Apple SoCs have an integrated NVMe controller that isn't connected
over a PCIe bus. In preparation for adding support for this NVMe
controller, split out the PCI support into its own file. This file
is selected through a new CONFIG_NVME_PCI Kconfig option, so do
a wholesale replacement of CONFIG_NVME with CONFIG_NVME_PCI.
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Tested on: Macbook Air M1
Tested-by: Simon Glass <sjg@chromium.org>
The current code invalidates the range after the read buffer since the
buffer pointer gets incremented in the read loop. Use a temporary
pointer to make sure we have a pristine pointer to invalidate the
correct memory range after read.
Fixes: 704e040a51 ("nvme: Apply cache operations on the DMA buffers")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefan Agner <stefan@agner.ch>
A udevice's priv space is cleared in alloc_priv() in the DM core.
Don't do it again in its probe() routine.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
At present there is an offset of one added during the creation of
block device. This can be very confusing as we wanted to encode the
namespace id in the block device name but namespae id cannot be zero.
This changes to use the namespace id directly in the block device
name, eliminating the offset of one effectively.
Suggested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
At present for each namespace there is a block device created for it.
There is no issue if the number of supported namespaces reported from
the NVMe device is only 1.
Since QEMU commit 7f0f1acedf15 ("hw/block/nvme: support multiple namespaces"),
the number of supported namespaces reported has been changed from 1
to 256, but not all of them are active namespaces. The actual active
one depends on the QEMU command line parameters. A common case is
that namespace 1 being active and all other 255 being inactive.
If a namespace is inactive, the namespace identify command returns a
zero filled data structure. We can use field NSZE (namespace size) to
decide whether a block device should be created for it.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
At present the block device creation happens in the NVMe uclass
driver post_probe() phase. In preparation to support multiple
namespaces, we should issue namespace identify before creating
block devices but that touches the underlying hardware hence it
is not appropriate to do such in the uclass driver post_probe().
Let's move it to driver probe() phase instead.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
AQA (Admin Queue Attributes) register is a dword size with
lower word of ASQS, and higher word of ACQS.
The code set the variable aqa twice, but it is redundant.
Signed-off-by: Wesley Sheng <wesleyshenggit@sina.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Each prp is 8 bytes, calculate the number of prps
per page should just divide page size by 8
there is no need to minus 1
Signed-off-by: Wesley Sheng <wesleyshenggit@sina.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
writel() and co. already include the endian swap; doing the swap twice
is, er, unhelpful.
Tested on a P4080DS, which boots perfectly fine off NVMe with this.
Signed-off-by: David Lamparter <equinox@diac24.net>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
At the moment the nvme_get_features() and nvme_set_features() functions
carry a (somewhat misleading) comment about missing cache maintenance.
As it turns out, nvme_get_features() has no caller at all in the tree,
and nvme_set_features' only user doesn't use a DMA buffer.
Mention that in the comment, and leave some breadcrumbs for the future,
should those functions attract more users.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Michael Trimarchi <michael@amarulasolutions.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
At the moment nvme_read_completion_status() tries to invalidate a single
member of the cqes[] array, which is shady as just a single entry is
not cache line aligned.
The structure is dictated by hardware, and with 16 bytes is smaller than
any cache line we usually deal with. Also multiple entries need to be
consecutive in memory, so we can't pad them to cover a whole cache line.
As a consequence we can only always invalidate all of them - U-Boot just
uses two of them anyway. This is fine, as they are only ever read by the
CPU (apart from the initial zeroing), so they can't become dirty.
Make this obvious by always invalidating the whole array, regardless of
the entry number we are about to read.
Also blow up the allocation size to cover whole cache lines, to avoid
other heap allocations to sneak in.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Michael Trimarchi <michael@amarulasolutions.com>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
We use 'priv' for private data but often use 'platdata' for platform data.
We can't really use 'pdata' since that is ambiguous (it could mean private
or platform data).
Rename some of the latter variables to end with 'plat' for consistency.
Signed-off-by: Simon Glass <sjg@chromium.org>
This construct is quite long-winded. In earlier days it made some sense
since auto-allocation was a strange concept. But with driver model now
used pretty universally, we can shorten this to 'auto'. This reduces
verbosity and makes it easier to read.
Coincidentally it also ensures that every declaration is on one line,
thus making dtoc's job easier.
Signed-off-by: Simon Glass <sjg@chromium.org>
This patch try to avoids eviction of dirty lines during DMA
transfer. The code right now execute the following step:
- allocate the buffer
- start a dma operation using the non-coherent dma buffer
- invalidate cache lines associated with the buffer
- read the buffer
This can lead to reading back not valid information, because the cache
controller could evict dirty cache lines belonging to the buffer *after*
the DMA operation has started to fill the DRAM.
In order to avoid this, a new invalidation is required *before* starting
the DMA operation. The patch just adds an invalidation before submitting
the DMA command.
Example below shows the nvme disk scan result without the following
patch
=> nvme scan
nvme_get_info_from_identify: nn = 544502629, vwc = 100,
sn = dev_0T, mn = `�\�, fr = t_part, mdts = 105
So, invalidating the cache before submitting the admin command,
fix the cpu read.
Cc: André Przywara <andre.przywara@arm.com>
Reported-by: Suniel Mahesh <sunil@amarulasolutions.com>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Jagan Teki <jagan@amarulasolutions.com>
Tested-by: Suniel Mahesh <sunil@amarulasolutions.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
At present dm/device.h includes the linux-compatible features. This
requires including linux/compat.h which in turn includes a lot of headers.
One of these is malloc.h which we thus end up including in every file in
U-Boot. Apart from the inefficiency of this, it is problematic for sandbox
which needs to use the system malloc() in some files.
Move the compatibility features into a separate header file.
Signed-off-by: Simon Glass <sjg@chromium.org>
These functions are CPU-related and do not use driver model. Move them to
cpu_func.h
Signed-off-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Daniel Schwierzeck <daniel.schwierzeck@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
This function belongs in time.h so move it over and add a comment.
Signed-off-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Tom Rini <trini@konsulko.com>
Change the stack-allocated buffer for the identification command
to explicitly allocate page-aligned buffers. Even though the spec
seems to allow having admin queue commands on non page-aligned
buffers, it seems to not be possible on my i.MX8MQ board with a
a Silicon Power P34A80. Since all of the NVMe drivers I have seen
always do admin commands on a page-aligned buffer, which does work
on my system, it makes sense for us to do that as well.
Signed-off-by: Patrick Wildt <patrick@blueri.se>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
It's possible that the data cache for the buffer still holds data
to be flushed to memory, since the buffer was probably used as stack
before. Thus we need to make sure to flush it also on reads, since
it's possible that the cache is automatically flused to memory after
the NVMe DMA transfer happened, thus overwriting the NVMe transfer's
data. Also add a missing dcache flush for the prp list.
Signed-off-by: Patrick Wildt <patrick@blueri.se>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
This adds a function which can be used by e.g. EFI to retrieve
the namespace identifier and EUI64. For that it adds the EUI64
to its driver internal namespace structure and copies the EUI64
during namespace identification.
Signed-off-by: Patrick Wildt <patrick@blueri.se>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
When large writes take place I saw a Samsung EVO 970+ return a status
value of 0x13, PRP Offset Invalid. I tracked this down to the
improper handling of PRP entries. The blocks the PRP entries are
placed in cannot cross a page boundary and thus should be allocated
on page boundaries. This is how the Linux kernel driver works.
With this patch, the PRP pool is allocated on a page boundary and
other than the very first allocation, the pool size is a multiple of
the page size. Each page can hold (4096 / 8) - 1 entries since the
last entry must point to the next page in the pool.
Signed-off-by: Aaron Williams <awilliams@marvell.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
When dma_addr_t is u32 in 64-bit, there are some warnings when
building NVME driver. Fix it by doing an additional (long) cast.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
part_init() is currently called in every DM BLK driver, either
in its bind() or probe() method. However we can use the BLK
uclass driver's post_probe() method to do it automatically.
Update all DM BLK drivers to adopt this change.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Simon Glass <sjg@chromium.org>
When U-Boot started using SPDX tags we were among the early adopters and
there weren't a lot of other examples to borrow from. So we picked the
area of the file that usually had a full license text and replaced it
with an appropriate SPDX-License-Identifier: entry. Since then, the
Linux Kernel has adopted SPDX tags and they place it as the very first
line in a file (except where shebangs are used, then it's second line)
and with slightly different comment styles than us.
In part due to community overlap, in part due to better tag visibility
and in part for other minor reasons, switch over to that style.
This commit changes all instances where we have a single declared
license in the tag as both the before and after are identical in tag
contents. There's also a few places where I found we did not have a tag
and have introduced one.
Signed-off-by: Tom Rini <trini@konsulko.com>
"lbas" with type "u16" (16 bits, unsigned) is promoted in
"lbas << ns->lba_shift" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long long" (64 bits, unsigned).
If "lbas << ns->lba_shift" is greater than 0x7FFFFFFF, the upper
bits of the result will all be 1.
Fix it by casting "lbas" to "u32".
Reported-by: Coverity (CID: 166730)
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
memset() was given a sizeof(NVME_Q_NUM * sizeof(struct nvme_queue *)
to clear, which is wrong.
Reported-by: Coverity (CID: 166729)
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
At present the NVMe uclass driver uses a global variable nvme_info
to store global information like namespace id, and NVMe controller
driver's priv struct has a blk_dev_start that is used to calculate
the namespace id based on the global information from nvme_info.
This is not a good design in the DM world and can be replaced with
the following changes:
- Encode the namespace id in the NVMe block device name during
the NVMe uclass post probe
- Extract the namespace id from the device name during the NVMe
block device probe
- Let BLK uclass calculate the devnum for us by passing -1 to
blk_create_devicef() as the devnum
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
So far cache operations are only applied on the submission queue and
completion queue, but they are missing in other places like identify
and block read/write routines.
In order to correctly operate on the caches, the DMA buffer passed
to identify routine must be allocated properly on the stack with the
existing macro ALLOC_CACHE_ALIGN_BUFFER().
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
The NVMe block read and write routines are almost the same except
the command opcode. Let's consolidate them to avoid duplication.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
NVMe driver only uses two queues. The first one is allocated to do
admin stuff, while the second one is for IO stuff. So far the driver
uses magic number (0/1) to access them. Change to use macros.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
So far the driver unconditionally delays 10ms when en/disabling the
controller and still return 0 if 10ms times out. In fact, spec defines
a timeout value in the CAP register that is the worst case time that
host software shall wait for the controller to become ready.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Capabilities register is RO and accessed at various places in the
driver. Let's cache it in the controller driver's priv struct.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
So far this is not causing any issue due to NVMe and x86 are using
the same endianness, but for correctness, it should be fixed.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
ndev->queues is a pointer to pointer, but the allocation wrongly
requests sizeof(struct nvme_queue). Fix it.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
The codes currently try to read PCI vendor id of the NVMe block
device by dm_pci_read_config16() with its parameter set as its
root complex controller (ndev->pdev) instead of itself. This is
seriously wrong. We can read the vendor id by passing the correct
udevice parameter to the dm_pci_read_config16() API, however there
is a shortcut by reading the cached vendor id from the PCI device's
struct pci_child_platdata.
While we are here fixing this bug, apparently the quirk stuff handle
codes in nvme_get_info_from_identify() never takes effect since its
logic has never been true at all. Remove these codes completely.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Maximum Data Transfer Size (MDTS) field indicates the maximum
data transfer size between the host and the controller. The
host should not submit a command that exceeds this transfer
size. The value is in units of the minimum memory page size
and is reported as a power of two (2^n).
The spec also says: a value of 0h indicates no restrictions
on transfer size. On the real NVMe card this is normally not
0 due to hardware restrictions, but with QEMU emulated NVMe
device it reports as 0. In nvme_blk_read/write() below we
have the following algorithm for maximum number of logic
blocks per transfer:
u16 lbas = 1 << (dev->max_transfer_shift - ns->lba_shift);
dev->max_transfer_shift being 0 will for sure cause lbas to
overflow. Let's use 20. With this fix, the NVMe driver works
on QEMU emulated NVMe device.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
NVMe should use the nsze value from the queried device. This will
reflect the total number of blocks of the device and fix detecting
my Samsung 960 EVO 256GB.
Original:
Capacity: 40386.6 MB = 39.4 GB (82711872 x 512)
Fixed:
Capacity: 238475.1 MB = 232.8 GB (488397168 x 512)
Signed-off-by: Jon Nettleton <jon@solid-run.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
This adds support to detect the catchall PCI class for NVMe devices.
It allows the drivers to work with most NVMe devices that don't need
specific detection due to quirks etc.
Tested against a Samsung 960 EVO drive.
Signed-off-by: Jon Nettleton <jon@solid-run.com>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>
NVM Express (NVMe) is a register level interface that allows host
software to communicate with a non-volatile memory subsystem. This
interface is optimized for enterprise and client solid state drives,
typically attached to the PCI express interface.
This adds a U-Boot driver support of devices that follow the NVMe
standard [1] and supports basic read/write operations.
Tested with a 400GB Intel SSD 750 series NVMe card with controller
id 8086:0953.
[1] http://www.nvmexpress.org/resources/specifications/
Signed-off-by: Zhikang Zhang <zhikang.zhang@nxp.com>
Signed-off-by: Wenbin Song <wenbin.song@nxp.com>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Tom Rini <trini@konsulko.com>