The Non-Volatile Memory Express (NVMe) Specification defines an interface for accessing solid-state devices (SSDs) and other target devices attached through a Peripheral Component Interconnect Express (PCIe) bus. It defines a command interface based on a single set of administrative command and completion queues and many sets of operational Input/Output (I/O) command and completion queues. Administrative queues are used for tasks such as queue creation and deletion, device status interrogation, and feature configuration, while I/O queues are used for all storage-related transfers, such as block reads and writes. However, the NVMe specification relies on host resources for command and control to a degree that can present a bottleneck or chokepoint in system performance.
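The queue arrangement described above can be pictured with a small model. The following is a minimal sketch for illustration only, not an implementation of the NVMe specification: the class names, the string opcodes, and the single "create_io_queue_pair" command are hypothetical simplifications (the sketch treats a submission queue and its completion queue as one unit). It shows one administrative queue pair being used to create I/O queue pairs, which then carry the storage commands.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """A submission (command) queue and its matching completion queue."""
    qid: int
    submission: deque = field(default_factory=deque)
    completion: deque = field(default_factory=deque)


class ToyController:
    """Illustrative model only; names and opcodes are hypothetical."""

    ADMIN_OPCODES = {"create_io_queue_pair", "delete_io_queue_pair"}

    def __init__(self):
        self.admin = QueuePair(qid=0)  # the single administrative queue pair
        self.io_queues = {}            # qid -> QueuePair, created via admin commands

    def submit_admin(self, opcode, qid):
        self.admin.submission.append((opcode, qid))

    def submit_io(self, qid, opcode, lba, nblocks):
        # I/O commands are accepted only on queues created through the admin queue.
        if qid not in self.io_queues:
            raise KeyError(f"I/O queue {qid} has not been created")
        self.io_queues[qid].submission.append((opcode, lba, nblocks))

    def process(self):
        """Drain every submission queue, posting one completion per command."""
        while self.admin.submission:
            opcode, qid = self.admin.submission.popleft()
            status = "ok" if opcode in self.ADMIN_OPCODES else "invalid"
            if opcode == "create_io_queue_pair":
                self.io_queues[qid] = QueuePair(qid)
            elif opcode == "delete_io_queue_pair":
                self.io_queues.pop(qid, None)
            self.admin.completion.append((opcode, qid, status))
        for pair in self.io_queues.values():
            while pair.submission:
                command = pair.submission.popleft()
                pair.completion.append((*command, "ok"))


# Usage: create an I/O queue pair through the admin queue, then issue a read on it.
ctrl = ToyController()
ctrl.submit_admin("create_io_queue_pair", qid=1)
ctrl.process()
ctrl.submit_io(1, "read", lba=0, nblocks=8)
ctrl.process()
```

In this sketch every command, administrative or I/O, originates from the single party that owns the controller object, which mirrors the host-centric control described above.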
According to the NVMe specification, only a system's host CPU is capable of sending storage commands to an NVMe Controller. Additionally, PCI-Express system architecture faces two typical performance constraints. First, typical PCI-Express fabrics with high device fan-out (such as an enterprise storage backplane) have lower total upstream bandwidth (from a PCI-Express Switch upstream to the host) than downstream bandwidth (from the same PCI-Express Switch downstream to all connected storage controllers). This represents bandwidth overprovisioning downstream of the switch, which cannot be fully utilized when the only permitted traffic flows between the host and an endpoint NVMe Controller. Second, in a system that only permits the host to generate storage traffic to all controllers, the host's resources (especially computation power (CPU) and storage (Dynamic Random-Access Memory (DRAM))) are a bottleneck to overall system performance. The overall latency and throughput of the system are bounded by the capabilities of the host. The latency problem is especially detrimental for applications such as a High Performance Computing platform, where a computation device such as a graphics processing unit (GPU) requires access to a large quantity of data on a storage medium but cannot access it without the host acting as an intermediary to initiate the storage transfers from the drive to host DRAM and then further memory transfers from host DRAM down to the GPU.
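The first constraint above can be made concrete with a short calculation. The following is a rough sketch under stated assumptions: every lane runs at the same speed, and the lane counts used (a 16-lane upstream port and 24 downstream ports of 4 lanes each) are hypothetical figures chosen only for illustration. The function name is likewise an assumption, not part of any specification.

```python
def oversubscription_ratio(upstream_lanes, downstream_ports, lanes_per_port):
    """Aggregate downstream lanes divided by upstream lanes.

    Assumes all lanes run at the same speed, so lane count stands in
    for bandwidth. A ratio above 1.0 means the downstream side of the
    switch has more bandwidth than the upstream link can carry.
    """
    if upstream_lanes <= 0 or downstream_ports <= 0 or lanes_per_port <= 0:
        raise ValueError("lane and port counts must be positive")
    return (downstream_ports * lanes_per_port) / upstream_lanes


# Hypothetical backplane: 24 drives with 4 lanes each behind a 16-lane uplink.
ratio = oversubscription_ratio(upstream_lanes=16, downstream_ports=24, lanes_per_port=4)
print(ratio)  # 6.0
```

With these assumed figures, only about one sixth of the downstream bandwidth can be in use at once when all traffic must pass through the upstream link to the host.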
Earlier attempts at resolving such issues include vendor-unique and proprietary solutions, which do not resolve the problem of accessing an off-the-shelf NVMe Controller. Such solutions do not enable devices that are incompatible with them to generate storage traffic, and they are further not compatible with the NVM Express protocol, since NVM Express only allows the system host to generate traffic.
In view of the foregoing, it may be understood that there may be significant problems and shortcomings associated with current technologies for peer-to-peer PCIe storage transfers.