Non-Volatile Memory Express (NVMe) is a device interface specification that allows non-volatile storage media to use the PCI Express (PCIe) interface of a host computer system. Non-volatile storage media typically comprise flash memory and come in the form of solid-state drives (SSDs). Compared to legacy interfaces such as SATA, NVMe offers reduced latency, higher data transfer bandwidth, and reduced power consumption.
Recently, there has been a trend toward disaggregating storage devices from the host system, in which the host and the storage device reside in different systems and are connected, via network interfaces, to a communications network topology such as a switched network fabric comprising network nodes interconnected by one or more network switches. To maintain high performance and low latency over a switched network fabric, Remote Direct Memory Access (RDMA) may be used. RDMA allows one computer system direct access to the memory of another, without involving either system's operating system or CPU, where both systems are equipped with RDMA-enabled Network Interface Cards (RNICs). This reduces overhead and permits high-throughput, low-latency networking.
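The one-sided semantics described above can be illustrated with a minimal simulation. All class and method names below are hypothetical stand-ins for real RDMA verbs (e.g., memory registration and post-send operations in an actual verbs library); the sketch only models the key property that, once a buffer is registered, a remote peer can read or write it without any callback into the owning host's CPU.

```python
# Illustrative simulation of one-sided RDMA semantics.
# These names are hypothetical; a real deployment would use an RDMA
# verbs library rather than this in-process model.

class MemoryRegion:
    """A buffer its owner registers once; the RNIC then serves remote
    reads/writes against it without involving the owner's CPU."""
    def __init__(self, buf: bytearray):
        self.buf = buf


class SimulatedRNIC:
    """Models the data path of an RDMA-enabled NIC: one-sided operations
    complete against registered memory with no remote-side software work."""
    def __init__(self):
        self.regions = {}

    def register_region(self, key: int, region: MemoryRegion):
        self.regions[key] = region

    def rdma_read(self, key: int, offset: int, length: int) -> bytes:
        # Initiator pulls remote memory; the remote OS/CPU is not notified.
        return bytes(self.regions[key].buf[offset:offset + length])

    def rdma_write(self, key: int, offset: int, data: bytes):
        # Initiator pushes into remote memory, again bypassing the remote CPU.
        self.regions[key].buf[offset:offset + len(data)] = data


# Usage: a host registers a buffer; a peer then reads and writes it directly.
rnic = SimulatedRNIC()
host_buf = bytearray(b"hello world")
rnic.register_region(0x10, MemoryRegion(host_buf))
pulled = rnic.rdma_read(0x10, 0, 5)
rnic.rdma_write(0x10, 6, b"fabric")
```

The point of the sketch is the absence of any handler on the buffer owner's side: both transfers complete purely against registered memory, which is what removes operating-system and CPU overhead from the data path.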
FIG. 1 is a block diagram of a host device 102 accessing a storage device 116 over an RDMA network fabric 112, according to the prior art. In operation, to perform a write or read operation, the host device 102 sends an encapsulated command over the RDMA network fabric 112 to the storage device 116. After receiving the encapsulated command, the storage device 116 decapsulates the command and parses it to determine whether it is a write command or a read command. If the command is a write command, the storage device 116 retrieves the data to be written from the host device 102 using an RDMA_READ operation. After the data is written, the storage device 116 encapsulates a response and sends it to the host device 102 via RDMA messaging, notifying the host device 102 that the data has been written to the storage device 116. Similarly, if the command is a read command, the storage device 116 reads the data and sends it to the host device 102 using an RDMA_WRITE operation. Again, after the read data is transmitted to the host device 102, the storage device 116 encapsulates a response and sends it to the host device 102 via RDMA messaging.
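The target-side flow just described can be sketched as follows. This is a minimal simulation under stated assumptions: the RNIC stub, the capsule fields (`opcode`, `lba`, `offset`, `length`), and the response dictionaries are illustrative inventions, not the NVMe over Fabrics wire format or a real verbs API. It shows why the storage device carries the load: it parses each capsule, drives the data transfer in both directions, and encodes every response.

```python
# Minimal sketch of the prior-art target-side flow. The RNIC stub and
# command fields are illustrative, not the NVMe-oF capsule format.

class RNICStub:
    """Stands in for the target's RDMA NIC; host_mem models the host's
    registered buffer that one-sided verbs operate on."""
    def __init__(self, host_mem: bytearray):
        self.host_mem = host_mem

    def rdma_read(self, offset: int, length: int) -> bytes:
        return bytes(self.host_mem[offset:offset + length])

    def rdma_write(self, offset: int, data: bytes):
        self.host_mem[offset:offset + len(data)] = data


def handle_capsule(capsule: dict, rnic: RNICStub, storage: dict) -> dict:
    """Decapsulate a command capsule, move the data, and encode a response."""
    if capsule["opcode"] == "write":
        # Write path: the target pulls the payload out of host memory
        # with RDMA_READ, then persists it locally.
        data = rnic.rdma_read(capsule["offset"], capsule["length"])
        storage[capsule["lba"]] = data
        return {"status": "success", "op": "write"}
    if capsule["opcode"] == "read":
        # Read path: the target pushes the stored data into host memory
        # with RDMA_WRITE.
        rnic.rdma_write(capsule["offset"], storage[capsule["lba"]])
        return {"status": "success", "op": "read"}
    return {"status": "invalid"}


# Usage: one write, then a read-back through the simulated fabric.
host_mem = bytearray(b"payload!")
rnic = RNICStub(host_mem)
storage = {}
write_resp = handle_capsule(
    {"opcode": "write", "lba": 7, "offset": 0, "length": 8}, rnic, storage)
read_resp = handle_capsule(
    {"opcode": "read", "lba": 7, "offset": 0, "length": 8}, rnic, storage)
```

Note that every branch of `handle_capsule` executes on the storage device: parsing, transfer management, and response encoding all consume target-side cycles, which is the bottleneck the following paragraphs describe.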
As shown and described in FIG. 1, aspects of legacy storage remain in current NVMe over Fabrics applications. Namely, the storage device 116 is still responsible for parsing incoming commands from, and encoding responses to, the host device 102; managing the data transfer between the host device 102 and the storage device 116; and performing NVMe driver operations. As a result, the computational demand on the storage device 116 is very high. In a typical networked storage environment where multiple host devices seek to access the storage device, the storage device becomes a bottleneck, as it quickly runs out of processing bandwidth to serve the multiple host devices, resulting in degraded performance. Moreover, there is no ability to scale out the storage device to allow hundreds or thousands of host devices to access it at the same time.
What is needed, therefore, is an improved disaggregated storage environment that offloads work from the storage device over a network fabric and enables an enhanced topology in which thousands of host devices can scale out to a single storage device.