Currently, most hard drives (HDs) and solid state drives (SSDs) are configured to connect to a host computer system (or “host” for short) via the Serial Advanced Technology Attachment (SATA) bus. However, due to technological advancements, the access speed of SSDs has increased to a point where the maximum transfer speed of the SATA bus has become a bottleneck. As such, there are now also SSDs that are configured to connect to a host computer system via the Peripheral Component Interconnect Express (PCIe or PCI-E) bus, which offers a higher maximum transfer speed and better bandwidth scalability than the SATA bus.
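By way of illustration only, the bandwidth gap described above can be estimated with a short calculation. The sketch below assumes specific bus generations that the text does not name: SATA revision 3.x (6 Gbit/s line rate with 8b/10b encoding) and PCIe 3.0 (8 GT/s per lane with 128b/130b encoding). The helper name `effective_mb_per_s` is hypothetical, and the figures ignore protocol overhead above the line encoding.

```python
def effective_mb_per_s(line_rate_gbps, payload_bits, coded_bits, lanes=1):
    """Usable bandwidth in MB/s after line-encoding overhead.

    line_rate_gbps: raw signalling rate per lane, in Gbit/s (or GT/s)
    payload_bits / coded_bits: encoding ratio, e.g. 8/10 or 128/130
    lanes: number of lanes aggregated (SATA has a single link)
    """
    usable_bits_per_s = line_rate_gbps * 1e9 * payload_bits / coded_bits
    return usable_bits_per_s / 8 / 1e6 * lanes


# Assumed generations (not stated in the text): SATA 3.x and PCIe 3.0.
SATA3 = effective_mb_per_s(6.0, 8, 10)
PCIE3_X1 = effective_mb_per_s(8.0, 128, 130)
PCIE3_X4 = effective_mb_per_s(8.0, 128, 130, lanes=4)

if __name__ == "__main__":
    print(f"SATA 3.x   : {SATA3:8.1f} MB/s")
    print(f"PCIe 3.0 x1: {PCIE3_X1:8.1f} MB/s")
    print(f"PCIe 3.0 x4: {PCIE3_X4:8.1f} MB/s")
```

Under these assumptions a SATA link tops out at roughly 600 MB/s, whereas a PCIe link scales with the number of lanes, which is the bandwidth scalability referred to above.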
To more fully take advantage of what the PCIe bus has to offer, the Non-Volatile Memory Express (NVMe) specification has also been developed. The NVMe specification is a logical device interface specification developed for accessing non-volatile storage media attached via the PCIe bus. The NVMe specification offers significant advantages, such as lower latency and improved multi-processor core support, over the Advanced Host Controller Interface (AHCI) specification that was developed for the SATA bus. Hereinafter, devices that adopt and operate according to the NVMe interface specification are referred to as “NVMe devices.”
One way in which NVMe devices provide improved performance over SATA-enabled devices is by utilizing multiple I/O queues. These I/O queues, however, typically reside in a kernel space of the host's memory space, which means that they are accessible only by kernel mode processes. Thus, when a user application process, which has only user mode access and runs in a designated user space of the host's memory space, has to perform an input/output (I/O) operation (e.g., a read or a write) on the NVMe device, the user application process has to submit an I/O request to one or more kernel mode processes in the kernel space. That is, the user application process has to access the I/O queues indirectly, through kernel mode access. Going through the kernel processes to access the I/O queues, however, involves passing or processing the I/O request through one or more abstraction layers (e.g., the block I/O layer) and inevitably incurs latency.
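The indirect access path described above can be seen from the user application's side as an ordinary system call. The following is a minimal sketch, not a definitive implementation: it assumes a POSIX-like host, uses a temporary file as a stand-in for the NVMe device so that it can run without special privileges, and the function names `read_via_kernel` and `demo` are hypothetical. The user mode process never touches an I/O queue; it only hands the request to the kernel and waits for the result.

```python
import os
import tempfile


def read_via_kernel(path, offset, length):
    """Read `length` bytes at `offset` using ordinary system calls.

    Each os.* call below traps into kernel mode. Queueing the request
    to the storage device, if any, is done by kernel mode code on the
    caller's behalf.
    """
    fd = os.open(path, os.O_RDONLY)          # system call: open
    try:
        return os.pread(fd, length, offset)  # system call: positioned read
    finally:
        os.close(fd)                         # system call: close


def demo():
    # Stand-in for a device: two 4 KiB "blocks" with distinct contents.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"A" * 4096 + b"B" * 4096)
        path = f.name
    try:
        # Read the first 16 bytes of the second block.
        return read_via_kernel(path, 4096, 16)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

On a real host the path would refer to the device or to a file stored on it, and every such call would cross from user space into kernel space and traverse the abstraction layers noted above before reaching the I/O queues.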