Conventionally, solid state storage drive (SSD) architectures and designs have focused primarily on obtaining a high average bandwidth or throughput for input and output (I/O) operations (i.e., reading and writing data). Compared to traditional magnetic storage devices such as hard disk drives (HDDs), SSDs are capable of performing hundreds, if not thousands, of times more I/O operations per second. Conventional SSDs obtain such high average bandwidth through parallelism in their architecture.
An SSD typically comprises a number of non-volatile memory dies, such as NAND flash memory, that are arranged in groups coupled to channels controlled by a channel controller. Physical storage blocks, one from each of the non-volatile memory dies, are commonly selected to create logical blocks, or “superblocks,” that one or more host devices, such as computers or storage appliances, write data to and read data from. Selecting a physical block from each of the non-volatile memory dies to form superblocks allows parallel access to all of the non-volatile memory dies across all channels, achieving maximum bandwidth or throughput. A die may further be organized into multiple “planes” (each die comprising two, four, or more planes), where each plane may process an I/O operation in parallel.
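The superblock arrangement described above can be illustrated with a minimal Python sketch. The names PhysicalBlock and build_superblocks, the geometry values, and the rule that superblock k takes block k from every die are all assumptions made for illustration; an actual SSD may select any available physical block from each die, and planes are omitted here for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalBlock:
    """Hypothetical address of one physical block."""
    channel: int
    die: int    # die index within its channel
    block: int  # block index within the die


def build_superblocks(num_channels, dies_per_channel, blocks_per_die):
    """Form one superblock per block index.

    Assumption for illustration: superblock k is made of physical block k
    from every die on every channel, so each superblock spans all dies.
    """
    superblocks = []
    for k in range(blocks_per_die):
        members = [
            PhysicalBlock(channel=c, die=d, block=k)
            for c in range(num_channels)
            for d in range(dies_per_channel)
        ]
        superblocks.append(members)
    return superblocks


# Hypothetical geometry: 4 channels, 2 dies per channel, 3 blocks per die.
superblocks = build_superblocks(num_channels=4, dies_per_channel=2, blocks_per_die=3)
```

Because every superblock in this sketch contains exactly one block from each die, a write striped across a superblock can keep every die and every channel busy at once, which is the source of the high bandwidth. It also means that any two superblocks share every die.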
While such an SSD architecture maximizes the bandwidth or throughput of an SSD, it also suffers from a number of issues that impact I/O latency (i.e., the amount of time it takes to complete an I/O operation). Due to physical limitations of the non-volatile memory dies, only a single physical block per plane per non-volatile memory die can perform an I/O operation at a time. This leads to collisions between I/O operations directed to different physical blocks of the same plane of the same non-volatile memory die: because those physical blocks belong to different logical blocks that the host may be writing to or reading from at the same time, an I/O operation must wait until the previous operation to a different block in the same plane has completed. Relatedly, because there are multiple non-volatile memory dies per channel controller, commands for I/O operations to different logical blocks may also collide at the channel controller. A channel is shared, so only one data transfer may proceed at any time between the controller and any non-volatile memory die on that channel, which leads to bottlenecks at each channel controller of the SSD.
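The effect of these two kinds of collision on latency can be illustrated with a simplified timing model in Python. The model is an assumption made for illustration and not a description of any particular SSD: all operations are reads issued at time zero and served in list order, each plane performs one array operation at a time and stays occupied until its data has been transferred, and each channel carries one transfer at a time. The names Op and completion_times and the timing values of 50 and 20 microseconds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical read operation; all timing values are illustrative."""
    name: str
    channel: int
    die: int
    plane: int
    array_us: int = 50     # time the plane spends sensing the data
    transfer_us: int = 20  # time the channel spends moving the data


def completion_times(ops):
    """Return {name: completion time in microseconds} for ops issued at t=0.

    Greedy, in-order model: an array operation waits for its plane, and a
    transfer waits for both its array operation and its channel.
    """
    plane_free = {}    # (channel, die, plane) -> time the plane becomes idle
    channel_free = {}  # channel -> time the channel becomes idle
    done = {}
    for op in ops:
        plane = (op.channel, op.die, op.plane)
        array_end = plane_free.get(plane, 0) + op.array_us
        xfer_start = max(array_end, channel_free.get(op.channel, 0))
        xfer_end = xfer_start + op.transfer_us
        plane_free[plane] = xfer_end
        channel_free[op.channel] = xfer_end
        done[op.name] = xfer_end
    return done


ops = [
    Op("first", channel=0, die=0, plane=0),
    Op("same_channel", channel=0, die=1, plane=0),   # shares only the channel
    Op("other_channel", channel=1, die=0, plane=0),  # shares nothing
    Op("same_plane", channel=0, die=0, plane=0),     # shares the plane
]
latency = completion_times(ops)
# latency == {"first": 70, "same_channel": 90,
#             "other_channel": 70, "same_plane": 140}
```

Under these assumptions, the operation that shares nothing with the first one completes in the same 70 microseconds, the operation that shares only the channel is delayed to 90 microseconds while it waits for the transfer, and the operation that targets the same plane takes 140 microseconds, twice the uncontended latency.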
In addition to I/O operations from hosts, the SSD must perform maintenance operations throughout its lifetime, such as garbage collection, which consolidates valid data and erases invalid data to create free areas for new data to be written. These maintenance operations typically take place at indeterminate times, as needed, and last for indeterminate periods of time, which inevitably leads to collisions with host I/O operations at both the channel controllers and the non-volatile memory dies. These collisions, whether due to host I/O operations or SSD maintenance operations, cause inconsistent and unpredictable SSD latency performance.
Further, in addition to I/O operations from hosts and maintenance operations, the SSD must also perform other internal administrative or “housekeeping” operations throughout its lifetime. Such housekeeping operations typically involve testing for decaying data bits, “warming up” memory cells that have not been accessed recently, and performing other inspections related to the health of the non-volatile memory dies, including rewriting data to new locations to refresh data at risk of becoming unrecoverable due to age or increased errors. SSDs commonly have the ability to issue such housekeeping-related read, write, and erase commands to all of their non-volatile memory dies in parallel and at a high rate. These housekeeping operations, which involve internal SSD (non-host) read, write, and erase operations, can lead to collisions with host I/O operations at the channel controllers and the non-volatile memory dies, increasing the variability of the latency seen by the host.
What is needed, therefore, is an improved technique for managing internal command queues of SSDs that reduces collisions with host I/O operations and thereby provides consistent I/O operation and performance.