The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for automatically preventing large block writes from starving small block writes in a storage device, such as a solid-state drive or solid-state drive subsystem.
A solid-state drive (SSD) is a data storage device that uses solid-state memory, such as NAND Flash, to store persistent data. The term SSD can refer to many different form factors, including those that use the same block I/O access protocols as a traditional hard disk drive (HDD). The term can also refer to form factors that are not common with HDDs, such as peripheral component interconnect express (PCIe) cards or custom form factors. SSDs are distinguished from traditional HDDs, which are electromechanical devices containing spinning disks and movable read/write heads. SSDs, in contrast, retain data in non-volatile memory chips and contain no moving parts. Compared to electromechanical HDDs, SSDs are typically less susceptible to physical shock, are quieter, and have lower access time and latency. SSDs are available with the same interfaces as hard disk drives, such as serial attached small computer system interface (SAS), serial advanced technology attachment (SATA), and Fibre Channel, thus allowing clients to use the two types interchangeably in most available storage systems today. In some applications, a client may use all SSDs, while in many applications the client might use a mixture of the two types.
SSDs are starting to revolutionize the data center, as heretofore unheard-of levels of performance are now possible. Servers can bring in more data, and the input/output (I/O) bottleneck that caused faster and faster processors to wait more often is much less of a problem. Storage systems are also starting to use SSDs as tiers of storage alongside HDDs. In some cases, pure SSD configurations are starting to be used. Because SSDs hold vital client data, it is important that the drives still have some sort of disaster recovery solution applied to them, such as flash copy, peer-to-peer remote copy, or both.
These operations can result in multiple concurrent streams of commands to the SSD. The user may be issuing a combination of read and write operations. In many online transaction processing (OLTP) environments, the data size of these operations is relatively small, perhaps 4 KB, and even smaller for mainframe systems. The snapshot and remote copy operations may result in very large block writes, for example 128 KB, 256 KB, or even larger. This is one reason an SSD may see an intermix of both large and small block writes. Other applications will also result in the same effect but for different reasons.
As users start to adopt NAND Flash SSDs in more applications, some of the complexities of their usage are becoming apparent. While they perform much faster than HDDs, they cannot simply be overwritten in place as an HDD can. Therefore, data must be virtualized and a map table created to store the physical-to-logical mapping information. Update writes invalidate parts of the map table, and therefore garbage collection must take place in order to reclaim space that is no longer in use. This garbage collection process must be performed concurrently with host operations, and care must be taken so that it does not cause inconsistent performance.
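The mapping and invalidation behavior described above can be illustrated with a minimal sketch. The class name `MapTable`, its fields, and the append-only allocation policy are hypothetical choices made for illustration only; an actual controller would add erase blocks, garbage collection, and persistence of the table.

```python
class MapTable:
    """Toy logical-to-physical map for a device that cannot overwrite in place.

    Hypothetical illustration: names and allocation policy are assumptions.
    """

    def __init__(self, num_physical_pages):
        self.l2p = {}                              # logical page -> physical page
        self.valid = [False] * num_physical_pages  # is this physical page live?
        self.next_free = 0                         # next never-written physical page

    def write(self, logical_page):
        # Every write, including an update, is placed on a fresh physical page.
        if self.next_free >= len(self.valid):
            raise RuntimeError("no free pages; garbage collection needed")
        old = self.l2p.get(logical_page)
        if old is not None:
            self.valid[old] = False                # update write invalidates old copy
        new = self.next_free
        self.next_free += 1
        self.l2p[logical_page] = new
        self.valid[new] = True
        return new

    def reclaimable(self):
        # Written pages that are no longer valid: space garbage collection could reclaim.
        return sum(1 for p in range(self.next_free) if not self.valid[p])


table = MapTable(num_physical_pages=8)
table.write(0)
table.write(1)
table.write(0)                 # update of logical page 0
print(table.l2p)               # {0: 2, 1: 1}
print(table.reclaimable())     # 1
```

In this sketch the second write to logical page 0 leaves physical page 0 holding stale data, which is exactly the space that garbage collection would later have to reclaim.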
Additionally, the nature of writes poses issues for SSDs in many other ways. A write operation to NAND Flash must take place at a certain minimum granularity, referred to hereafter as a page. A page in current Flash devices is 8 KB and seems to be headed to 16 KB. Before a write can take place, the erase block containing the page must first have been erased. An erase block contains many pages; it can be 512 KB up to 2 MB or even larger. Erases take place in the background as blocks are reclaimed, but an erase can take many milliseconds to complete.
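The sizes quoted above can be turned into simple page counts. The following sketch uses hypothetical helper names and assumes, for simplicity, that each host write occupies its own whole pages; a real controller may pack several small writes into one page.

```python
KB = 1024
MB = 1024 * KB


def pages_per_erase_block(erase_block_bytes, page_bytes):
    # Number of pages contained in one erase block.
    return erase_block_bytes // page_bytes


def pages_for_write(write_bytes, page_bytes):
    # A write occupies whole pages, so round up (integer ceiling division).
    return -(-write_bytes // page_bytes)


print(pages_per_erase_block(512 * KB, 8 * KB))  # 64
print(pages_per_erase_block(2 * MB, 8 * KB))    # 256
print(pages_for_write(4 * KB, 8 * KB))          # 1
print(pages_for_write(256 * KB, 8 * KB))        # 32
```

Under these assumptions, a single 256 KB write consumes 32 pages of 8 KB each, while a 4 KB write consumes one.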
Although an SSD can perform many thousands of writes, it does so through parallelism. Each Flash die is very slow at performing a write, and it can take up to 2 ms to write a page. Although modern Flash devices are designed to have 2 or 4 planes, a given Flash die can still write only up to 4 pages concurrently. This means that some commands have to wait in queues before they can be completed.
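As a rough worked example of why parallelism is required, the sketch below computes a per-die page-write rate from the figures above (2 ms per page, up to 4 planes) and the number of dies needed to reach a target rate. The function names and the target of 20,000 page writes per second are hypothetical, and the model assumes the worst-case 2 ms program time with all planes always busy.

```python
import math


def die_page_writes_per_second(planes, page_program_ms):
    # Each plane can program one page per program interval.
    return planes * 1000.0 / page_program_ms


def dies_needed(target_page_writes_per_second, planes, page_program_ms):
    # Smallest number of dies whose combined rate meets the target.
    per_die = die_page_writes_per_second(planes, page_program_ms)
    return math.ceil(target_page_writes_per_second / per_die)


print(die_page_writes_per_second(planes=4, page_program_ms=2.0))  # 2000.0
print(dies_needed(20000, planes=4, page_program_ms=2.0))          # 10
```

Under these assumptions a single die sustains at most 2,000 page writes per second, so any higher aggregate rate must come from writing to many dies at once.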
These are a few specific issues that complicate keeping Flash performance and latency consistent, specifically with regard to writes. One can see that writing large blocks can keep more Flash dies busy and can therefore cause small block writes to endure long waits.
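The starvation effect can be illustrated with a toy queueing model. This is a sketch under stated assumptions, not a model of any particular drive: all writes are queued at time zero in first-in, first-out order, pages are striped round-robin across dies, each die programs one page at a time, and each page takes 2 ms. The function name and parameters are hypothetical.

```python
def completion_times_ms(writes, num_dies, page_program_ms=2.0):
    """Return {name: completion time in ms} for writes queued at time 0.

    writes is a list of (name, page_count) tuples in FIFO order.
    """
    die_free_at = [0.0] * num_dies   # time at which each die finishes its queue
    next_die = 0                     # round-robin striping pointer
    done = {}
    for name, page_count in writes:
        finish = 0.0
        for _ in range(page_count):
            # The page waits behind everything already queued on this die.
            die_free_at[next_die] += page_program_ms
            finish = max(finish, die_free_at[next_die])
            next_die = (next_die + 1) % num_dies
        done[name] = finish
    return done


# A one-page small write alone, then the same write queued behind a 32-page large write.
print(completion_times_ms([("small", 1)], num_dies=8))
# {'small': 2.0}
print(completion_times_ms([("large", 32), ("small", 1)], num_dies=8))
# {'large': 8.0, 'small': 10.0}
```

In this model the small write completes in 2 ms when the dies are idle but takes 10 ms when it arrives behind a single 32-page write, because every die already holds four queued pages.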