1. Technical Field
The present disclosure relates to a storage system. More particularly, the present disclosure relates to a stream identifier (ID) based storage system for managing an array of Solid State Disks (SSDs).
2. Description of the Related Art
SSDs are semiconductor devices and, unlike magnetic disks, have no mechanical moving parts. This eliminates disk-head seek latency and increases performance for various input-output (IO) workloads. SSDs are also more resilient to mechanical disturbances than magnetic disks. Current SSDs support three basic IO operations: write, read, and erase. The basic unit of data storage in an SSD is a page (a group of flash memory cells, typically 4 kilobytes (KB) in size). Pages are further grouped into blocks. In SSDs, reads and writes are performed at page granularity, whereas erase operations are performed at block granularity.
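The page/block asymmetry described above can be illustrated with a toy model. The following is a minimal sketch, not the disclosed system: `Block`, `PageState`, and `PAGES_PER_BLOCK = 64` are hypothetical names and values chosen for illustration (real devices vary), and only the 4 KB page size comes from the text. It shows that a written page cannot be rewritten in place; an update leaves the old copy invalid, and only a block-wide erase returns pages to the free state.

```python
from dataclasses import dataclass, field
from enum import Enum

PAGE_SIZE = 4 * 1024     # 4 KB page, the typical size cited above
PAGES_PER_BLOCK = 64     # hypothetical value; real devices vary


class PageState(Enum):
    FREE = "free"        # erased and writable
    VALID = "valid"      # holds current data
    INVALID = "invalid"  # holds stale data awaiting erase


@dataclass
class Block:
    pages: list = field(default_factory=lambda: [PageState.FREE] * PAGES_PER_BLOCK)

    def write_page(self, index: int) -> None:
        # Writes are page-granular and allowed only on a free (erased) page.
        if self.pages[index] is not PageState.FREE:
            raise ValueError(f"page {index} must be erased before it is rewritten")
        self.pages[index] = PageState.VALID

    def invalidate_page(self, index: int) -> None:
        # An update is written to another page; the old copy becomes invalid.
        self.pages[index] = PageState.INVALID

    def erase(self) -> None:
        # Erase is block-granular: every page in the block is reset at once.
        self.pages = [PageState.FREE] * PAGES_PER_BLOCK


block = Block()
block.write_page(0)
block.invalidate_page(0)   # page 0 now holds stale data
block.erase()              # reclaims the whole block, including page 0
block.write_page(0)        # page 0 is writable again
```

Invalid pages accumulate until their entire block is erased, which is why reclaiming space requires a separate block-level process.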
In SSDs, the erase operation is performed through a mechanism called garbage collection (GC). GC is the process of reclaiming invalidated pages and creating usable free space on an SSD. While GC is in progress on a particular block of an SSD, all incoming requests directed to that block are delayed until the GC completes: they are stalled, placed in a queue, and scheduled for service after the GC process finishes. This stalling of incoming IO requests can degrade the performance of the storage system (i.e., the SSDs) when incoming requests are bursty. Running GC and IO operations simultaneously on the same volume of an SSD therefore impacts IO performance.
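The stalling effect can be sketched with a simple latency model. This is an illustrative assumption, not a description of any particular SSD firmware: each block is assumed to serve one request at a time, GC is assumed to occupy its block for a fixed window and to take priority over IO, and `Request`, `io_latencies`, and all numeric values are hypothetical. Integer microseconds are used so the arithmetic is exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    arrival_us: int  # arrival time in microseconds
    block: int       # target block ID


def io_latencies(requests, service_us, gc_block, gc_start_us, gc_end_us):
    """Return per-request latency, in input order, under a toy model where
    each block serves one request at a time and cannot serve IO while GC
    runs on it (hypothetical model; GC is assumed to take priority)."""
    busy_until = {}  # block ID -> time at which the block is next free
    latencies = [0] * len(requests)
    # Serve requests in arrival order (stable sort keeps ties in input order).
    order = sorted(range(len(requests)), key=lambda i: requests[i].arrival_us)
    for i in order:
        req = requests[i]
        start = max(req.arrival_us, busy_until.get(req.block, 0))
        overlaps_gc = (req.block == gc_block
                       and start < gc_end_us
                       and start + service_us > gc_start_us)
        if overlaps_gc:
            start = gc_end_us  # stalled until GC on this block completes
        end = start + service_us
        busy_until[req.block] = end
        latencies[i] = end - req.arrival_us
    return latencies


# A burst of three requests to block 7 arrives while GC runs on block 7
# from 0 to 5000 us; one request to block 3 arrives at the same time.
burst = [Request(1000, 7)] * 3 + [Request(1000, 3)]
print(io_latencies(burst, service_us=100, gc_block=7, gc_start_us=0, gc_end_us=5000))
```

In this sketch the three block-7 requests see latencies of 4100, 4200, and 4300 us, while the block-3 request completes in 100 us: the queued burst pays for the remaining GC time plus serialization behind one another, which is the bursty-workload degradation described above.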
Normally, data analysis (such as big data) and on-line transaction processing (OLTP) applications are not serviced from the same storage server because, for example, they are different kinds of applications that require different capabilities. OLTP applications use servers with a higher rate of IO operations per second (IOPS) than data analysis applications do. In the future, growing demand for faster data analysis will require solutions such as hosting big-data/data analysis applications on high-speed flash-based storage servers, just like OLTP. A current obstacle to doing so is GC.
GC is an unavoidable process in SSDs. Currently, neither flash array (e.g., array of SSDs) instructions nor the non-volatile memory express (NVMe)/Serial Attached Small Computer System Interface (SAS) driver has any control over the GC inside the SSDs. GC within the SSDs affects IO operations, or data computations (in the case of smart SSDs), directed to the flash array, because GC and the IO operation may be operating on, or waiting for, the same block.
Current mechanisms that support both high-performance block storage and data analysis require an increased number of cores inside each SSD and increased processing capacity for each of those cores, so that storage system operations and data analysis can complete within a given time. This may also increase Random Access Memory (RAM) requirements within the SSDs. All of these techniques are costly: they increase the cost of each SSD and, consequently, the cost of flash array systems (arrays of SSDs). Sometimes special hardware support for compression is also needed to meet IO performance requirements.
The above information is presented as background information only to help the reader understand the present disclosure. Applicants have made no determination and make no assertion as to whether any of the above might be applicable as Prior Art with regard to the present application.