Flash-memory-based solid-state drives (SSDs) are widely used in both consumer computers and enterprise servers. There are two main types of flash memory, named after the NAND and NOR logic gates. NAND-type flash memory is written and read in blocks, each of which comprises a number of pages.
Because the NAND flash storage cells in SSDs have distinctive properties, ordinary usage patterns can make very inefficient use of an SSD. For example, although NAND flash memory can be read or programmed a page at a time, it can only be erased a block at a time. To rewrite a single NAND flash page, the whole erase block (which contains many flash pages) must first be erased.
Because NAND-flash-based storage devices (e.g., SSDs) do not allow in-place updating, a garbage collection operation is performed when the count of available free blocks falls to a certain threshold, in order to prepare space for subsequent writes. Garbage collection includes reading valid data from one erase block and writing that valid data to another block, while invalid data is not transferred to the new block. Erasing a NAND erase block takes a relatively significant amount of time, and each erase block has a limited number of erase cycles (from about 3,000 to 10,000). Thus, garbage collection overhead is one of the biggest performance limiters for this class of technology, increasing data I/O latency and lowering I/O performance.
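The garbage collection step described above can be illustrated with a minimal sketch. It is not any vendor's firmware: the names `EraseBlock`, `collect`, and `PAGES_PER_BLOCK`, and the block size of four pages, are assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAGES_PER_BLOCK = 4  # assumed for illustration; real erase blocks hold many more pages


@dataclass
class EraseBlock:
    # Each slot holds a page payload, or None if that page has been invalidated.
    pages: List[Optional[str]] = field(default_factory=list)
    erase_count: int = 0

    def valid_pages(self) -> List[str]:
        return [p for p in self.pages if p is not None]

    def erase(self) -> None:
        # Erasing frees every page in the block and consumes one erase cycle.
        self.pages = []
        self.erase_count += 1


def collect(victim: EraseBlock, target: EraseBlock) -> int:
    """Copy valid pages from victim to target, then erase victim.

    Returns the number of pages copied, i.e. the extra writes caused by
    garbage collection rather than by the host.
    """
    survivors = victim.valid_pages()
    if len(target.pages) + len(survivors) > PAGES_PER_BLOCK:
        raise ValueError("target block lacks room for the valid pages")
    target.pages.extend(survivors)  # invalid pages are not transferred
    victim.erase()
    return len(survivors)


victim = EraseBlock(pages=["a", None, "c", None])  # two valid, two invalid pages
target = EraseBlock()
copied = collect(victim, target)
```

In this sketch, reclaiming the victim block costs two page copies plus one erase cycle. The more valid pages a victim holds, the more copying each reclaimed block requires.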
Therefore, operating systems (OSs) and applications that do not distinguish hot (frequently updated) data from cold (rarely updated) data, and that store the two together, will see performance degrade over time compared to OSs and applications that do treat hot and cold data differently. They will also see a shorter SSD lifetime, because more erase cycles are needed and the NAND cells wear out faster.
SSD vendors and storage technical committees have come up with a new type of SSD and an associated standard, called “multi-stream SSD,” to overcome this issue. It provides OSs and applications with interfaces for separately storing data with different lifespans, in groups called “streams.” Streams are host hints that indicate when data writes are associated with one another or have a similar lifetime. That is, a group of individual data writes forms a collective stream, and each stream is given a stream ID by the OS or an application. For example, “hot” data can be assigned a unique stream ID, and the data for that stream ID would be written to the same erase block in the SSD. Because the data within an erase block has a similar lifetime or is associated with one another, there is a greater chance that an entire erase block is freed when data is deleted by a host system. This significantly reduces garbage collection overhead, because an entire target block would be either valid (and hence need not be erased) or invalid (and hence can be erased without first copying any data). Accordingly, device endurance and performance should increase.
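The effect of stream-based placement on garbage collection can be sketched with a small simulation. It does not use the multi-stream interface itself: the functions `place` and `gc_copy_cost`, the stream IDs 1 (hot) and 2 (cold), and the four-page block size are all hypothetical and serve only to illustrate the reasoning above.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # assumed for illustration


def place(writes, use_streams):
    """Group (name, stream_id) writes into erase blocks.

    With use_streams=True, each stream ID fills its own open block.
    Otherwise all writes share one open block in arrival order.
    """
    open_blocks = defaultdict(list)
    blocks = []
    for name, stream_id in writes:
        key = stream_id if use_streams else 0
        open_blocks[key].append((name, stream_id))
        if len(open_blocks[key]) == PAGES_PER_BLOCK:
            blocks.append(open_blocks.pop(key))  # block is full, close it
    # Keep any partially filled blocks as well.
    blocks.extend(b for b in open_blocks.values() if b)
    return blocks


def gc_copy_cost(blocks, deleted_stream):
    """Count valid pages that must be copied to reclaim every block
    containing data from the deleted stream."""
    cost = 0
    for block in blocks:
        valid = [page for page in block if page[1] != deleted_stream]
        if len(valid) < len(block):  # block holds at least one invalid page
            cost += len(valid)
    return cost


# Interleaved hot (stream 1) and cold (stream 2) writes.
writes = [(f"p{i}", 1 if i % 2 == 0 else 2) for i in range(8)]

mixed_cost = gc_copy_cost(place(writes, use_streams=False), deleted_stream=1)
stream_cost = gc_copy_cost(place(writes, use_streams=True), deleted_stream=1)
```

Under these assumptions, deleting the hot data leaves two valid cold pages in each of the two mixed blocks, so four pages must be copied before the blocks can be erased. With per-stream placement, the hot block is entirely invalid and the cold block entirely valid, so no pages need to be copied.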
However, utilizing this new interface requires many changes within the applications (including their source code) and the OS. Because a typical computer can have tens or hundreds of software applications installed and running, it is very difficult for all applications, especially legacy and closed-source applications, to adapt to those changes in order to use SSDs more efficiently. In addition, multi-stream SSDs have limited applicability in that they are compatible for use only by operating systems and applications.
What is needed is improved data property based data placement in a storage device and, more particularly, an autonomous process that enables computer devices to utilize data property based data placement (e.g., multi-stream) solid-state drives.