Over the past decade, the use of flash memory has evolved from specialized applications in hand-held devices to primary system storage in general-purpose computers. Recently, there has been significant interest in using flash-memory storage to run primary file systems for personal computers as well as file servers in data centers.
Flash memory consists of an array of individual cells, each constructed from a single floating-gate transistor. Single-Level Cell (SLC) flash stores a single bit per cell and is typically more robust; Multi-Level Cell (MLC) flash stores multiple bits per cell, offering higher density and therefore lower cost per bit.
Both SLC and MLC flash support three operations: read, write (or program), and erase. To change the value stored in a flash cell, an erase must be performed before new data can be written. Read and write operations typically take tens of microseconds, while an erase operation may take more than a millisecond. Even so, compared with magnetic disk drives, flash can substantially improve reliability and random I/O performance.
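The erase-before-write constraint can be sketched as follows. This is a minimal illustration, not a model of any real flash controller: the block geometry, method names, and sentinel value are assumptions chosen for clarity.

```python
ERASED = 0xFF  # NAND cells read as all-ones after an erase

class FlashBlock:
    """Toy model of one flash erase block (geometry is illustrative)."""

    def __init__(self, num_pages=64, page_size=4):
        self.pages = [[ERASED] * page_size for _ in range(num_pages)]

    def program(self, page_no, data):
        page = self.pages[page_no]
        # A page that has already been written must be erased before
        # it can be reprogrammed; enforce that constraint here.
        if any(byte != ERASED for byte in page):
            raise ValueError("page must be erased before programming")
        self.pages[page_no] = list(data)

    def erase(self):
        # Erase operates on the whole block, which is one reason it is
        # slow (milliseconds) relative to page reads/writes (microseconds).
        for page in self.pages:
            page[:] = [ERASED] * len(page)

block = FlashBlock()
block.program(0, [0x12, 0x34, 0x56, 0x78])
block.erase()                      # required before rewriting page 0
block.program(0, [0x9A, 0xBC, 0xDE, 0xF0])
```

Attempting the second `program` call without the intervening `erase` would raise, mirroring the in-place-update restriction that the Flash Translation Layer discussed below is designed to hide.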
These desirable properties give flash-based solid-state drives (SSDs) the potential to mitigate I/O penalties in current systems. For example, SSDs provide thousands of low-latency I/O Operations Per Second (IOPS) and are not affected by seek delays. Additionally, recent SSDs provide peak throughput significantly higher than that of magnetic hard disk drives (HDDs).
To preserve traditional layers of abstraction for backward compatibility, some SSD development work has focused on firmware and software. For example, the Flash Translation Layer (FTL) is typically implemented in a solid-state disk controller and exposes a disk-drive abstraction. System software can then use a traditional block storage interface to support file systems and database systems. This compatibility improves the likelihood of SSD adoption in computer systems.
However, although SSD capacity continues to increase, there is no indication that the high cost per GB of SSDs will soon approach the much lower cost of magnetic disks. As a result, I/O hierarchies often include both HDDs and SSDs, and it is important to examine techniques that increase SSD cost-efficiency. To optimize cost-efficiency, a heterogeneous system should determine data popularity and migrate the hottest data from the slower HDDs to the SSDs. Migration, however, is not free: the additional internal I/O requests it generates are queued in the disk drive, resulting in higher front-end latency.
Therefore, when integrating SSDs and HDDs in a heterogeneous storage system, a key design question is how to structure the modules that work together: for example, how to identify the hottest blocks in the HDD array and how to migrate them to the SSD to boost I/O performance. In addition, the system stack for a heterogeneous storage system layers a traditional HDD file system over a cache stored in the SSDs, so building an efficient system stack with good performance is also a challenge.
What is desired, then, is a heterogeneous storage system that uses an SSD cache in front of the HDDs, together with an intelligent caching algorithm that selects the hottest data for caching. In particular, selecting blocks for caching based on two factors rather than a single factor promises better cache performance.
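As an illustration of two-factor block selection, the sketch below ranks blocks by a weighted combination of access frequency and recency. The choice of these two factors, the weight `ALPHA`, and the scoring formula are assumptions made for this example; they are not the specific algorithm proposed here.

```python
import time

ALPHA = 0.7  # assumed weight trading off frequency against recency

class HotnessTracker:
    """Illustrative two-factor hotness tracker for candidate cache blocks."""

    def __init__(self):
        self.freq = {}         # block id -> access count (factor 1)
        self.last_access = {}  # block id -> last access time (factor 2)

    def record(self, block, now=None):
        now = time.time() if now is None else now
        self.freq[block] = self.freq.get(block, 0) + 1
        self.last_access[block] = now

    def score(self, block, now=None):
        now = time.time() if now is None else now
        age = now - self.last_access[block]
        # Higher frequency and smaller age both raise the score.
        return ALPHA * self.freq[block] + (1 - ALPHA) / (1 + age)

    def hottest(self, n, now=None):
        # Return the n highest-scoring blocks, i.e. candidates to
        # migrate from the HDD array into the SSD cache.
        return sorted(self.freq,
                      key=lambda b: self.score(b, now),
                      reverse=True)[:n]

tracker = HotnessTracker()
for blk in [1, 2, 1, 3, 1, 2]:
    tracker.record(blk, now=0.0)
print(tracker.hottest(2, now=1.0))  # block 1 (3 accesses) ranks first
```

A single-factor policy (frequency alone) would never demote a block that was hot long ago; folding in recency lets the cache adapt as the working set shifts, which is the intuition behind preferring two factors over one.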