1. Technical Field
The invention relates to the field of flash cache architecture for storage systems, and more specifically to a method of cache memory management.
2. Description of the Related Art
Caching is one of the most widely used performance optimization techniques in file and storage systems; it exploits the temporal and spatial locality of workloads. A cache may take two forms, a read cache and a write cache, and typically relies on a storage device (e.g., DRAM) that is faster than the backend storage device (e.g., disks or RAID).
A read cache improves performance by transparently storing data such that future requests for that data can be served faster. If the requested data is contained in the cache, that is, if a cache hit occurs, the request can be served by simply reading the cache, which is considerably faster than reading from the backend storage device. Otherwise, a cache miss occurs and the data has to be fetched from the slower backend storage device.
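The hit and miss paths described above can be sketched as follows. This is a minimal illustration only, not the method of the invention: the class name ReadCache, the use of a Python dict as the backend storage device, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadCache:
    """Tiny LRU read cache in front of a slower backend (illustrative only)."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for the backend storage device
        self.capacity = capacity      # maximum number of cached blocks
        self.entries = OrderedDict()  # address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.entries:
            # Cache hit: serve from the cache and mark the entry as recently used.
            self.hits += 1
            self.entries.move_to_end(address)
            return self.entries[address]
        # Cache miss: fetch from the slower backend and keep a copy.
        self.misses += 1
        data = self.backend[address]
        self.entries[address] = data
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry (assumed policy).
            self.entries.popitem(last=False)
        return data
```

Repeated reads of the same address are counted as hits until the entry is evicted to make room for other addresses.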
A write cache improves performance by absorbing repeated writes and coalescing adjacent writes into a single write request to disks. A write cache has to be persistent, e.g., a battery-backed DRAM, for data protection against power failures.
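As an illustration of absorbing and coalescing, the sketch below keeps only the latest data for each block address and then merges adjacent addresses into runs that could each be issued as a single write request. The function name coalesce_writes and the (address, data) tuple format are hypothetical, and persistence of the cache is not modeled.

```python
def coalesce_writes(writes):
    """Absorb repeated writes and merge adjacent block addresses.

    writes: iterable of (block_address, data) in arrival order.
    Returns a list of (start_address, [data, ...]) runs, one per
    contiguous address range.
    """
    latest = {}
    for address, data in writes:
        latest[address] = data  # a repeated write overwrites the cached copy

    runs = []
    for address in sorted(latest):
        if runs and address == runs[-1][0] + len(runs[-1][1]):
            # Address directly follows the previous run: extend that run.
            runs[-1][1].append(latest[address])
        else:
            # Gap in the address space: start a new run.
            runs.append((address, [latest[address]]))
    return runs
```

For example, five incoming writes to addresses 5, 6, 5, 9 and 7 would be reduced to two requests: one covering addresses 5 to 7 and one for address 9.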
Flash memory is emerging as a potential replacement for DRAM as cache for back-end, disk-based storage systems. Compared with DRAM, flash memory has a higher density, consumes much less power, and is much less expensive on a $/GB basis. In terms of latency, flash memory access is about 100 times faster than disk access but about 10 times slower than DRAM access.
However, the read/write/erase behavior of flash memory is radically different from that of HDD or DRAM owing to its unique erase-before-write and wear-out characteristics.
Flash memory that contains data must be erased before it can store new data, and it can only endure a limited number of erase-program cycles, usually about 100,000 for single-level cells (SLC) and 10,000 for multi-level cells (MLC). Flash memory is organized in units of pages and blocks. Typically, a flash page is 4 kB in size and a flash block has 64 flash pages (thus 256 kB).
Reads and writes are performed on a page basis, whereas erases operate on a block basis. Reading a page from flash cells to a data buffer inside a flash die takes about 25 μs, writing a page to flash cells takes about 200 μs, and erasing a flash block typically takes 2 ms. Thus, read and write operations take tens to hundreds of microseconds, whereas erase operations take milliseconds. Therefore, using flash directly, for example with write-in-place updates, would not yield high performance, and flash memory is universally used in an out-of-place write manner, analogous to a log-structured file system, for high-performance applications.
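The gap between the two write modes can be made concrete with the typical figures quoted above. The sketch below assumes, for illustration only, that a write-in-place update of one page requires reading the other pages of the block, erasing the block, and rewriting the whole block; the function names are hypothetical.

```python
# Typical figures quoted in the text above.
PAGE_SIZE_KB = 4
PAGES_PER_BLOCK = 64
READ_US = 25      # page read latency in microseconds
WRITE_US = 200    # page write latency in microseconds
ERASE_US = 2000   # block erase latency in microseconds

BLOCK_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_BLOCK  # 256 kB


def in_place_update_us():
    """Assumed cost of updating one page in place:
    read the other pages, erase the block, rewrite every page."""
    other_pages = PAGES_PER_BLOCK - 1
    return other_pages * READ_US + ERASE_US + PAGES_PER_BLOCK * WRITE_US


def out_of_place_update_us():
    """Cost of writing the new version to an already erased free page."""
    return WRITE_US
```

Under these assumptions an in-place update costs 16,375 μs against 200 μs for an out-of-place write. The out-of-place figure excludes the block erase, which is deferred to a background routine rather than eliminated.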
To hide the erase-before-write characteristics of flash memory and the excessive latency of block erases, modern flash SSDs implement a software layer called the flash translation layer (FTL). It performs logical-to-physical address translation, i.e., it translates a logical block address (LBA) in the upper software layer to a physical address in flash memory.
To perform out-of-place writes, the controller of the SSD maintains a data structure for the pool of free flash blocks in which data can be written. How the data pages of free flash blocks are allocated to service user write requests is dictated by the data placement function. The controller maintains a data structure for the pool of occupied flash blocks in which valid and invalid pages may co-exist in the same flash block. A complete sequence of an out-of-place write is as follows: (i) choose a free flash page, (ii) write new data to it, (iii) invalidate the old page with the previous version of the data, and (iv) update the logical-to-physical address map to reflect the address change.
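The four-step sequence above can be sketched as follows. The class TinyFTL and its fields are hypothetical names chosen for the example; the sketch allocates free pages in simple ascending order, ignores block boundaries, and does not handle exhaustion of the free pool.

```python
class TinyFTL:
    """Minimal page-level address map with out-of-place writes (illustrative only)."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # pool of free physical pages
        self.flash = {}     # physical page -> stored data
        self.valid = set()  # physical pages holding the current version of some LBA
        self.l2p = {}       # logical block address -> physical page

    def write(self, lba, data):
        new_page = self.free_pages.pop(0)  # (i) choose a free flash page
        self.flash[new_page] = data        # (ii) write new data to it
        old_page = self.l2p.get(lba)
        if old_page is not None:
            self.valid.discard(old_page)   # (iii) invalidate the old page
        self.valid.add(new_page)
        self.l2p[lba] = new_page           # (iv) update the logical-to-physical map
        return new_page

    def read(self, lba):
        # Translate the logical address, then read the physical page.
        return self.flash[self.l2p[lba]]
```

After two writes to the same LBA, the first physical page still holds the old data but is no longer valid, and it remains unusable until its block is erased.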
The out-of-place write necessitates a background routine called garbage collection (GC) to reclaim invalid pages dispersed in flash blocks. GC selects an occupied flash block as its target, copies valid pages out of the block, erases it, and if successful, adds it to the pool of free blocks for subsequent use. In addition, wear leveling can be triggered to balance the wear evenly among blocks, and bad block management keeps track of bad flash blocks to prevent their reuse. The controller may optionally use some DRAM as a write buffer to absorb repeated write requests and possibly to increase the sequentiality of the workload.
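One garbage collection pass might be sketched as follows. The text does not specify how the target block is selected; the greedy choice of the block with the fewest valid pages is an assumption made for the example, as are the function name garbage_collect and the data layout. Erase failures, wear leveling, and bad block management are not modeled.

```python
def garbage_collect(occupied, free_blocks):
    """Reclaim one occupied block.

    occupied: dict mapping block_id -> list of (data, is_valid) pages.
    free_blocks: list of erased block ids, extended in place.
    Returns (victim_block_id, relocated_data); the caller is expected to
    rewrite the relocated data out of place.
    """
    # Assumed greedy policy: the block with the fewest valid pages
    # needs the least copying.
    victim = min(occupied, key=lambda b: sum(valid for _, valid in occupied[b]))
    # Copy the valid pages out of the block before it is erased.
    relocated = [data for data, valid in occupied[victim] if valid]
    del occupied[victim]        # models the block erase
    free_blocks.append(victim)  # block is available for subsequent writes
    return victim, relocated
```

Each relocated page is an additional flash write that no user request asked for, which is the source of the extra write traffic caused by garbage collection.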
It can be seen from the above that one critical issue for flash memory is the write amplification due to random writes, i.e., each single user write can cause more than one actual write, owing to background activities such as garbage collection and wear leveling. Write amplification occurs mainly because SSDs write data in an out-of-place mode, which requires garbage collection similar to that of a log-structured file system.
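Write amplification is commonly quantified as the ratio of pages actually written to flash to pages written by the user. The sketch below uses that common definition; the function name and the split of the actual writes into user writes and relocation writes are assumptions made for the example.

```python
def write_amplification(user_page_writes, relocated_page_writes):
    """Ratio of actual flash page writes to user page writes.

    relocated_page_writes counts pages copied by background activities
    such as garbage collection and wear leveling.
    """
    if user_page_writes <= 0:
        raise ValueError("user_page_writes must be positive")
    return (user_page_writes + relocated_page_writes) / user_page_writes
```

For instance, if 1,000 user page writes cause 500 additional page relocations, the write amplification is 1.5; a value of 1.0 would mean that no background writes occurred.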
Thus, how to use and manage flash memory in a storage system as a cache is a challenging question.
Several straightforward approaches exist to integrate flash as a cache into a storage system. The first uses flash as a disk cache, with efficient data lookup and a set-associative caching strategy, to save disk power and improve overall read/write latency.
Another straightforward approach is to use flash memory in server platforms as a disk cache, where the overall performance and reliability can be improved by splitting flash-based disk caches into read and write regions.
The impact of flash SSD as cache for server storage has been investigated, assuming a least-recently-used (LRU) read cache and a cyclic de-staging write cache. Unfortunately, this investigation focused on the performance and cost analysis aspects of flash cache, and it did not cover the important issue of how to manage flash memory.