Advances in computing technology have resulted in the ability to store ever-growing amounts of data. However, the performance of data storage systems is often limited by hard disk drive (HDD) latency, which has been relatively constant for years. To improve performance, data storage systems use caching layers. Typically, each caching layer performs better than the layer below it.
In systems that deal with large amounts of data, flash memory can be used as a caching layer and can be much larger than DRAM (dynamic random access memory). In fact, caches configured from flash memory (flash cache) may be very large (e.g., hundreds of gigabytes to hundreds of terabytes in size). Flash memory has both higher IOPS (input output operations per second) and lower latency compared to HDDs.
The performance of a storage system can be improved by placing the most valuable data or metadata into the flash cache for faster access. Unlike DRAM, flash is persistent across system restarts. Consequently, content stored in the flash cache is not lost when a system restarts and the contents can be advantageously used. This is referred to as a warm cache and is distinct from starting with a cold cache that needs to be repopulated with data.
However, an index is needed to access the contents of the flash cache. The index is usually stored in memory such as DRAM and maps an identifier (e.g., a fingerprint, hash, key, or the like) to a location in the flash cache. The data stored in the flash cache may be data such as file blocks or content-defined chunks, or metadata such as directory records, file indirect blocks, or the like. Because the index in DRAM is lost across restarts, it is necessary to rebuild the index before the content of the flash cache can be used.
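The mapping described above can be illustrated with a minimal sketch. It assumes a SHA-1 fingerprint as the identifier and an (offset, length) pair as the location; the class name `FlashCacheIndex` and its methods are hypothetical and are not taken from any particular system.

```python
import hashlib


class FlashCacheIndex:
    """Hypothetical DRAM-resident index: identifier -> location in the flash cache."""

    def __init__(self):
        # A plain dict lives only in memory, so it is lost on restart.
        self._entries = {}

    @staticmethod
    def fingerprint(data: bytes) -> bytes:
        # Assumption: the identifier is a SHA-1 fingerprint of the content.
        return hashlib.sha1(data).digest()

    def insert(self, fp: bytes, offset: int, length: int) -> None:
        # Record where the entry was written in the flash cache.
        self._entries[fp] = (offset, length)

    def lookup(self, fp: bytes):
        # Return (offset, length), or None when the entry is not cached.
        return self._entries.get(fp)


index = FlashCacheIndex()
block = b"example file block"
fp = FlashCacheIndex.fingerprint(block)
index.insert(fp, offset=4096, length=len(block))

# After a restart the flash contents survive, but a fresh index knows nothing
# about them until it is rebuilt.
index_after_restart = FlashCacheIndex()
```

In this sketch, `index.lookup(fp)` returns the stored location, while `index_after_restart.lookup(fp)` returns `None` even though the block would still be present in flash.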
The index could be stored in the flash cache instead of memory. When the index is stored in the flash cache, it may not be necessary to rebuild the index or load the index into memory. A drawback of this approach is that the index has to be kept up-to-date in the flash cache. This causes high churn in the flash cache and can degrade its performance. Flash has limited endurance and supports only a limited number of writes before it becomes read-only. As one example, consider a flash device of 100 GB that supports only one full overwrite per day for five years. That means it supports 100 GB times 365 days times 1 write per day times five years, which equals approximately 182.5 TB of writes before it becomes read-only. Frequent index updates can use up the writes supported by the flash device. Additionally, updates to the index are usually very small, such as only a few bytes, but flash is written in units of a page, usually 4 KB, so each small update requires a page to be read, modified, and written to a new location.
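The endurance arithmetic and the cost of small updates can be checked with a short calculation. This is a sketch under stated assumptions: decimal units (1 GB = 10^9 bytes), a 365-day year, and an 8-byte index update chosen purely for illustration. The function names are hypothetical.

```python
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def endurance_budget_bytes(capacity_bytes, overwrites_per_day, years, days_per_year=365):
    """Total bytes that can be written before the device reaches its rated endurance."""
    return capacity_bytes * overwrites_per_day * days_per_year * years


def write_amplification(update_bytes, page_bytes=4096):
    """Bytes physically written per byte logically updated, for page-granular writes."""
    pages = -(-update_bytes // page_bytes)  # ceiling division: pages touched
    return pages * page_bytes / update_bytes


# 100 GB device, one full overwrite per day, five years.
budget = endurance_budget_bytes(100 * GB, overwrites_per_day=1, years=5)
print(budget / TB)  # 182.5

# An 8-byte index update (illustrative size) still rewrites a whole 4 KB page.
amplification = write_amplification(8)
print(amplification)  # 512.0
```

Under these assumptions, each small index update consumes several hundred times more of the write budget than its logical size suggests, which is why keeping the index in flash can exhaust the device's endurance quickly.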
In another example, the flash cache can be completely scanned and the index can be rebuilt in memory from the scan. Reading the entire cache, however, requires a lot of time (depending on the size of the cache) and consumes I/O that could be used for other purposes. This is expensive and can negatively impact the performance of the flash cache. Systems and methods are needed for building or for rebuilding an index for a flash cache.
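The full-scan approach can be sketched as follows. The on-flash layout here is an assumption made only for illustration (each entry is a 24-byte header holding a SHA-1 fingerprint and a payload length, followed by the payload), and an in-memory `io.BytesIO` buffer stands in for the flash device. The helper names are hypothetical.

```python
import hashlib
import io
import struct

# Hypothetical entry header: 20-byte fingerprint, 4-byte big-endian payload length.
HEADER = struct.Struct(">20sI")


def write_entry(flash, payload):
    """Append one entry to the simulated flash cache; return (fingerprint, offset)."""
    offset = flash.tell()
    fp = hashlib.sha1(payload).digest()
    flash.write(HEADER.pack(fp, len(payload)))
    flash.write(payload)
    return fp, offset


def rebuild_index_by_scan(flash):
    """Rebuild the in-memory index by reading the cache from start to end."""
    index = {}
    bytes_read = 0
    flash.seek(0)
    while True:
        offset = flash.tell()
        header = flash.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # reached the end of the cache
        fp, length = HEADER.unpack(header)
        # The payload is read as well, modeling a scan of the entire cache.
        payload = flash.read(length)
        bytes_read += HEADER.size + len(payload)
        index[fp] = (offset, length)
    return index, bytes_read


flash = io.BytesIO()
written = [write_entry(flash, p) for p in (b"block-a", b"block-b" * 100, b"dir-record")]
cache_size = flash.tell()

index, bytes_read = rebuild_index_by_scan(flash)
```

In this sketch `bytes_read` equals `cache_size`: the rebuild cost grows with the size of the cache rather than with the size of the index, which is the expense noted above for caches of hundreds of gigabytes or more.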