“Hot” data is data that is likely to change shortly after it has been written to memory, while “cold” data is data that is likely to remain unchanged for a long period of time. Mixing hot and cold data in the same block of flash memory may make garbage collection inefficient. During garbage collection, the still-valid data in a block is copied to a new block, and the block is then erased. In a block containing only hot data, a large portion of the block will have become obsolete by the time a garbage collection operation requests an erase of the block, so only a small amount of data will need to be copied to a new block. A block containing only cold data will most probably remain stable, and garbage collection operations will be applied to such a block at a lower rate than to a hot block. In a block containing both hot and cold data, the hot data will become obsolete while the cold data will still be valid. Such a mixed block will, therefore, suffer from frequent garbage collection operations (like a hot block) but with a large amount of data to be copied to a new block (like a cold block). It is, therefore, advantageous to identify, in advance, the hot and cold data and write them into different physical blocks.
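The copy cost described above can be illustrated with a minimal sketch. The block size of 64 pages, the page layouts, and the function name `gc_cost` are assumptions chosen for illustration and do not come from any particular device. Each page is modeled as `True` (still valid) or `False` (obsolete), at a moment when every hot page has been rewritten elsewhere.

```python
PAGES_PER_BLOCK = 64  # assumed block geometry, for illustration only


def gc_cost(blocks):
    """Return (pages_copied, pages_reclaimed) for collecting every block
    that holds at least one obsolete page.

    Each block is a list of booleans: True = valid page, False = obsolete page.
    """
    copied = 0
    reclaimed = 0
    for block in blocks:
        obsolete = block.count(False)
        if obsolete == 0:
            continue  # fully valid (cold) block: nothing to gain by collecting it
        copied += block.count(True)  # valid pages must be moved before the erase
        reclaimed += obsolete
    return copied, reclaimed


# 64 hot pages and 64 cold pages, written to separate blocks ...
separated = [[False] * PAGES_PER_BLOCK, [True] * PAGES_PER_BLOCK]
# ... or interleaved across two mixed blocks.
mixed = [[False, True] * (PAGES_PER_BLOCK // 2) for _ in range(2)]

print(gc_cost(separated))  # (0, 64)
print(gc_cost(mixed))      # (64, 64)
```

Under these assumptions both layouts free the same 64 pages, but the mixed layout has to copy 64 valid pages to do so, while the separated layout copies none.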
According to some previous approaches to this problem, the host keeps a record of the write commands. Typically, when a host issues a write command, a logical block address (LBA) is associated with the command. The record of write commands would typically include an entry for each LBA, and each entry contains information regarding that LBA. The information may include parameters such as a time stamp of the write command, a hit count (counting the number of hits in a given window), and a temperature tag, for example. This information can be used to identify hot/cold LBAs according to a given definition of hot and cold. However, the resolution of LBAs may be very fine (e.g., an LBA may be associated with only 4 KB of data), so the amount of memory required for implementing this approach may be unacceptably large.
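A per-LBA record of this kind might look like the following minimal sketch. The names `LbaRecord`, `WriteLog`, and `table_bytes`, the fixed-window reset, the hit-count threshold, and the 8-byte entry size are all assumptions made for illustration. The text above only states that an entry may hold a time stamp, a windowed hit count, and a temperature tag.

```python
from dataclasses import dataclass


@dataclass
class LbaRecord:
    last_write_time: int = 0  # time stamp of the most recent write
    window_start: int = 0     # start of the current counting window
    hit_count: int = 0        # writes seen inside the current window
    hot: bool = False         # temperature tag


class WriteLog:
    """Hypothetical per-LBA write record with a windowed hit count."""

    def __init__(self, hot_threshold=4, window=1000):
        self.records = {}
        self.hot_threshold = hot_threshold  # assumed definition of "hot"
        self.window = window                # assumed window length (time units)

    def on_write(self, lba, now):
        rec = self.records.setdefault(lba, LbaRecord())
        if now - rec.window_start >= self.window:
            # The window has expired: start counting again.
            rec.window_start = now
            rec.hit_count = 0
        rec.last_write_time = now
        rec.hit_count += 1
        rec.hot = rec.hit_count >= self.hot_threshold
        return rec.hot


def table_bytes(capacity_bytes, lba_bytes=4096, entry_bytes=8):
    """Memory needed for one entry per LBA (entry size is an assumption)."""
    return (capacity_bytes // lba_bytes) * entry_bytes


# A 256 GiB device with 4 KiB per LBA and 8 bytes per entry:
print(table_bytes(256 * 2**30) // 2**20, "MiB")  # 512 MiB
```

Even with a compact 8-byte entry, the table in this example needs 512 MiB, which illustrates why a full per-LBA record may be impractical.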
Other previous approaches categorize data according to its size (i.e., data that is written in small data chunks vs. data that is written in large data chunks). The assumption behind this approach is that stable data is likely to be delivered by the host in large data chunks, while data that is subject to frequent change cannot be aggregated into large data chunks. However, when the decision is based only on data access size, small files such as those often encountered during web browsing are likely to be identified as unstable even though they are not updated frequently.
Another identification method differentiates between cold and hot data by compressing the write data and checking the compression ratio. Because multimedia files are already compressed, they can be classified as cold data after the compression ratio of the write data has been evaluated. In such a method, hot data (e.g., file system metadata) can be compressed effectively, whereas cold data (e.g., multimedia data) cannot, because it is already encoded.
Another previous approach combines the compression ratio approach with the data chunk size approach. Large data chunks are considered to be cold data, while small data chunks are subjected to the compression ratio test. Data chunks that are already compressed are considered cold, while data chunks with a high compression ratio are considered hot. To avoid the high overhead of measuring the compression ratio, the proposed technique compresses only a fraction of the data, without a large loss of determination accuracy.
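The combined size and compression test can be sketched as follows. This is an illustration under stated assumptions, not the method of any specific prior system: the use of `zlib`, the 64 KiB size cutoff, the 512-byte sample, the ratio threshold of 2.0, and the names `sampled_compression_ratio` and `classify_chunk` are all hypothetical.

```python
import os
import zlib


def sampled_compression_ratio(data, sample_bytes=512):
    """Compress only a leading sample of the chunk to keep the overhead low.

    Returns original_size / compressed_size for the sample.
    """
    sample = data[:sample_bytes]
    if not sample:
        return 1.0
    return len(sample) / len(zlib.compress(sample))


def classify_chunk(data, large_chunk_bytes=64 * 1024, hot_ratio=2.0):
    """Classify a write as 'hot' or 'cold' (all thresholds are assumptions)."""
    if len(data) >= large_chunk_bytes:
        return "cold"  # large chunks are treated as cold without further tests
    # Small chunk: highly compressible data is treated as hot,
    # data that barely compresses (already encoded) as cold.
    return "hot" if sampled_compression_ratio(data) >= hot_ratio else "cold"


metadata_like = b"inode=0000 size=0000 " * 100  # small and repetitive
already_encoded = os.urandom(4096)              # stands in for compressed media
large_write = b"a" * (64 * 1024)                # large chunk

print(classify_chunk(metadata_like))    # hot
print(classify_chunk(already_encoded))  # cold
print(classify_chunk(large_write))      # cold
```

Random bytes are used here only as a stand-in for already-compressed multimedia data, since neither compresses further to any useful degree.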
In yet another approach, the hot data identification scheme is designed to meet the following requirements: effective capture of recency as well as frequency, small memory consumption, and low computational overhead. To this end, a hot data identification scheme based on multiple Bloom filters is used. The scheme adopts a set of V independent Bloom filters (BFs) and K independent hash functions. Whenever a write request is issued to the Flash Translation Layer (FTL), the corresponding LBA is hashed by the K hash functions. The K hash values then set the corresponding K bits in the first BF to 1. When the next write request comes in, the scheme chooses the next BF in a round-robin fashion to record its hash values. In addition, it periodically selects one BF in a round-robin manner and erases all information in that BF to reflect a decay effect.
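The recording and decay steps described above can be sketched as follows. Several details are assumptions added for illustration: the class name `MultiBloomHotFilter`, the use of salted BLAKE2b as the K hash functions, the filter size, the decay period measured in write requests, and in particular the hotness rule (an LBA is treated as hot when it appears in at least `hot_threshold` of the V filters), which the description above does not specify.

```python
import hashlib


class MultiBloomHotFilter:
    """Sketch of a multiple-Bloom-filter write tracker (assumed parameters)."""

    def __init__(self, num_filters=4, num_hashes=2, bits_per_filter=4096,
                 decay_period=512, hot_threshold=2):
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.filters = [bytearray(bits_per_filter) for _ in range(num_filters)]
        self.num_hashes = num_hashes        # K
        self.bits = bits_per_filter
        self.decay_period = decay_period    # writes between erasures (assumed)
        self.hot_threshold = hot_threshold  # assumed hotness rule
        self.next_record = 0                # BF that records the next write
        self.next_erase = 0                 # BF that is erased next
        self.writes = 0

    def _positions(self, lba):
        # K hash values derived from salted BLAKE2b digests of the LBA.
        key = lba.to_bytes(8, "little")
        return [
            int.from_bytes(
                hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest(),
                "little",
            ) % self.bits
            for i in range(self.num_hashes)
        ]

    def record_write(self, lba):
        bf = self.filters[self.next_record]
        for pos in self._positions(lba):
            bf[pos] = 1
        # The next write request is recorded in the next BF (round robin).
        self.next_record = (self.next_record + 1) % len(self.filters)
        self.writes += 1
        if self.writes % self.decay_period == 0:
            # Decay: erase one BF, also chosen in round-robin order.
            self.filters[self.next_erase][:] = bytes(self.bits)
            self.next_erase = (self.next_erase + 1) % len(self.filters)

    def is_hot(self, lba):
        positions = self._positions(lba)
        hits = sum(all(bf[p] for p in positions) for bf in self.filters)
        return hits >= self.hot_threshold


tracker = MultiBloomHotFilter(hot_threshold=3, decay_period=1000)
for _ in range(4):
    tracker.record_write(42)  # repeatedly written LBA lands in all four BFs
tracker.record_write(7)       # written once, recorded in a single BF
print(tracker.is_hot(42))
```

Because each BF keeps only a fixed-size bit array, memory use is independent of the number of LBAs, at the cost of occasional false positives. The periodic erasure means that an LBA which stops being written gradually drops out of the filters.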