A typical data storage system includes a cache (i.e., a block of memory) that stores data so that future requests for that data can be served faster. The data stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere. If the requested data is contained in the cache (herein referred to as a cache hit), the request can be served by simply reading the cache, which is comparatively fast. On the other hand, if the requested data is not contained in the cache (herein referred to as a cache miss), the data has to be recomputed or fetched from its original storage location, which is comparatively slow. Hence, the more requests that can be served from the cache, the better the overall system performance.
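The hit and miss paths described above can be illustrated with a minimal sketch. It assumes the slower original storage location can be modeled as a plain dictionary; the names `ReadThroughCache`, `backing_store`, and `block-7` are hypothetical and chosen only for illustration.

```python
class ReadThroughCache:
    """Toy cache in front of a slower backing store (modeled here as a dict)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the original storage location
        self.slots = {}                     # cached copies, keyed by block identifier
        self.hits = 0
        self.misses = 0

    def read(self, key):
        # Cache hit: serve the request directly from the cache.
        if key in self.slots:
            self.hits += 1
            return self.slots[key]
        # Cache miss: fetch from the slower store, then keep a copy for later requests.
        self.misses += 1
        value = self.backing_store[key]
        self.slots[key] = value
        return value


store = {"block-7": b"payload"}
cache = ReadThroughCache(store)
first = cache.read("block-7")   # not yet cached, so this is a miss
second = cache.read("block-7")  # served from the cache, so this is a hit
```

In this sketch the second read never touches `backing_store`, which is the case a cache is meant to make common.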
To be cost efficient and to enable an efficient use of data, caches are relatively small. Nevertheless, caches have proven effective in data storage systems because access patterns in typical computer applications exhibit temporal locality, which refers to the reuse of specific data within a relatively short time. Conventional storage systems fail to take advantage of this phenomenon, resulting in inefficient cache utilization.
FIG. 1 illustrates a conventional system wherein client 101 (e.g., a laptop) is connected to storage system 102, which in turn is connected to storage system 103. Storage system 102 includes a cache which is made up of cache slots 110-111. Storage system 102 also includes storage device 120. In the illustrated example, at operation 150, client 101 stores a first data to storage system 102. The first data is initially buffered in cache slot 110. At operation 151, the first data is fetched from cache slot 110 and written to storage device 120.
Subsequently, storage system 102 performs a backup of a second data from storage device 120 to storage system 103. As part of this backup, at operation 152, storage system 102 writes the second data to cache slot 111, and then at operation 153, the second data is read from cache slot 111 and written to backup storage system 103. A conventional storage system evicts cache slots based on an aging algorithm wherein the “oldest” cache slot is freed for reuse. In this example, cache slot 110 is older than cache slot 111 because cache slot 110 was populated with data earlier in time. Thus, cache slot 110 is freed for reuse before cache slot 111, even though cache slot 110 contains “live” data (which is likely to be re-accessed soon) while cache slot 111 contains backup data (which is less likely to be re-accessed any time soon).
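The eviction behavior in this example can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of any particular storage system: slot age is tracked with a logical counter rather than wall-clock time, and the `kind` label ("live" or "backup") is a hypothetical annotation that the aging policy deliberately ignores. The names `CacheSlot` and `AgingCache` are likewise illustrative.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class CacheSlot:
    slot_id: int
    data: bytes
    kind: str          # "live" or "backup"; hypothetical label, unused by the policy
    populated_at: int  # logical time at which the slot was filled


class AgingCache:
    """Frees the slot that was populated earliest, regardless of its contents."""

    def __init__(self):
        self._clock = count()  # monotonically increasing logical clock
        self.slots = {}

    def populate(self, slot_id, data, kind):
        self.slots[slot_id] = CacheSlot(slot_id, data, kind, next(self._clock))

    def evict_oldest(self):
        # The "oldest" slot is the one with the smallest populated_at value.
        oldest = min(self.slots.values(), key=lambda slot: slot.populated_at)
        del self.slots[oldest.slot_id]
        return oldest


cache = AgingCache()
cache.populate(110, b"first data", "live")     # operation 150: client write is buffered
cache.populate(111, b"second data", "backup")  # operation 152: backup data is staged
victim = cache.evict_oldest()                  # slot 110 is freed first
```

Because the policy considers only `populated_at`, the slot holding live data is freed while the slot holding backup data remains cached, which is the inefficiency described above.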