Hierarchically arranged memory has been a common feature in computing for some time. Fundamentally, faster memory is more expensive per byte. Despite rapid advances in storage performance, it is often economically unsound to utilize only the lowest-latency storage medium. Instead, in order to deliver acceptable performance within a fixed budget, storage devices of different sizes and speeds may be arranged so that memory transactions read from or write to the fastest devices whenever possible.
In a typical example, a hierarchical memory structure includes a main memory and one or more caches. The main memory is a large pool of storage, and, for reasons including cost, is often made up of relatively slow storage devices. The main memory defines the address space and thereby defines the limits of the available storage. However, portions of the address space may be mapped to a cache, a smaller memory pool typically utilizing a faster storage medium, so that transactions directed to mapped addresses can read from and/or write to the faster storage medium. In multiple-tiered configurations, portions of the cache may be mapped to another cache made up of an even faster storage medium. In many examples, memory structures include multiple caches, each utilizing progressively faster storage media.
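The tiered arrangement described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the class name TieredMemory, the use of dictionaries as storage, and the policy of copying a value into every faster tier on a read are hypothetical choices made for illustration, and the caches are left unbounded for brevity even though a real cache is smaller than the memory behind it.

```python
class TieredMemory:
    """Toy model (hypothetical): caches, fastest first, in front of a main memory."""

    def __init__(self, main_memory, num_cache_levels=2):
        # The main memory (address -> value) defines the address space.
        self.main = main_memory
        # Index 0 is the fastest tier. Unbounded here for brevity (an assumption).
        self.caches = [dict() for _ in range(num_cache_levels)]

    def read(self, address):
        """Return (value, tier): tier is the cache index that served the read, or 'main'."""
        if address not in self.main:
            raise KeyError(f"address {address} is outside the address space")

        # Try the fastest tier first and fall through to slower ones.
        for level, cache in enumerate(self.caches):
            if address in cache:
                value, served_by = cache[address], level
                break
        else:
            # No cache holds the address, so main memory serves the read.
            value, served_by = self.main[address], "main"
            level = len(self.caches)

        # Map the address into every tier faster than the one that served it.
        for faster in self.caches[:level]:
            faster[address] = value
        return value, served_by
```

In this sketch, the first read of an address is served by main memory and later reads of the same address are served by tier 0, which is the behavior the hierarchy is meant to produce.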
A number of techniques exist for determining which data to load in a particular cache. By effectively predicting data that will be the target of subsequent transactions, more transactions can be performed by the cache even when the cache is significantly smaller than the main memory. These techniques are grounded in a number of principles, such as the principles of locality. The principle of temporal locality suggests that data that has been accessed recently is likely to be accessed again. Accordingly, frequently accessed data is often cached. The principle of spatial locality suggests that data accesses tend to cluster around certain addresses. Accordingly, a range of addresses is often cached based on an access to an address within the range. However, these principles are merely guidelines. Because every application and computing task has a unique data access pattern, no particular caching algorithm is optimal for all applications. Instead, a balance may be struck based on the anticipated use or uses of a given computing system. Unfortunately, this balance may break down when the working data set of an application grows beyond a certain point. The overhead involved in caching and writing back a large data set may diminish the performance gains expected from the cache.
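As a hedged illustration of the two locality principles, the following sketch combines least-recently-used (LRU) eviction, one common way to exploit temporal locality, with block-granular loading, one common way to exploit spatial locality. The class name BlockLRUCache and its parameters are hypothetical, the backing store is assumed to be a Python list indexed by address, and write-back is omitted; this is one possible policy, not a description of any particular system.

```python
from collections import OrderedDict


class BlockLRUCache:
    """Read-only toy cache (hypothetical): LRU eviction over fixed-size blocks."""

    def __init__(self, backing, capacity_blocks=2, block_size=4):
        if capacity_blocks < 1 or block_size < 1:
            raise ValueError("capacity_blocks and block_size must be at least 1")
        self.backing = backing              # list indexed by address (assumption)
        self.capacity_blocks = capacity_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()         # block number -> values, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no = address // self.block_size
        if block_no in self.blocks:
            self.hits += 1
            # Temporal locality: mark this block as most recently used.
            self.blocks.move_to_end(block_no)
        else:
            self.misses += 1
            start = block_no * self.block_size
            # Spatial locality: load the whole range around the address.
            self.blocks[block_no] = self.backing[start:start + self.block_size]
            if len(self.blocks) > self.capacity_blocks:
                # Evict the least recently used block.
                self.blocks.popitem(last=False)
        return self.blocks[block_no][address % self.block_size]
```

With a two-block cache, reading addresses 0, 1, and 2 in sequence costs one miss followed by two hits, because the first access loads the surrounding block. Cycling repeatedly over addresses 0, 4, and 8, which fall in three different blocks, misses on every access, a small-scale version of the breakdown that occurs when the working set outgrows the cache.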
For example, storage systems (computing systems that process data transactions on behalf of other computing systems) are generally very cache-sensitive and often manipulate large data sets. This sensitivity can be caused by the large number of transactions that storage systems typically receive and by workloads that vary widely with host activity. These effects and others make it extremely difficult to tune a single cache algorithm for all supported host applications. Accordingly, an efficient system and method for caching that is responsive to an application's working set has the potential to dramatically improve cache hit rate and system performance.