Computer memory systems generally comprise two memory levels: main (or cache) and auxiliary. Cache memory is faster than auxiliary memory, but is also significantly more expensive. Consequently, the size of the cache memory is usually only a fraction of the size of the auxiliary memory.
Caching is one of the most fundamental metaphors in modern computing. It is widely used in storage systems, databases, web servers, middleware, processors, file systems, disk drives, and operating systems. Memory caching is also used in numerous other applications such as data compression and list updating. As a result, substantial progress in caching algorithms could affect a significant portion of the modern computation stack.
Both cache and auxiliary memories are managed in units of uniformly sized items known as pages. Requests for pages are first directed to the cache. A request for a page is directed to the auxiliary memory only if the page is not found in the cache. In this case, a copy is “paged in” to the cache from the auxiliary memory. This is called “demand paging” and it precludes “pre-fetching” pages from the auxiliary memory to the cache. If the cache is full, one of the existing pages must be paged out before a new page can be brought in.
A replacement policy determines which page is “paged out.” A commonly used criterion for evaluating a replacement policy is its hit ratio, the fraction of page requests that are satisfied from the cache rather than from the auxiliary memory. The miss rate is the fraction of requests for which the page must be paged into the cache from the auxiliary memory. The goal of a replacement policy is to maximize the hit ratio measured over a very long trace while minimizing the memory overhead involved in implementing the policy.
Most current replacement policies remove pages from the cache based on “recency,” that is, removing pages that have least recently been requested; “frequency,” that is, removing pages that are not often requested; or a combination of recency and frequency. Certain replacement policies also have parameters that must be carefully chosen or “tuned” to achieve optimum performance.
The replacement policy that provides an upper bound on the hit ratio achievable by any online policy is Belady's MIN, also known as OPT. However, this approach uses a priori knowledge of the entire page reference stream and is not realizable in practice when the page reference stream is not known ahead of time. MIN replaces the page that has the greatest forward distance, that is, the page whose next request lies farthest in the future. Given MIN as a reference, a replacement policy that automatically adjusts to an observed workload is much preferable.
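The forward-distance rule can be illustrated with a minimal offline simulation. The sketch below assumes the whole trace is available as a Python list, which is exactly the knowledge an online policy lacks; the function name min_hits and its arguments are hypothetical and chosen only for this illustration.

```python
def min_hits(trace, cache_size):
    """Count cache hits for Belady's MIN on a fully known trace."""
    # For each position, precompute when the same page is requested next
    # (infinity if it is never requested again).
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # cached page -> position of its next request
    hits = 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
        elif len(cache) >= cache_size:
            # Evict the page with the greatest forward distance.
            victim = max(cache, key=cache.get)
            del cache[victim]
        # Demand paging: the requested page is always resident afterwards.
        cache[page] = next_use[i]
    return hits
```

For example, min_hits(list("abcabdab"), 2) counts the hits MIN obtains on an eight-request trace with a two-page cache. Any online policy can be compared against this count on the same trace.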
The most commonly used replacement policy is based on the concept of replacing the least recently used (LRU) page. The LRU policy focuses solely on recency, always replacing the least recently used page. LRU is one of the original replacement policies, and approximations and improvements to it abound. If the workload or the request stream is drawn from an LRU Stack Depth Distribution (SDD), then LRU is the optimal policy.
LRU has several advantages: it is relatively simple to implement and responds well to changes in the underlying Stack Depth Distribution (SDD) model. However, while the SDD model captures recency, it does not capture frequency. Each page is equally likely to be referenced and stored in cache. Consequently, the LRU model is useful for treating the clustering effect of locality but not for treating non-uniform page referencing. In addition, the LRU model is vulnerable to one-time-only sequential read requests, or scans, that replace higher-frequency pages with pages that would not be requested again, reducing the hit ratio. In other terms, the LRU model is not “scan resistant.”
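The simplicity of LRU, and its lack of scan resistance, can be seen in a short sketch. The class name LRUCache and its methods are hypothetical names for this illustration; pages are represented only by their identifiers, and the standard-library OrderedDict stands in for the LRU list.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU page cache that tracks its own hit ratio."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # LRU page first, MRU page last
        self.hits = 0
        self.requests = 0

    def request(self, page):
        """Return True on a hit, False on a miss (the page is paged in)."""
        self.requests += 1
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)  # page becomes most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # page out the LRU page
        self.pages[page] = True
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0
```

With a capacity of two pages and the request sequence a, b, a, b, c, d, a, b, the one-time requests for c and d push out the frequently used pages a and b, so the final two requests miss even though a and b were the most popular pages.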
The Independent Reference Model (IRM) provides a workload characterization that captures the notion of frequency. Specifically, IRM assumes that each page reference is drawn in an independent fashion from a fixed distribution over the set of all pages in the auxiliary memory. Under the IRM model, the least frequently used (LFU) policy that replaces the least frequently used page is optimal.
While the LFU policy is scan-resistant, it presents several drawbacks. The LFU policy requires logarithmic implementation complexity in cache size and pays almost no attention to recent history. In addition, the LFU policy does not adapt well to changing access patterns since it accumulates stale pages with high frequency counts that may no longer be useful.
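The logarithmic cost mentioned above typically arises from keeping pages ordered by reference count. The following is a minimal sketch under stated assumptions: counts are kept only for pages currently in the cache, a binary heap with lazily discarded stale entries serves as the priority queue, and the name LFUCache is hypothetical.

```python
import heapq


class LFUCache:
    """Minimal in-cache LFU sketch using a binary heap."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = {}  # cached page -> reference count
        self.heap = []   # (count, sequence number, page); may hold stale entries
        self.seq = 0     # tie-breaker so heap entries never compare pages

    def _push(self, page):
        self.seq += 1
        heapq.heappush(self.heap, (self.count[page], self.seq, page))

    def request(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.count:
            self.count[page] += 1
            self._push(page)  # O(log n) heap update on every hit
            return True
        if len(self.count) >= self.capacity:
            while True:
                cnt, _, victim = heapq.heappop(self.heap)
                # Skip entries whose count no longer matches the page.
                if self.count.get(victim) == cnt:
                    del self.count[victim]
                    break
        self.count[page] = 1
        self._push(page)
        return False
```

With a capacity of two and the sequence a, a, a, b, c, d, a, the page a survives the one-time requests for b, c, and d. The same mechanism is what lets a page with a high but outdated count linger in the cache.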
A relatively recent algorithm, LRU-2, approximates the LFU policy while eliminating its lack of adaptivity to the evolving distribution of page reference frequencies. The LRU-2 algorithm remembers, for each page, the last two times that page was requested and discards the page with the least recent penultimate reference. Under the Independent Reference Model (IRM) assumption, the LRU-2 algorithm has the largest expected hit ratio of any online algorithm that knows the two most recent references to each page.
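The eviction rule of LRU-2 can be sketched as follows. This is a simplified illustration, not the published algorithm: it keeps history only for pages currently in the cache, it omits the Correlated Information Period, and it finds the victim by a linear scan rather than a priority queue. The name LRU2Cache is hypothetical.

```python
class LRU2Cache:
    """Simplified LRU-2 sketch: evict by least recent penultimate reference."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.history = {}  # cached page -> (penultimate time, last time)
        self.clock = 0

    def request(self, page):
        """Return True on a hit, False on a miss."""
        self.clock += 1
        if page in self.history:
            _, last = self.history[page]
            self.history[page] = (last, self.clock)
            return True
        if len(self.history) >= self.capacity:
            # Pages referenced only once have a penultimate time of -infinity
            # and are evicted first; ties fall back to the last reference time.
            victim = min(self.history, key=self.history.get)
            del self.history[victim]
        self.history[page] = (float("-inf"), self.clock)
        return False
```

With a capacity of two and the sequence a, a, b, c, b, a, the page a (referenced twice) is retained while the once-referenced pages b and c displace each other, so the final request for a is a hit. Plain LRU would instead have paged out a when c arrived.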
The LRU-2 algorithm works well on several traces. Nonetheless, LRU-2 still has two practical limitations:
1. The LRU-2 algorithm maintains a priority queue, requiring logarithmic implementation complexity.
2. The LRU-2 algorithm contains one crucial tunable parameter, namely, Correlated Information Period (CIP). CIP roughly captures the amount of time a page seen only once recently should be kept in the cache.
In practice, logarithmic implementation complexity engenders a severe memory overhead. Another algorithm, 2Q, reduces the implementation complexity to constant per request rather than logarithmic by using a simple LRU list instead of the priority queue used in the LRU-2 algorithm. Otherwise, the 2Q algorithm is similar to the LRU-2 algorithm.
The choice of the parameter Correlated Information Period (CIP) crucially affects performance of the LRU-2 algorithm. No single fixed a priori choice works uniformly well across various cache sizes. Consequently, a judicious selection of this parameter is crucial to achieving good performance.
Furthermore, no single a priori choice works uniformly well across various workloads and cache sizes. For example, a very small value for the CIP parameter works well for stable workloads drawn according to the Independent Reference Model (IRM), while a larger value works well for workloads drawn according to the Stack Depth Distribution (SDD), but no value works well for both. This underscores the need for online, on-the-fly adaptation.
However, the second limitation of the LRU-2 algorithm persists even in the 2Q algorithm. The 2Q algorithm introduces two parameters, Kin and Kout. The parameter Kin is essentially the same as the parameter CIP in the LRU-2 algorithm. Both Kin and Kout need to be carefully tuned, and both are sensitive to workload conditions and types.
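To make the roles of Kin and Kout concrete, the sketch below follows the commonly described structure of 2Q: a FIFO queue of pages seen once, a list of identifiers of pages recently removed from that queue, and an LRU list of pages seen again. The queue names a1in, a1out, and am and the class name TwoQCache are assumptions of this sketch and do not appear in the text above; here kin and kout are expressed as page counts rather than percentages.

```python
from collections import OrderedDict


class TwoQCache:
    """Simplified 2Q sketch with constant work per request."""

    def __init__(self, capacity, kin, kout):
        self.capacity = capacity
        self.kin = kin    # target size of the seen-once queue
        self.kout = kout  # number of remembered identifiers
        self.a1in = OrderedDict()   # resident pages seen once, FIFO order
        self.a1out = OrderedDict()  # identifiers only, pages removed from a1in
        self.am = OrderedDict()     # resident pages seen again, LRU order

    def _reclaim(self):
        """Free one slot if the cache is full."""
        if len(self.a1in) + len(self.am) < self.capacity:
            return
        if len(self.a1in) > self.kin or not self.am:
            old, _ = self.a1in.popitem(last=False)
            self.a1out[old] = True  # remember the identifier, not the data
            if len(self.a1out) > self.kout:
                self.a1out.popitem(last=False)
        else:
            self.am.popitem(last=False)  # page out the LRU page of am

    def request(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.am:
            self.am.move_to_end(page)
            return True
        if page in self.a1in:
            return True  # a hit in the FIFO queue does not reorder it
        seen_before = page in self.a1out
        self._reclaim()
        if seen_before:
            self.a1out.pop(page, None)
            self.am[page] = True  # second sighting: promote to the LRU list
        else:
            self.a1in[page] = True
        return False
```

Every operation is a constant-time dictionary or list-end update, which is the source of the constant per-request complexity. The behavior still depends on the choice of kin and kout, which is the tuning burden described above.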
Another recent algorithm similar to the 2Q algorithm is Low Inter-reference Recency Set (LIRS). The LIRS algorithm maintains a variable size LRU stack whose LRU page is the Llirs-th page seen at least twice recently, where Llirs is a parameter. From all the pages in the stack, the LIRS algorithm keeps in the cache all the Llirs pages seen at least twice recently as well as the Llirs pages seen only once recently.
The parameter Llirs is similar to the CIP of the LRU-2 algorithm or Kin of 2Q. Just as the CIP affects the LRU-2 algorithm and Kin affects the 2Q algorithm, the parameter Llirs crucially affects the LIRS algorithm. A further limitation of LIRS is that it requires a certain “stack pruning” operation that, in the worst case, may have to touch a very large number of pages in the cache. In addition, the LIRS algorithm stack may grow arbitrarily large, requiring a priori limitation. However, with a stack size of twice the cache size, LIRS becomes virtually identical to 2Q with Kin=1% and Kout=99%.
Over the past few years, interest has focused on combining recency and frequency in various ways, attempting to bridge the gap between LRU and LFU. Two replacement policy algorithms exemplary of this approach are frequency-based replacement, FBR, and least recently/frequently used, LRFU.
The frequency-based replacement algorithm, FBR, maintains a least recently used (LRU) list, but divides it into three sections: new, middle, and old. For every page in cache, the FBR algorithm also maintains a counter. On a cache hit, the FBR algorithm moves the hit page to the most recently used (MRU) position in the new section. If the hit page was in the middle or the old section, then its reference count is incremented. If the hit page was in the new section then the reference count is not incremented; this key concept is “factoring out locality”. On a cache miss, the FBR algorithm replaces the page in the old section with the smallest reference count.
One limitation of the FBR algorithm is that the algorithm must periodically resize (re-scale) all the reference counts to prevent cache pollution due to stale pages with high reference counts but no recent usage. The FBR algorithm also has several tunable parameters: the sizes of all three sections, and two other parameters, Cmax and Amax, that control periodic resizing. Much like the LRU-2 and 2Q algorithms, different values of these tunable parameters may be suitable for different workloads or for different cache sizes. The performance of the FBR algorithm is similar to that of the LRU-2 and 2Q algorithms.
Another replacement policy that combines the concepts of recency (LRU) and frequency (LFU) is the Least Recently/Frequently Used (LRFU) algorithm. The LRFU algorithm initially assigns a value C(x)=0 to every page x and, at every time t, updates it as:
$$C(x) = \begin{cases} 1 + 2^{-\lambda}\, C(x) & \text{if } x \text{ is referenced at time } t, \\ 2^{-\lambda}\, C(x) & \text{otherwise,} \end{cases}$$
where λ is a tunable parameter.
This update rule is a form of exponential smoothing that is widely used in statistics. The LRFU policy is to replace the page with the smallest C(x) value. Intuitively, as λ approaches 0, the C value is simply the number of occurrences of page x and LRFU collapses to LFU. As λ approaches 1, the C value emphasizes recency and the LRFU algorithm collapses to LRU. The performance of the algorithm depends crucially on the choice of λ.
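The two limiting behaviors can be checked with a direct, unoptimized sketch of the update rule. It assumes that values are kept only for cached pages and that every cached value is decayed on every request, which costs time linear in the cache size; practical implementations avoid this. The names LRFUCache and lam are hypothetical.

```python
class LRFUCache:
    """Direct sketch of the LRFU update rule (linear work per request)."""

    def __init__(self, capacity, lam):
        self.capacity = capacity
        self.decay = 2.0 ** (-lam)  # the factor 2^(-lambda)
        self.value = {}             # cached page -> C(x)

    def request(self, page):
        """Return True on a hit, False on a miss."""
        # Every page decays at every time step: C(x) = 2^(-lambda) * C(x).
        for p in self.value:
            self.value[p] *= self.decay
        if page in self.value:
            self.value[page] += 1.0  # referenced: C(x) = 1 + 2^(-lambda) * C(x)
            return True
        if len(self.value) >= self.capacity:
            victim = min(self.value, key=self.value.get)
            del self.value[victim]
        self.value[page] = 1.0  # new page starts from C(x) = 0, then is referenced
        return False
```

With a capacity of two and the sequence a, a, b, c, a, setting lam to 0 keeps the twice-referenced page a and evicts b, as LFU would, whereas setting lam to 1 evicts a when c arrives, as LRU would.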
A later adaptive version, the Adaptive LRFU (ALRFU) algorithm, dynamically adjusts the parameter λ. Still, the LRFU algorithm has two fundamental limitations that hinder its use in practice:
1. LRFU and ALRFU both require an additional tunable parameter for controlling correlated references. The choice of this parameter affects performance of the replacement policy.
2. The implementation complexity of LRFU fluctuates between constant and logarithmic in cache size per request.
However, the practical complexity of the LRFU algorithm is significantly higher than that of even the LRU-2 algorithm. For small values of λ, the LRFU algorithm can be as much as 50 times slower than LRU. Such overhead can potentially wipe out the entire benefit of a higher hit ratio.
Another replacement policy behaves as an expert master policy that simulates a number of caching policies. At any given time, the master policy adaptively and dynamically chooses one of the competing policies as the “winner” and switches to the winner. Rather than develop a new caching policy, the master policy selects the best policy amongst various competing policies. From a practical standpoint, a limitation of the master policy is that it must simulate all competing policies, consequently requiring high space and time overhead.
What is therefore needed is a replacement policy with a high hit ratio and low implementation complexity. Real-life workloads possess a great deal of richness and variation and do not admit a one-size-fits-all characterization. They may contain long sequential I/Os or moving hot spots. The frequency and scale of temporal locality may also change with time. They may fluctuate between stable repeating access patterns and access patterns with transient clustered references. No static, a priori fixed replacement policy will work well over such access patterns. Thus, the need for a cache replacement policy that adapts in an online, on-the-fly fashion to such dynamically evolving workloads while performing with a high hit ratio and low overhead has heretofore remained unsatisfied.