A CPU cache is a cache used by the central processing unit (CPU) of a computer to reduce the average time to access data from main memory. The cache is a smaller, faster memory that stores copies of the data from frequently used main memory locations. Most CPUs have several independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of multiple cache levels (L1, L2, etc.).
When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.
Data is transferred between memory and cache in blocks of fixed size, called cache lines. When a cache line is copied from memory into the cache, a cache entry is created. The cache entry will include the copied data as well as the requested memory location (also referred to as a “tag”).
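To make the notion of a tag concrete, the following minimal sketch splits an address into the fields a set-indexed cache would typically use. The 64-byte line size, the 64 sets, and the name split_address are assumptions chosen for illustration only. In many designs the stored tag is just the upper address bits, since the set index and byte offset are implied by where the line sits in the cache.

```python
# Illustrative geometry (assumed, not taken from any particular CPU).
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 64    # number of sets the cache is divided into

OFFSET_BITS = LINE_SIZE.bit_length() - 1  # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1    # log2(64) = 6


def split_address(addr):
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)                 # position inside the line
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # which set to search
    tag = addr >> (OFFSET_BITS + INDEX_BITS)        # identifies the line within the set
    return tag, index, offset


print(split_address(0x12345))  # (18, 13, 5)
```

Two addresses that differ only in their offset bits fall in the same cache line, which is why a single fetched line can satisfy several nearby accesses.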
On each read or write, the cache checks for the contents of the requested memory location in any cache lines that might contain that address. If the memory location is found in the cache, a cache hit has occurred. If it is not found, a cache miss has occurred. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. For a cache miss, the cache allocates a new entry and copies in data from main memory. The request is then fulfilled from the contents of the cache.
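The hit and miss flow described above can be sketched with a toy direct-mapped cache, in which each memory block can live in exactly one entry. The class name DirectMappedCache, the four-entry size, and the 16-byte line size are hypothetical choices for illustration. Only tags are tracked, not the data itself.

```python
class DirectMappedCache:
    """Toy model: each memory block maps to exactly one cache entry."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # None marks an empty entry
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size  # which memory block holds addr
        index = block % self.num_lines  # the only entry that may hold it
        tag = block // self.num_lines   # distinguishes blocks sharing an entry
        if self.tags[index] == tag:
            self.hits += 1
            return "hit"
        # Miss: allocate the entry (overwriting any previous occupant),
        # standing in for copying the line in from main memory.
        self.tags[index] = tag
        self.misses += 1
        return "miss"


cache = DirectMappedCache()
print([cache.access(a) for a in [0, 4, 64, 0]])
# ['miss', 'hit', 'miss', 'miss']
```

Addresses 0 and 4 share a line, so the second access hits. Address 64 maps to the same entry with a different tag, so it displaces the first line and the final access to address 0 misses again.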
In order to make room for the new entry on a cache miss, the cache may have to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Predicting the future is difficult, so there is no perfect way to choose among the variety of replacement policies available.
One such cache replacement algorithm (cache replacement policy) is Belady's algorithm. The most efficient caching algorithm would always discard the information that will not be needed for the longest time in the future. This optimal policy is referred to as Belady's optimal algorithm. However, Belady's optimal algorithm is impractical to implement because it must look into the future to identify the cache line that will be reused furthest in the future.
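Although Belady's optimal algorithm cannot run inside a live cache, it can be evaluated offline when the complete access trace is known in advance. The sketch below assumes a small fully associative cache and uses a hypothetical function name, belady_misses. It is a simple quadratic-time illustration, not an efficient implementation.

```python
def belady_misses(trace, capacity):
    """Count misses for Belady's optimal policy on a fully known trace."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue  # hit: nothing to do
        misses += 1
        if len(cache) >= capacity:
            def next_use(resident):
                # Position of the next access to this block, or infinity
                # if it is never accessed again.
                for j in range(i + 1, len(trace)):
                    if trace[j] == resident:
                        return j
                return float("inf")

            # Evict the block whose next use lies furthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


print(belady_misses(["A", "B", "C", "A", "B", "C"], capacity=2))  # 4
```

The call to next_use scans the remainder of the trace, which is exactly the future knowledge that hardware does not have at eviction time.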
As a result, existing replacement policies use heuristics, such as Least Recently Used (LRU) or Most Recently Used (MRU), each of which works well for different workloads. However, such policies cannot exploit all forms of reuse (short-term, medium-term and long-term reuse), whereas Belady's optimal algorithm can effectively exploit all three.
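The workload dependence of LRU and MRU can be seen in a small simulation. The sketch below assumes a fully associative cache with two entries. The function name count_misses and both traces are hypothetical examples chosen to favor one policy each.

```python
from collections import OrderedDict


def count_misses(trace, capacity, policy):
    """Count misses for policy 'LRU' or 'MRU' on a fully associative cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            # popitem(last=True) drops the most recently used key,
            # popitem(last=False) drops the least recently used key.
            cache.popitem(last=(policy == "MRU"))
        cache[block] = True
    return misses


loop = ["A", "B", "C"] * 3  # cyclic scan larger than the cache
local = ["A", "B", "A", "B", "C", "A", "B", "A", "B"]  # recent blocks reused soon

print(count_misses(loop, 2, "LRU"), count_misses(loop, 2, "MRU"))    # 9 6
print(count_misses(local, 2, "LRU"), count_misses(local, 2, "MRU"))  # 5 6
```

On the cyclic scan, LRU always evicts the block that is needed next and misses on every access, while MRU retains part of the loop. On the trace with short-term reuse the ordering reverses.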
Furthermore, the performance of recent policies on SPEC CPU2006 (a suite of benchmark applications designed to test CPU performance) indicates that there remains a significant gap between the best current policy and Belady's optimal algorithm.
As a result, a better replacement policy is needed, one that is not simply based on a heuristic geared towards a particular class of access patterns. Instead, the replacement policy should apply Belady's optimal algorithm to better inform future cache replacement decisions.