Field of the Invention
This invention relates generally to the field of computer processors and software. More particularly, the invention relates to an apparatus and method for a shared LRU policy between multiple cache levels.
Description of the Related Art
One reason that hierarchical caches are used is to enable the use of small caches that are closer to the processor (e.g., such as the Level 1 cache (L1)) to handle requests at a faster rate than subsequent larger levels of the cache hierarchy (e.g., such as the Level 2 (L2)). It is therefore impractical to advise the L2 cache of accesses served in the L1 cache, as these accesses are occurring at a faster rate than the L2 cache can handle. Generalizing, this principle occurs between any two levels in a cache hierarchy (e.g., adjacent cache levels Ln and L(n+1) such as described below).
This leads to a difficulty for the Least Recently Used (LRU) mechanism of the L2 cache. The L2 cache is are not aware of processor access to data at addresses held in the L1 cache, and in cases where the L1 cache consistently hits certain addresses, the L2 cache will not see any requests to those addresses. Thus, data that is most frequently accessed by the processor is actually most likely to be evicted from the L2 and lower levels of cache. If subsequently, the data is momentarily evicted from the L1 cache, any later attempt to use it will result in the penalty of a fetch from main memory.
A common way of solving this problem is the make the L1 cache somewhat “leaky”—for example, by occasionally evicting the most-recently-used (MRU) entry or by sending a small proportion of requests to the L1 cache down to the L2 cache even though the L1 has serviced them.
Another approach is to operate the caches in a form of “exclusive mode,” for example, that all data in the L1 cache is automatically removed from the L2 cache. Likewise, when data is evicted (even unmodified) from the L1 cache, it is returned to the L2 cache as MRU data.
The same problem applies in certain implementations of a single cache controller, such as those in which all cache data is held in dynamic random access memory (DRAM), but only a proportion of the cache Metadata (Tags, valid indications, LRU indications, etc) are held on-die (with the remaining Metadata also residing in DRAM). It is possible to consider this arrangement as the same as two levels of cache (e.g., with a 1:4 capacity ratio and a 2:1 speed ratio). The problem remains that it is impractical from a power and bandwidth basis to update LRU details (for example, to mark the accessed entry as most recently used and to age the LRU of non-accessed entries of the same Set) in the DRAM-held Metadata (i.e., the L2) when accesses are served from on-die Metadata (i.e., the L1). The close (e.g., 1:4) capacity ratio of the two cache levels also causes a problem; it can reasonably expected that some workloads will fit entirely within the on-die Metadata, which can result in their DRAM Metadata rapidly reaching a least recently used status (e.g., assigned the highest LRU value among entries in the Set and therefore becoming eligible for eviction).