1. Field of the Invention
This invention relates to management of cache memories and more particularly to efficient implementation of a least recently used (LRU) replacement algorithm for cache memories.
2. Description of the Related Art
Cache memories provide local copies of portions of system memory in order to enhance processor performance. Data and/or instructions being used by the processor are accessed more quickly when they are in a cache than when they are in main system memory. Typical computer systems have multiple levels of cache. The L1 cache is generally the smallest cache and is located closest to the processor (typically on the same chip as the processor). The L2 cache is larger than the L1 cache but still significantly smaller than system memory.
Caches provide better performance because software typically operates with locality of reference, meaning that there is a tendency to access a relatively small or local area of memory. If such a local area is brought into the cache memory, a program having that property can run by accessing just the cache memory and thus run more efficiently. At some point, however, a program operating on the processor will reference a memory location that is not in the cache, and the system has to retrieve that memory location. In that way, new data is written into the cache memory. In addition, other functions in the system, such as an I/O device or another processor in a multi-processor system, may also be utilizing system memory and may cause locations in the cache to become invalid.
There are different ways to map the system memory into the cache. One common approach utilizes an N-way set associative cache, in which the cache is segmented into sets, each set containing N cache lines, where N is typically 2, 4, 8, etc. A cache line is a sequential group of bytes, e.g., 32 or 64 bytes. Transactions for cache memories are typically in cache lines rather than in single bytes for efficiency purposes. Each block of main memory is assigned to one of the sets of cache lines and thus can be cached in any one of the N locations within that set. Thus, within each set the cache is associative. More memory addresses are assigned to a set than to a location in a direct mapped cache, where each address maps to only one cache line.
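The mapping of a memory address onto a set can be illustrated with a short software sketch. The sketch below is not taken from any particular cache design; the function name `split_address`, the 64-byte line size, the 128 sets, and the use of the low-order line-number bits as the set index are all assumptions chosen for illustration.

```python
def split_address(addr, line_size=64, num_sets=128):
    """Split a byte address into (tag, set_index, offset).

    Assumed layout (illustrative only): the low bits select a byte within
    the cache line, the next bits select the set, and the remaining high
    bits form the tag stored with the line.
    """
    offset = addr % line_size          # byte position within the cache line
    line_number = addr // line_size    # which line-sized block of memory
    set_index = line_number % num_sets # the one set this block may occupy
    tag = line_number // num_sets      # distinguishes blocks sharing a set
    return tag, set_index, offset


# Two addresses exactly num_sets lines apart fall into the same set and
# differ only in tag, so they compete for the N ways of that set.
a = split_address(0x12345)
b = split_address(0x12345 + 64 * 128)
print(a, b)
```

Under these assumptions, any of the N ways of the selected set may hold the block, which is why a replacement policy is needed once the set is full.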
One important aspect of cache management is what data to include in the cache and when to update it. It has been shown in many performance tests that the least recently used (LRU) cache replacement algorithm provides better average performance than other algorithms, such as random replacement. In the least recently used approach, the cache line that is oldest in the set is replaced when a new cache line is loaded into the set, on the assumption that, based on the locality of reference in programs, more recently used cache lines are more likely to be used again than older cache lines. In order to determine the least recently used (LRU) cache line in an N-way set associative cache, conventional approaches require a significant amount of complex hardware, including counters and N-way multiplexers, to implement the LRU algorithm. Additionally, status bits are required for each cache entry to track the usage of each entry. When a new entry is made in the set, the status bits need to be scanned to determine which of the cache lines is the least recently used or invalid, in order to determine the appropriate cache line entry to evict to make room for the new entry.
It would be desirable to have an LRU replacement implementation that is less costly than the conventional approach and is logically simple. Specifically, it would be desirable to have an LRU circuit that is faster because it has fewer gates and no counters or N-way multiplexers, and that can scale up to support more ways (i.e., a larger N) without a significant impact on circuit complexity. Further, it would be desirable to be able to store a new cache line in the set without the need for scanning to select an invalid entry over a valid entry for replacement.
Accordingly, the invention implements a least recently used (LRU) cache replacement algorithm utilizing pointers. In one embodiment the invention provides a method for implementing a least recently used (LRU) cache replacement that maintains a set of N pointer registers that point to respective ways of an N-way set of memory blocks. One of the pointer registers is an LRU pointer, pointing to a least recently used way, and another of the pointer registers is a most recently used (MRU) pointer, pointing to a most recently used way. For a cache fill operation in which a new memory block is written into one of the N ways, the new memory block is written into the way (wayn) pointed to by the LRU pointer. All the pointers except the MRU pointer are promoted to point to a way pointed to by respective newer neighboring pointers, the newer neighboring pointers being neighbors towards the MRU pointer. The MRU pointer is updated to point to wayn, in which the new memory block was written.
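The fill operation can be modeled in software as a minimal sketch. The model is illustrative only and is not the circuit itself: the pointer registers are represented as a Python list in which index 0 is assumed to be the LRU pointer and the last index the MRU pointer, and the function name `fill` is hypothetical.

```python
def fill(ptrs):
    """Model a cache fill on one set and return the way that was written.

    ptrs[0] is the LRU pointer and ptrs[-1] is the MRU pointer (an assumed
    ordering for this sketch). Each entry holds a way number.
    """
    victim = ptrs[0]                  # wayn: the way the LRU pointer names
    # Promote every pointer except the MRU pointer: each takes the value
    # held by its newer neighbor (the neighbor towards the MRU pointer).
    for i in range(len(ptrs) - 1):
        ptrs[i] = ptrs[i + 1]
    ptrs[-1] = victim                 # the MRU pointer now names wayn
    return victim


ptrs = [0, 1, 2, 3]                   # 4-way set, way 0 is least recent
written = fill(ptrs)
print(written, ptrs)                  # prints: 0 [1, 2, 3, 0]
```

In this model no counters are consulted to find the victim; the replacement way is simply whatever the LRU pointer holds.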
The method may further include, for a cache hit in which one of the memory blocks in the set, waym, is accessed for a write or read operation, promoting the pointers from the pointer pointing to waym and newer, except for the MRU pointer, to point to a way pointed to by a newer nearest neighboring pointer, and pointing the MRU pointer to waym.
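A cache hit can be sketched in software in the same manner. As before, this is an illustrative model under stated assumptions rather than the circuit: the list `ptrs` holds the pointer values with index 0 assumed to be the LRU pointer and the last index the MRU pointer, and the function name `hit` is hypothetical.

```python
def hit(ptrs, way_m):
    """Model a read or write hit on way_m within one set.

    ptrs[0] is the LRU pointer and ptrs[-1] is the MRU pointer (an assumed
    ordering for this sketch).
    """
    pos = ptrs.index(way_m)           # which pointer currently names waym
    # Promote the pointer naming waym and every newer pointer except the
    # MRU pointer: each takes the value of its newer nearest neighbor.
    for i in range(pos, len(ptrs) - 1):
        ptrs[i] = ptrs[i + 1]
    ptrs[-1] = way_m                  # waym becomes the most recently used


ptrs = [1, 2, 3, 0]
hit(ptrs, 2)
print(ptrs)                           # prints: [1, 3, 0, 2]
```

Pointers older than the one naming waym are left unchanged, and a hit on the way already named by the MRU pointer leaves the whole list unchanged.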
The method may further include, for an invalidate operation in which one of the ways, wayk, is invalidated, demoting all pointers from the pointer pointing to wayk and older, but not the LRU pointer, and pointing the LRU pointer to the invalidated way.
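The invalidate operation can likewise be sketched in software. The sketch assumes that demoting a pointer means loading it with the value of its older nearest neighbor, by symmetry with promotion; the list layout (index 0 as the LRU pointer, last index as the MRU pointer) and the function name `invalidate` are also assumptions made for illustration.

```python
def invalidate(ptrs, way_k):
    """Model invalidation of way_k within one set.

    ptrs[0] is the LRU pointer and ptrs[-1] is the MRU pointer (an assumed
    ordering for this sketch).
    """
    pos = ptrs.index(way_k)           # which pointer currently names wayk
    # Demote the pointer naming wayk and every older pointer except the
    # LRU pointer: each takes the value of its older nearest neighbor.
    # Walking downwards ensures each read sees a not-yet-overwritten value.
    for i in range(pos, 0, -1):
        ptrs[i] = ptrs[i - 1]
    ptrs[0] = way_k                   # the LRU pointer names the invalid way


ptrs = [1, 3, 0, 2]
invalidate(ptrs, 0)
print(ptrs)                           # prints: [0, 1, 3, 2]
```

Because the invalidated way ends up under the LRU pointer, the next fill in this model lands in it without any scan for an invalid entry.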
In another embodiment, the invention provides an integrated circuit having an LRU cache control circuit incorporated therein for one set of N cache lines in an N-way set associative cache. The cache control circuit includes N registers, each register being a way pointer and containing a way pointer value pointing to a respective one of the N cache lines or ways. A predetermined one of the N registers is an LRU way pointer, pointing to a least recently used way, another predetermined one of the N registers is an MRU pointer pointing to the most recently used cache line in the set and the remaining registers are intermediate way pointers pointing to intermediate ways, each of the intermediate way pointers pointing to successively more recently used ways (assuming all the ways are valid), as the intermediate pointers go from the LRU pointer towards the MRU pointer. The cache control circuit also includes a plurality of selector circuits coupled to provide a next pointer value for each of the registers. Each selector circuit for the intermediate way pointers selects either a newer neighbor, an older neighbor or an initial value as the next intermediate pointer value. An MRU selector circuit for the MRU pointer selects either an older neighbor, an initial value for the MRU pointer, a current way hit value or a value of the LRU pointer, as a next MRU pointer value. An LRU selector circuit for the LRU pointer register selects either a newer nearest neighbor, an initial value, or the current way hit for a current operation as a next LRU pointer value.
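The role of the selector circuits can be illustrated with a behavioral sketch that computes the next value of every register from the current values only, as a set of parallel selectors would. This is a software model under stated assumptions, not the circuit: the function name `next_pointers`, the operation names, the initial values (pointer i naming way i), and the "hold" case in which a register keeps its value are all assumptions made for illustration.

```python
def next_pointers(ptrs, op, way=None):
    """Return the next pointer values; ptrs[0] is LRU, ptrs[-1] is MRU.

    op is one of "reset", "fill", "hit", "invalidate" (hypothetical names).
    """
    n = len(ptrs)
    if op == "reset":
        return list(range(n))              # every selector picks its initial value
    if op not in ("fill", "hit", "invalidate"):
        raise ValueError(f"unknown operation: {op}")
    if op == "fill":
        way = ptrs[0]                      # a fill targets the LRU pointer's way
    pos = ptrs.index(way)                  # the pointer that matches the way

    nxt = []
    for i in range(n):
        if op in ("fill", "hit"):
            if i == n - 1:
                nxt.append(way)            # MRU: way hit value, or LRU value on fill
            elif i >= pos:
                nxt.append(ptrs[i + 1])    # select the newer neighbor
            else:
                nxt.append(ptrs[i])        # hold (assumed)
        else:                              # invalidate
            if i == 0:
                nxt.append(way)            # LRU: current way hit value
            elif i <= pos:
                nxt.append(ptrs[i - 1])    # select the older neighbor
            else:
                nxt.append(ptrs[i])        # hold (assumed)
    return nxt


state = next_pointers([3, 1, 0, 2], "reset")
state = next_pointers(state, "fill")
state = next_pointers(state, "hit", 2)
state = next_pointers(state, "invalidate", 2)
print(state)                               # prints: [2, 1, 3, 0]
```

In this model each register chooses among a small fixed set of sources regardless of N, which is consistent with the stated aim of avoiding counters and N-way multiplexers.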