1. Field of the Invention
This invention relates to caches and, more particularly, to a least-recently-used cache replacement system.
2. Description of the Related Art
Since main system memory is typically designed for density rather than speed, microprocessor designers have added caches to their designs to reduce the microprocessor""s need to directly access main memory. A cache is a small memory that is more quickly accessible than the main memory. Computer systems may have a number of different levels of caches. For example, a computer system may have a xe2x80x9clevel onexe2x80x9d (L1) cache and a xe2x80x9clevel twoxe2x80x9d (L2) cache. These caches are typically integrated with the microprocessor. Caches are typically constructed of fast memory cells such as static random access memories (SRAMs) which have faster access times than the memories used for the main system memory (typically dynamic random access memories (DRAMs) or synchronous dynamic random access memories (SDRAMs)). The faster SRAMs are not typically used for main system memory because of their low density and high cost.
Many other types of caching are also possible. For example, the main system memory may act as a cache for the system""s slower direct access storage devices (e.g., hard disk drives). Other devices, such as hard drives, may also include internal caches to improve their performance.
When a microprocessor needs data from memory, it typically first checks its L1 cache to see if the required data has been cached. If not, the L2 cache is checked. At the same time, the data may be requested from memory, in case there is a miss in the L2 cache. If the L2 cache is storing the data, it provides the data to the microprocessor (typically at much higher rate and lower latency than the main system memory is capable of), and if the data was requested from memory, that request may be cancelled. If the data is not cached in the L1 or L2 caches (referred to as a xe2x80x9ccache missxe2x80x9d), the data is read from main system memory or some type of mass storage device (e.g., a hard disk drive). Relative to accessing the data from the L1 cache, accesses to memory take many more clock cycles. Similarly, if the data is not in the main system memory, accessing the data from a mass storage device takes even more cycles.
Caches typically operate on the principal of locality of reference, which states that the data most recently used (and the data in that locality) is more likely to be accessed than the rest of the data. This principle holds because computer software typically has loops and branches that cause previously executed code to be re-executed. By storing recently accessed instructions and data in a cache, system performance may be increased because the microprocessor need not wait for the instructions and data to be read from main memory.
Microprocessor and computer system architects have taken the locality of reference principle one step further by using techniques such as, branch prediction to proactively store instructions and data in the cache before they are actually needed by the microprocessor. In addition, when an instruction or byte of data is read from memory, additional bytes following the instruction or data are read and cached. Once again, the principal of locality of reference dictates that these instruction and data bytes are more likely to be needed by the processor than the other data or instructions at large.
There are several different ways to map the system memory into the cache. One common approach utilizes an n-Way set-associative cache, in which the cache is segmented into sets. Each set contains n cache lines. A cache line is a sequential group of bytes (e.g., 32 or 64). For efficiency purposes, cache memory transactions are typically in cache lines rather than in single bytes. Cacheable locations in main memory may each be assigned to one of the sets of cache lines. As a result, each location may be cached in any one of the n locations within its assigned set. One special case of the n-Way set-associative cache is the direct-mapped cache. In a direct-mapped cache, n=1, and thus each memory location maps to only one location in the cache. Another special case of the n-Way set-associative cache is the fully associative cache. In this case, n=m, where m is the number of lines in the cache (and thus there is only one xe2x80x9csetxe2x80x9d). In this case, each memory location may map to any of the cache locations.
Two basic performance criteria for caches are hit ratio (i.e., the ratio of the memory accesses that are found in the cache to the total number of memory accesses) and search speed (i.e., how quickly a hit or miss determination can be made). In a direct-mapped cache, search speed is optimized at the cost of hit ratio. This is because it is relatively easy to determine hits/misses (since a memory location only maps to one cache line, only that line needs to be checked) but more difficult to have a high hit ratio since multiple memory locations map to a single cache line. Conversely, fully-associative caches optimize hit ratios while sacrificing search speed. Allowing all memory locations to map to any cache line improves the probability that there will be a hit but greatly increases the complexity of searches since all cache lines must be searched for each memory location. Set-associative caches attempt to compromise between the two by offering more associativity (and thus higher hit ratios) than direct-mapped caches while also offering faster search speeds than fully-associative caches.
Since cache size is limited by a number of factors (including die size, power consumption, and cost), care must be taken when loading information into the cache. Once particular area of concern for the designer arises when deciding a policy for overwriting or invalidating existing instructions and data in a cache to make room for new instructions and data. Thus, in set-associative caches where n greater than 1 (and thus there are choices as to which line to cache a particular memory location), there needs to be some way to choose which of the possible cache lines to fill with new data. A common solution is to track the relative order of access to each cached memory location and then replace the least recently used instructions or data with new instructions or data. This solution is based on the principle that recently accessed cache lines are more likely to be accessed again. Other solutions include random replacement and first-in first-out techniques.
On average, least-recently used (LRU) cache replacement algorithms provide better performance than other algorithms. However, in order to determine the least recently used (LRU) cache line in an n-way set associative cache, conventional approaches require a significant amount of complex hardware, including counters and n-way multiplexers, to implement the LRU algorithm. Additionally, status bits for each cache entry track the usage of each entry. The number of status bits required to implement a conventional LRU algorithm for a set with n ways is typically n*log2(n). When a new entry is made in the set, the status bits are scanned to determine which of the cache lines is the least recently used or invalid. The least recently used or invalid line is then evicted to make room for the new entry. Drawbacks of a conventional LRU replacement algorithm include the amount of hardware and number of status bits time required to implement the algorithm as well as the time and hardware required to scan for invalid entries in the set.
Various embodiments of methods and systems for implementing a least recently used (LRU) cache replacement algorithm are disclosed. In a first embodiment, a computer system that includes a processor, a system memory, and an N-way set-associative cache coupled to the processor is disclosed. The N-way set-associative cache includes a memory that is logically divided into at least one set. Each set includes N ways (i.e., cache lines) that are each configured to store a line (e.g., a data or instruction line) from the system memory. For example, if there is only one set, N may be the number of lines in the memory and the cache will be fully-associative. Each set is further partitioned into several sectors that each include two or more of the N ways. The computer system may also include multiple usage status indicators that indicate the relative usage status of the sectors in the set. For example, in one embodiment, the computer system may include multiple pointer registers that each store a pointer value identifying one of the sectors in an associated set. The pointer registers may each have an associated usage status. Thus, if a first sector is the LRU sector in the set, the LRU pointer register may store a pointer value identifying that first sector. Similarly, a most recently used (MRU) pointer register may store a pointer value identifying the MRU sector in the set. The cache controller for the N-way set-associative cache is coupled to the pointer registers and may be configured to perform a fill in the set associated with the pointer registers by reading the value stored in the LRU pointer register and filling one of the ways within the LRU sector.
In one embodiment, each of the sectors may contain two ways. Since a single bit may uniquely identify each of the ways in each sector, each of the pointer registers may be configured to store a bit identifying which of the ways in one of the sectors is a LRU way. In these embodiments, the cache controller may be configured to perform a fill by reading the bit stored in the LRU pointer register and filling the LRU way within the LRU sector.
There may be the same number of pointer registers as there are sectors in the associated set. In such an embodiment, the cache controller may assign and/or update the pointer values so that only one pointer value stored in any of the pointer registers identifies a particular one of the sectors at any given time. Several comparators may be configured to compare the pointer values stored in the pointer registers to a sector identifier identifying a sector being operated upon. The cache controller may be configured to detect an error if more than one of the pointer values equals the sector identifier.
In some embodiments, the cache controller may be configured to update the pointer values stored in the pointer registers in response to an operation being performed on one of the ways in the first set. For example, if the operation affects the relative usage status of some of the sectors in the first set, the cache controller may update the pointer values so that the pointers consistently identify the relative usage status of the sectors in the set. Thus, if the operation is an invalidation, the cache controller may update the pointer values by updating the LRU pointer value to identify a sector containing a way invalidated by the operation and storing a pointer value stored in a next lower usage status pointer register in each non-LRU pointer register that indicates usage status lower than or equal to a pre-operation usage status of the sector containing the invalidated way. Alternately, if the operation results in a hit and, as a result of the hit, a hit way is not invalidated, the cache controller may update the pointer values by updating the MRU pointer value to identify a sector containing the hit way and storing a pointer value stored in a next higher usage status pointer register in each pointer register that indicates usage status higher than or equal to a pre-operation usage status of the sector containing the hit way. Or, if the operation was a fill, the cache controller may update the pointer values by updating the MRU pointer value to identify the pre-operation LRU sector and storing a pointer value stored in a next higher usage status pointer register in each non-MRU pointer register.
In other embodiments, a cache subsystem is disclosed. This cache subsystem may be used in a computer system, as described above, or in any other suitable system. For example, the cache subsystem may be used as a cache for a disk drive or storage array. The cache subsystem may include multiple multiplexers that are each configured to output a value to be loaded into a pointer register so that the pointer values can be updated.
In another embodiment, a method of filling an N-way set-associative cache is disclosed. The method includes identifying a least recently used (LRU) sector within a set. The set is partitioned into sectors, and each sector includes several ways. The LRU sector is identified using one or more usage status indicators that show the relative usage status of the sectors in the set. For example, multiple usage status pointers may correspond to the set. Identifying a LRU sector within the set may involve determining which sector the LRU usage status pointer points to, because the LRU usage status pointer points to the LRU sector in the set. The method may also involve storing the data line in one of a plurality of ways within the LRU sector, and, if storing the data line causes a change in usage status for any of the sectors in the set, updating one or more of the usage status pointers corresponding to the set. In some embodiments, the method may also involve identifying the LRU way within the LRU sector so that a more accurate LRU replacement algorithm may be implemented. For example, if each sector includes two ways, the LRU way in a sector may be identified by the least significant bit of the usage status pointer currently identifying that sector.
Yet another embodiment discloses a method of performing cache line operations upon an N-way set associative cache. The cache comprises one or more sets, and each set comprises N cache lines. The method includes performing an operation on a first cache line within a first sector within a first set. The first sector includes at least two of the N cache lines in the first set. In response to performing the operation, several pointers that are configured to point to the sectors within the set may be updated. For example, if the operation was a fill and the first sector comprises only valid cache lines after the fill, the MRU pointer may be updated to point to the first sector and the other pointers may each be updated to point to a sector that was pointed to by a next higher usage status pointer before the fill.