1. Field of the Invention
Embodiments of the present invention relate to techniques for accessing cache memories. More specifically, embodiments of the present invention relate to techniques for identifying a least-recently-used “way” in a set-associative cache memory.
2. Related Art
Almost all computer systems include one or more cache memories, which are small, high-speed memories that store copies of frequently used data items and/or instructions. For example, a typical computer system includes an on-die L1 cache and an on-die or off-die L2 cache which provide two levels of a memory hierarchy that also includes a main memory (e.g., RAM) and a mass-storage device (e.g., a disk drive).
Generally, computer systems first access the L1 cache when attempting to access a given cache line. If the cache line is present in the L1 cache, the cache access is very fast. However, when an attempt to access the cache line misses in the L1 cache, the L1 cache forwards the request to the L2 cache (which may in turn forward the request to the main memory and possibly to the mass-storage device). Because each level of the memory hierarchy has a longer access time than the levels below it, avoiding unnecessary cache misses can improve computer system performance.
One possible way to avoid cache misses is to use a “fully-associative” cache, in which any entry can store cache lines from any address in the next higher level of the memory hierarchy. Unfortunately, fully-associative caches are difficult to implement, so computer systems typically use “set-associative” caches instead. For example, FIG. 1 presents a block diagram illustrating an exemplary set-associative cache 100. The entries in set-associative cache 100 are logically divided into two or more “ways.” For example, in cache 100, the entries are logically divided into four ways. Each way in turn includes entries that are part of one or more “sets.” In FIG. 1, an exemplary set in cache 100 is delineated by a group of hash-marked entries. Set-associative caches are well-known in the art and hence are not described in more detail.
Some set-associative caches are “skewed-associative.” In a skewed-associative cache, each way in the cache is associated with a hash function that operates on a cache line address to determine which entry in the way is used for the cache line. Note that each hash function can produce a different result for the same cache line address. Consequently, a different entry may be identified for a given cache line address for each way in a skewed-associative cache.
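The per-way indexing described above can be illustrated with a minimal sketch. The hash used here (an XOR of the low address bits with a rotated copy of the next-higher bits) is purely an assumption chosen for illustration, as are the names NUM_WAYS, INDEX_BITS, way_index, and candidate_entries; the text above does not specify any particular hash function.

```python
# Illustrative sketch only: the hash below is a hypothetical choice,
# not a function taken from the description above.
NUM_WAYS = 4                      # assumed number of ways
INDEX_BITS = 4                    # assumed index width
ENTRIES_PER_WAY = 1 << INDEX_BITS # 16 entries in each way


def way_index(way: int, line_addr: int) -> int:
    """Return the entry index that `way` would use for `line_addr`."""
    low = line_addr % ENTRIES_PER_WAY
    high = (line_addr // ENTRIES_PER_WAY) % ENTRIES_PER_WAY
    # Rotate the high bits left by `way` positions so each way mixes them differently.
    rotated = ((high << way) | (high >> (INDEX_BITS - way))) % ENTRIES_PER_WAY
    return low ^ rotated


def candidate_entries(line_addr: int) -> list[int]:
    """Return one candidate index per way for the same cache line address."""
    return [way_index(way, line_addr) for way in range(NUM_WAYS)]


# The same address maps to a different entry in each way.
print(candidate_entries(0x35))  # prints [6, 3, 9, 12]
```

Under these assumptions, a single cache line address yields four distinct candidate entries, one in each way, which is the property that distinguishes a skewed-associative cache from a conventional set-associative cache.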
When evicting an entry in a skewed-associative cache, some computer systems use a least-recently-used (LRU) technique to determine which entry is to be replaced. Generally, LRU techniques involve determining which way holds the least-recently-used entry for a cache line and then preferentially replacing the least-recently-used entry.
Each cache in a typical LRU system includes a table (or another data structure) that holds a record of the accesses to the ways within the cache. A given cache uses the table to determine the least-recently-used way in order to replace a cache line in that way. For example, FIG. 2 presents an exemplary LRU search table 200 for a two-way skewed-associative cache (in which the ways are designated “way[0]” and “way[1]”). In FIG. 2, each row in search table 200 represents an entry/index (“index[0]”) that can be generated by the hash function for way[0], whereas each column represents an entry/index (“index[1]”) that can be generated by the hash function for way[1]. Each table-entry includes a binary flag that indicates whether the least-recently-used entry is in way[0] or way[1]. (For clarity, a flag value is shown only for the row for index[0] and the column for index[1].)
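A two-way search table of this kind can be modeled with the following minimal sketch. The class name LruSearchTable, its methods, and in particular the update rule (marking the way that was not accessed as least-recently-used) are assumptions made for illustration; the description above states only that each table-entry holds a binary flag.

```python
class LruSearchTable:
    """Hypothetical model of a two-way LRU search table.

    Rows are indexed by index[0] (the hash result for way[0]) and
    columns by index[1] (the hash result for way[1]).
    """

    def __init__(self, entries_per_way: int) -> None:
        # flags[index0][index1] is 0 or 1: the way treated as least recently used.
        self.flags = [[0] * entries_per_way for _ in range(entries_per_way)]

    def record_access(self, index0: int, index1: int, accessed_way: int) -> None:
        # Assumed update rule: after an access to one way, the other way
        # becomes the least-recently-used way for this index pair.
        self.flags[index0][index1] = 1 - accessed_way

    def lru_way(self, index0: int, index1: int) -> int:
        # The way whose entry would be preferentially replaced.
        return self.flags[index0][index1]


table = LruSearchTable(entries_per_way=8)
table.record_access(index0=3, index1=5, accessed_way=0)
print(table.lru_way(3, 5))  # prints 1
```

Even in this two-way case the table holds one flag for every (index[0], index[1]) pair, so with 8 entries per way it already contains 64 flags.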
Unfortunately, because the search table is indexed by one hash result per way, the number of entries in the search table for the above-described LRU replacement technique scales exponentially with the number of ways. Consequently, using the LRU replacement technique for a skewed-associative cache with more than a few ways becomes prohibitively expensive (in terms of area, speed, and complexity).
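The growth can be made concrete with a short calculation. The sketch below assumes that the two-way table of FIG. 2 generalizes to one table dimension per way, so that a cache with N entries per way and W ways needs N raised to the power W table-entries; the function name search_table_entries and the figure of 256 entries per way are hypothetical.

```python
def search_table_entries(entries_per_way: int, num_ways: int) -> int:
    """Count table-entries, assuming one table dimension per way."""
    return entries_per_way ** num_ways


# Assumed example: 256 entries per way, for 2 through 4 ways.
for ways in (2, 3, 4):
    print(ways, search_table_entries(256, ways))
# prints:
# 2 65536
# 3 16777216
# 4 4294967296
```

Under this assumption, going from two ways to four ways multiplies the table size by a factor of 65,536, which illustrates why the approach becomes impractical beyond a few ways.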
Hence, what is needed is an LRU mechanism without the above-described problems.