1. Field of the Invention
The present invention generally relates to the field of cache memory in computer systems, and more specifically to an improved method and apparatus for determining which line to replace during cache line replacement in an inclusive set-associative cache memory system.
2. Description of the Related Art
Computer systems generally consist of one or more processors that execute program instructions stored within a memory medium. This medium is most often constructed from the storage technology with the lowest cost per bit, which is also the slowest. To improve processor performance, a faster but smaller and more costly memory, known as a cache memory, is placed between the processor and final storage to provide temporary storage of recently and/or frequently referenced information. As the gap between processor speed and the access time of final storage widens, more levels of cache memory are provided, each level backing the previous one to form a storage hierarchy. Each level of the cache is managed to retain the information most useful to the processor. Often more than one cache memory is employed at the same level of the hierarchy, for example when an independent cache is provided for each processor.
Typically, only large “mainframe” computers employ memory hierarchies of more than three levels. However, systems are now being built from commodity microprocessors that benefit greatly from a third level of cache in the memory hierarchy. This level is best positioned between the processor bus and the main memory. Because it is shared by all processors, and in some cases by the I/O system as well, it is called a shared cache. To be effective for performance, each level of memory requires several times more storage than the level it backs; the shared cache therefore requires several tens of megabytes of memory. To remain cost effective, the shared cache is implemented using relatively low-cost Dynamic Random Access Memory (DRAM) of the highest performance available. Transfers between this type of shared cache and the main memory typically involve lengthy transfer periods, at least ten times those typical of other caches.
Caches have evolved into quite varied and sophisticated structures, but they always address the tradeoff between speed on the one hand and cost and complexity on the other, while functioning to make the most useful information available to the processor as efficiently as possible. Because a cache is smaller than the next level of memory below it in the hierarchy, its contents must be continuously updated so that it holds only information deemed useful to the processors.
There are two major types of cache organization: direct-mapped and set-associative. Direct-mapped caches are characterized by a one-to-one mapping from system address to cache address. This mapping can be as simple as using the lower n bits of the system address as the cache address. Set-associative caches are characterized by a one-to-many mapping from system address to cache address. For example, in a four-way set-associative cache, the data corresponding to a system address can be found in any one of four locations in the cache. There is a direct mapping from system address to set address, but the tag, usually a subset of the upper system address bits, must be compared with the tags of each of the four ways of the set to determine which way contains the data. When a reference misses, the choice of which of the four possible locations receives the new line is governed by a cache line replacement policy. The most widely used replacement policy is called least recently used (LRU). The idea behind LRU replacement is to replace the line in the set that has been least recently used. Accordingly, storage is required for each set in order to record how recently each of its lines has been used. Thus, unlike direct-mapped caches, set-associative caches require extra storage, called a directory, to hold address tags and replacement policy information such as LRU status. The higher hit rates of set-associative caches are usually worth the extra cost of the directory.
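To illustrate the tag comparison and LRU replacement described above, the following is a minimal sketch in Python of a four-way set-associative directory. The line size, the number of sets, and the names `SetAssociativeDirectory`, `split_address`, and `access` are hypothetical choices for illustration only; LRU status is modeled here as an ordered list of way numbers per set, which is one of several possible encodings.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64   # bytes per cache line (hypothetical value)
NUM_SETS = 256   # number of sets (hypothetical value)
WAYS = 4         # four-way set-associative, as in the example above


@dataclass
class CacheSet:
    # tags[i] is the tag held by way i, or None if the way is empty
    tags: list = field(default_factory=lambda: [None] * WAYS)
    # LRU status: way numbers ordered from most to least recently used
    lru_order: list = field(default_factory=lambda: list(range(WAYS)))

    def touch(self, way):
        # Move the referenced way to the most-recently-used position
        self.lru_order.remove(way)
        self.lru_order.insert(0, way)


def split_address(addr):
    # Direct mapping from system address to set index; upper bits form the tag
    line_addr = addr // LINE_SIZE
    return line_addr % NUM_SETS, line_addr // NUM_SETS


class SetAssociativeDirectory:
    def __init__(self):
        self.sets = [CacheSet() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return (hit, way, victim_tag) for a reference to addr."""
        index, tag = split_address(addr)
        s = self.sets[index]
        # Compare the tag against every way of the selected set
        for way, t in enumerate(s.tags):
            if t == tag:
                s.touch(way)
                return True, way, None
        # Miss: replace the least recently used way (last in lru_order);
        # empty ways always sit at the tail, so they are filled first
        victim = s.lru_order[-1]
        victim_tag = s.tags[victim]
        s.tags[victim] = tag
        s.touch(victim)
        return False, victim, victim_tag
```

With these assumed parameters, addresses 0, 16384, 32768, and 49152 all map to set 0 with tags 0 through 3. After they are loaded and address 0 is referenced again, a reference to address 65536 (tag 4) misses and replaces the line with tag 1, because that line is now the least recently used in the set.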
A further property of cache hierarchies is called inclusion. In an inclusive cache hierarchy, every cache contains a subset of the data of the caches below it in the hierarchy. Cache levels above a given level are assumed to be closer to the processor, whereas cache levels below a given level are assumed to be farther from it. Inclusive hierarchies allow cache coherence traffic to be filtered out at lower levels of the hierarchy rather than propagated up to the highest level. This filtering of cache coherence traffic helps improve memory system performance.
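The inclusion property and the coherence filtering it enables can be sketched, under simplifying assumptions, by modeling each cache as a Python set of line addresses. The function names are hypothetical. The back-invalidation step in `evict_from_lower` is a commonly used way to preserve inclusion when a lower level replaces a line; it is included for illustration and is not a mechanism stated in the text above.

```python
def is_inclusive(upper_caches, lower_cache):
    # Inclusion: every line held by any higher-level cache is also in the lower level
    return all(upper <= lower_cache for upper in upper_caches)


def snoop_targets(upper_caches, lower_cache, line_addr):
    """Return indices of the higher-level caches a coherence request must reach."""
    # Under inclusion, a miss in the lower-level cache proves that no higher-level
    # cache holds the line, so the request is filtered here and goes no further.
    if line_addr not in lower_cache:
        return []
    return [i for i, upper in enumerate(upper_caches) if line_addr in upper]


def evict_from_lower(upper_caches, lower_cache, line_addr):
    # Replacing a line in the lower level also removes it from every higher
    # level (back-invalidation); otherwise inclusion would be violated.
    lower_cache.discard(line_addr)
    for upper in upper_caches:
        upper.discard(line_addr)
```

For example, with two processor caches `[{0x100, 0x200}, {0x200}]` backed by a shared cache `{0x100, 0x200, 0x300}`, a coherence request for line `0x400` is filtered at the shared cache, while a request for `0x200` must reach both processor caches.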
The locking of cache lines (that is, pinning them, or making them unavailable for replacement) is known in the prior art. Examples include (1) U.S. Pat. No. 5,353,425, (2) U.S. Pat. No. 5,913,224, and (3) U.S. Pat. No. 6,047,358, each of which is incorporated herein by reference. However, in the prior art, cache lines are locked either under software control (as in the first and third references) or as part of system configuration in order to optimize access to real-time data (as in the second reference). In contrast, in the current invention described below, the locking (or pinning) of cache lines is performed automatically by the cache controller when it detects that a recently replaced line has been reloaded into the cache.
Also known in the prior art is the concept of not replacing a line in a lower level cache when it is present in a higher level cache; an example is U.S. Pat. No. 5,584,013, which is incorporated herein by reference. However, an efficient implementation of such a scheme requires that the controller for the lower level cache have access to the directory information of the higher level cache. In systems such as those described above, in which there is a large shared cache, accessing the higher level cache directories on every cache line replacement in the shared cache is impractical. It is therefore an object of the current invention to retain cache lines that are frequently accessed in higher level caches in such a fashion that access to the directories of the higher level caches is not required when selecting a line to replace.