1. Technical Field
The present invention relates generally to computer processing systems and, in particular, to methods for caching cache tags.
2. Background Description
A cache memory (hereinafter "cache") is a small, fast, redundant memory used to store the most frequently accessed parts of the main memory of a computer processing system. In the memory hierarchy of modern computer processing systems, cache memory is generally located immediately below the highest level, namely the central processing unit (CPU) registers. The cache can be divided into multiple distinct levels, with most current systems having between one and three levels of cache. These levels are generally referred to as L1, L2, and L3. Some of the levels may be on the same chip as the CPU (i.e., on the same semiconductor substrate), while others may be entirely separate from the CPU. For example, an L1 cache is typically built into or packaged within the CPU chip (and is thus referred to as "on-chip").
Generally, each cache includes two conventional memories: a data memory and a tag memory (also referred to as a "directory" memory). Fixed-size regions of main memory referred to as cache lines are stored in the data memory of the cache. The directory memory stores the address of each cache line contained in the data memory, together with other information (state information), including whether a valid cache line is present. A congruence class refers to a group of T cache lines in which a particular address is allowed to reside.
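As an illustration of how an address maps to a congruence class, the following minimal sketch splits a byte address into a tag, a congruence class index, and a byte offset within the cache line. The function name `split_address`, the 64-byte line size, and the 256 congruence classes are assumptions chosen for the example and do not come from the description above.

```python
def split_address(addr, line_size, num_classes):
    """Split a byte address into (tag, congruence class index, byte offset).

    Hypothetical helper: hardware normally uses power-of-two sizes so that
    each field is a bit slice; plain division is used here for clarity.
    """
    offset = addr % line_size           # position of the byte inside its cache line
    line_number = addr // line_size     # which cache line of main memory this is
    index = line_number % num_classes   # congruence class the line may reside in
    tag = line_number // num_classes    # remaining bits, kept in the directory
    return tag, index, offset


# Example values are assumptions: 64-byte lines, 256 congruence classes.
tag, index, offset = split_address(0x12345, line_size=64, num_classes=256)
print(tag, index, offset)
```

Only the tag needs to be stored in the directory memory, because the index is implied by the congruence class in which the entry sits.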
As stated above, the cache is intended to hold the most active portions of the main memory. Accordingly, the computer hardware dynamically allocates parts of the cache for portions of the main memory deemed most likely to be accessed soon. Many caches are associative (also referred to as content-addressable). In an associative memory, the address of a memory location is stored along with its content. When a request for data is made by the CPU, instead of reading the data directly from a main memory location, the cache is provided with an address and responds by providing data which may or may not be the requested data. If the requested data (e.g., a word) is found in the cache, then the request is referred to as a "hit". On the other hand, if the requested data is not found, then the request is referred to as a "miss". When a cache miss occurs, the main memory is accessed and the cache is updated to include the new (correct) data. Data is updated in the cache by the hardware copying cache lines which include the requested data.
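The hit and miss behavior described above can be sketched with a toy model. The class `TinyCache`, its parameters, and the simple oldest-first eviction are assumptions made for illustration only; a real cache performs these steps in hardware and typically uses a more careful replacement policy.

```python
class TinyCache:
    """Toy cache: each congruence class maps a tag to a copy of one cache line."""

    def __init__(self, num_classes, ways, line_size):
        self.num_classes = num_classes
        self.ways = ways                  # lines allowed per congruence class
        self.line_size = line_size
        self.classes = [dict() for _ in range(num_classes)]
        self.hits = 0
        self.misses = 0

    def read(self, addr, main_memory):
        line_number = addr // self.line_size
        index = line_number % self.num_classes
        tag = line_number // self.num_classes
        lines = self.classes[index]
        if tag in lines:
            self.hits += 1                # requested data is already cached
        else:
            self.misses += 1              # access main memory and update the cache
            if len(lines) >= self.ways:
                # Placeholder policy (assumption): drop the oldest inserted line.
                lines.pop(next(iter(lines)))
            base = line_number * self.line_size
            lines[tag] = main_memory[base:base + self.line_size]
        return lines[tag][addr % self.line_size]


main_memory = bytes(range(256)) * 4       # 1024 bytes of sample data
cache = TinyCache(num_classes=4, ways=2, line_size=16)
first = cache.read(0x20, main_memory)     # miss: the whole line is copied in
second = cache.read(0x21, main_memory)    # hit: same cache line
print(first, second, cache.hits, cache.misses)
```

Note that the miss copies an entire cache line, so the second read of a neighboring address is served from the cache.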
When a cache line is copied into the cache, another cache line must usually be removed from the cache to make room. A least-recently-used (LRU) policy is often the basis for choosing which cache line to remove. Conversely, a most-recently-used (MRU) policy is often the basis for selecting which cache lines are copied into the cache.
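A minimal sketch of LRU replacement within a single congruence class follows. The class name `LRUClass` and the two-line capacity are assumptions for the example; hardware implementations usually approximate LRU with a few state bits per class rather than an ordered list.

```python
from collections import OrderedDict


class LRUClass:
    """One congruence class holding at most `ways` lines, with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        # Maps tag -> line data, ordered from least to most recently used.
        self.lines = OrderedDict()

    def access(self, tag, data=None):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was removed."""
        if tag in self.lines:
            self.lines.move_to_end(tag)   # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) >= self.ways:
            evicted, _ = self.lines.popitem(last=False)  # remove the LRU line
        self.lines[tag] = data
        return False, evicted


cls = LRUClass(ways=2)
cls.access("A")
cls.access("B")
cls.access("A")                 # "A" becomes most recently used
hit, evicted = cls.access("C")  # the class is full, so "B" is removed
print(hit, evicted)
```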
Cache coherence (or cache consistency) refers to the process employed by a processor to manage multiple copies of data (e.g., a particular cache line) residing in multiple caches. Such data management is required to prevent the data from being lost or overwritten. Many modern computer processing systems implement some form of multiple-reader, single-writer protocol to achieve cache coherence.
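The multiple-reader, single-writer idea can be sketched as a small bookkeeping structure for one cache line. This is a simplified illustration under stated assumptions, not a description of any particular protocol: the class `CoherentLine`, its method names, and the integer cache identifiers are all hypothetical.

```python
class CoherentLine:
    """Tracks which caches hold one cache line: many readers or a single writer."""

    def __init__(self):
        self.readers = set()   # caches holding a read-only copy
        self.writer = None     # cache holding the only writable copy, if any

    def acquire_read(self, cache_id):
        if self.writer == cache_id:
            return                         # the writer may already read its copy
        if self.writer is not None:
            self.readers.add(self.writer)  # demote the writer to a reader
            self.writer = None
        self.readers.add(cache_id)

    def acquire_write(self, cache_id):
        """Grant exclusive access; return the caches whose copies are invalidated."""
        holders = set(self.readers)
        if self.writer is not None:
            holders.add(self.writer)
        invalidated = holders - {cache_id}
        self.readers.clear()
        self.writer = cache_id
        return invalidated


line = CoherentLine()
line.acquire_read(0)
line.acquire_read(1)
invalidated = line.acquire_write(1)   # cache 0 must drop its copy
print(invalidated, line.writer)
```

The invariant maintained is that a writable copy never coexists with any other copy of the same line.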
The performance of a memory system can be measured to some degree by parameters such as latency and bandwidth. The term "latency" refers to the delay from when the processor first requests a word from memory until that word is available for use by the processor. That is, the latency of a memory request is the time period required for the memory system to deliver the result of the request to the processor. The term "bandwidth" refers to the rate at which information can be transferred from the memory system. That is, the bandwidth of a memory system is the rate at which the memory system can accept memory requests and produce the results of the requests.
One of the problems of supporting large caches (for example, 64MB) is the large amount of directory space required to track the contents of the data arrays. As a result, there is pressure on cache designers to increase the line size of large caches. It is well known that, with a fixed capacity, increasing the cache line size reduces the required directory space. For example, each doubling in line size removes one bit from each tag. Unfortunately, this can result in poor performance if those cache lines are shared.
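To make the relationship between line size and directory space concrete, the following sketch counts the number of directory entries a fixed-capacity cache needs at several line sizes. The function name `directory_entries` and the candidate line sizes are assumptions for the example; the sketch counts entries only and does not model the width of each tag or the state bits.

```python
def directory_entries(capacity_bytes, line_size_bytes):
    """Number of cache lines, and hence directory entries, at a fixed capacity."""
    return capacity_bytes // line_size_bytes


capacity = 64 * 1024 * 1024   # the 64MB example capacity from the text
# Line sizes below are illustrative assumptions.
for line_size in (64, 128, 256, 512):
    print(line_size, directory_entries(capacity, line_size))
```

Each doubling of the line size halves the number of entries the directory must hold.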
The standard way of supporting large caches with small directories, that is, increasing the cache line size, results in an inefficient use of the cache. Since cache lines are typically only half utilized, much of each cache line is brought into the cache but never used. To reduce the resulting negative effects on memory bandwidth, the lines are typically "sectored" so that only those sections of the cache line that are actually required need reside in the cache; the unrequired sections need not be cached. However, since many of the sectors in a sectored cache are empty, the cache designers could typically have gotten by with a much smaller cache with shorter cache lines. Since smaller caches are typically faster than larger caches due to physical constraints, the benefits of the larger caches are reduced by increasing the line size and adding sectoring bits. Large caches with long cache lines are slower, are inefficiently used, and can require more memory bandwidth to maintain each line. Thus, it would be desirable and highly advantageous to have a method that allows the use of a large cache and provides fast access to cached data.
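The sectoring described above can be sketched as one tag shared by several sectors, each with its own valid bit. The class `SectoredLine`, the four sectors per line, and the `utilization` helper are assumptions for illustration only.

```python
class SectoredLine:
    """One directory entry: a single tag plus a valid bit per sector."""

    def __init__(self, tag, sectors_per_line):
        self.tag = tag
        self.valid = [False] * sectors_per_line   # the sectoring bits

    def fill(self, sector):
        self.valid[sector] = True                 # only this sector is brought in

    def is_hit(self, tag, sector):
        # A hit needs both a matching tag and a valid sector.
        return tag == self.tag and self.valid[sector]

    def utilization(self):
        return sum(self.valid) / len(self.valid)


line = SectoredLine(tag=7, sectors_per_line=4)
line.fill(1)
print(line.is_hit(7, 1), line.is_hit(7, 2), line.utilization())
```

The sketch shows why empty sectors waste capacity: the entry reserves room for four sectors while holding only one.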