Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy (closest to the processor) and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy (typically relatively far from the processor). Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches. An item that is fetched from a lower level in the memory hierarchy typically evicts (replaces) an item from the cache. The selection of which item to evict may be determined by a replacement method.
The goal of a memory hierarchy is to reduce the average memory access time. A memory hierarchy is cost effective only if a high percentage of items requested from memory are present in the highest levels of the hierarchy (the levels with the shortest latency) when requested. If a processor requests an item from a cache and the item is present in the cache, the event is called a cache hit. If a processor requests an item from a cache and the item is not present in the cache, the event is called a cache miss. In the event of a cache miss, the requested item is retrieved from a lower level (longer latency) of the memory hierarchy, which may have a significant impact on performance. The average memory access time may be reduced by improving the cache hit ratio (equivalently, reducing the miss ratio), reducing the time penalty for a miss, and reducing the time required for a hit.
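The relationship among the three factors named above can be illustrated with the commonly used textbook model in which the average access time equals the hit time plus the miss rate multiplied by the miss penalty. The following is a minimal sketch of that model only; the function name, the parameter names, and the cycle counts are assumptions chosen for illustration and do not describe any particular system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Textbook model: average time = hit time + miss rate * miss penalty.

    All names and units here are illustrative assumptions. hit_time and
    miss_penalty share one unit (for example, processor cycles), and
    miss_rate is the fraction of requests that miss (0.0 to 1.0).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: a 1-cycle hit and a 100-cycle miss penalty.
baseline = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
fewer_misses = average_access_time(hit_time=1, miss_rate=0.02, miss_penalty=100)
cheaper_miss = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=40)
```

Under these assumed figures, lowering the miss rate and lowering the miss penalty each reduce the average below the baseline, which is consistent with the three levers listed in the preceding paragraph.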
If a cache stores an entire line address along with the data and any line can be placed anywhere in the cache, the cache is said to be fully associative. However, for a large cache in which any line can be placed anywhere, the hardware required to rapidly determine if an entry is in the cache (and where) may be very large and expensive. For large caches, a faster, space-saving alternative is to use a subset of an address (called an index) to designate a line position within the cache, and then store the remaining set of more significant bits of each physical address (called a tag) along with the data. In a cache with indexing, an item with a particular address can be placed only at the one place (set of lines) within the cache designated by the index. If the cache is arranged so that the index for a given address maps to exactly one line in the subset, the cache is said to be direct mapped. In general, large direct mapped caches can have a shorter access time for a cache hit relative to associative caches of the same size. However, direct mapped caches have a higher probability of cache misses relative to associative caches of the same size because many lines of memory map to each available space in the direct mapped cache. If the index maps to more than one line in the subset, the cache is said to be set associative. All or part of an address is hashed to provide a set index which partitions the address space into sets. For a direct mapped cache, since each line can only be placed in one place, no method is required for replacement. In general, all caches other than direct mapped caches require a method for replacement. That is, when an index maps to more than one line in a cache set, a replacement method must choose which line to replace.
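The division of an address into a tag and an index can be sketched as follows. This is an illustrative example only: it assumes a simple bit-slicing scheme (rather than a hash), a power-of-two line size and number of sets, and hypothetical default values of 64 bytes per line and 256 sets. The function name split_address is likewise an assumption.

```python
def split_address(addr, line_size=64, num_sets=256):
    """Split an address into (tag, index, offset) by bit slicing.

    Assumes line_size and num_sets are powers of two; both defaults are
    hypothetical values chosen only for illustration.
    """
    for value in (line_size, num_sets):
        if value <= 0 or value & (value - 1):
            raise ValueError("line_size and num_sets must be powers of two")
    offset_bits = line_size.bit_length() - 1   # bits selecting a byte in a line
    index_bits = num_sets.bit_length() - 1     # bits selecting a set
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)   # remaining more significant bits
    return tag, index, offset


tag, index, offset = split_address(0x12345)
# An address exactly line_size * num_sets bytes away has the same index
# but a different tag, so the two lines compete for the same set.
other_tag, other_index, _ = split_address(0x12345 + 64 * 256)
```

In this sketch, the two addresses share an index and differ only in the tag, which illustrates why many lines of memory map to each available place in a direct mapped cache.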
In the event of a cache miss, typically one line in a cache is replaced by the newly requested line. In the case of a direct mapped cache, a new line replaces a line at one fixed place. In the case of fully associative caches, a replacement method is needed to decide which line in the cache is to be replaced. In the case of set associative caches, a replacement method is needed to decide which line in a set is replaced. The method for deciding which lines should be replaced in a fully associative or set associative cache is typically based on run-time historical data, such as which line is least-recently-used. Alternatively, a replacement method may be based on historical data regarding least-frequently-used. Still other alternatives include first-in first-out, and pseudo-random replacement.
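Least-recently-used replacement within a set can be sketched as follows. This is a minimal model under stated assumptions, not a description of any particular hardware: the class name SetAssociativeCache is hypothetical, addresses are treated as line numbers, the set index is taken as the line number modulo the number of sets, and only tags (no data) are stored. Setting ways to 1 models a direct mapped cache, and setting num_sets to 1 models a fully associative cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Illustrative set associative cache with LRU replacement per set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered mapping per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, line_addr):
        """Access a line; return (hit, evicted_line_addr or None)."""
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            return True, None
        evicted = None
        if len(lines) >= self.ways:
            old_tag, _ = lines.popitem(last=False)   # remove the LRU line
            evicted = old_tag * self.num_sets + index
        lines[tag] = None                 # install the newly requested line
        return False, evicted
```

For example, in a fully associative instance with two ways, accessing lines 0, 1, 0, and then 2 evicts line 1, because line 0 was used more recently. Other replacement methods named above, such as first-in first-out, would differ only in how the victim is chosen.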
The minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or block, or page. The present patent document uses the term “line,” but the invention is equally applicable to systems employing blocks or pages.
In some multilevel caches, each cache level has a copy of every line of memory residing in every cache level higher in the hierarchy (closer to the processor), a property called inclusion. For example, in an inclusive two-level cache system, every entry in the primary cache is also in the secondary cache. Typically, when a line is evicted from an upper level cache, the line is permitted to remain in lower level caches. Conversely, in order to maintain inclusion, if a line is evicted from a lower level cache, the lower level cache must issue a bus transaction, called a back-invalidate transaction, to flush any copies of the evicted line out of upper levels of the cache hierarchy. Each back-invalidate transaction causes any cache at a higher level in the hierarchy to invalidate its copy of the item corresponding to the address, and to provide a modified copy of the item to the lower level cache if the item has been modified. Back-invalidate transactions occur frequently and have a significant impact on overall performance due to increased bus utilization between the caches and increased bus monitoring (snoop) traffic.
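The interaction between inclusion and back-invalidate transactions can be sketched with a small two-level model. The sketch below rests on several simplifying assumptions that are not taken from any particular system: both levels are fully associative with least-recently-used replacement, hits in the primary cache are not visible to the secondary cache, and the class name InclusiveHierarchy and its counters are hypothetical.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Illustrative inclusive two-level cache (L1 contents are a subset of L2)."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()   # line address -> modified (dirty) flag
        self.l2 = OrderedDict()
        self.l1_cap, self.l2_cap = l1_lines, l2_lines
        self.back_invalidates = 0     # transactions issued by L2
        self.memory_writebacks = 0    # modified lines written to memory

    def access(self, line, write=False):
        if line in self.l1:
            self.l1.move_to_end(line)
            self.l1[line] = self.l1[line] or write
            return "L1 hit"           # assumption: L2 does not observe L1 hits
        if line in self.l2:
            self.l2.move_to_end(line)
            result = "L2 hit"
        else:
            if len(self.l2) >= self.l2_cap:
                victim, dirty = self.l2.popitem(last=False)
                dirty = self._back_invalidate(victim) or dirty
                if dirty:
                    self.memory_writebacks += 1
            self.l2[line] = False
            result = "miss"
        if len(self.l1) >= self.l1_cap:
            victim, dirty = self.l1.popitem(last=False)
            if dirty:
                self.l2[victim] = True    # victim remains in L2 by inclusion
        self.l1[line] = write
        return result

    def _back_invalidate(self, line):
        """Flush any L1 copy of an evicted L2 line; return True if modified."""
        self.back_invalidates += 1
        return bool(self.l1.pop(line, False))
```

In this sketch, with two lines per level, accessing lines 0, 1, 0, and then 2 causes the secondary cache to evict line 0, since it did not observe the recent primary cache hit, and the resulting back-invalidate removes line 0 from the primary cache as well.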
Many computer systems employ multiple processors, each of which may have multiple levels of caches. All processors and caches may share a common main memory. A particular line may simultaneously exist in shared memory and in the cache hierarchies for multiple processors. All copies of a line in the caches must be identical, a property called coherency. However, in some cases the copy of a line in shared memory may be “stale” (not updated). If any processor changes the contents of a line, only the one changed copy is then valid, and all other copies must then be updated or invalidated. The protocols for maintaining coherence for multiple processors are called cache-coherence protocols. In some protocols, the status of a line of physical memory is kept in one location, called the directory. In other protocols, every cache that has a copy of a line of physical memory also has a copy of the sharing status of the line. When no centralized state is kept, all caches monitor or “snoop” a shared bus to determine whether or not they have a copy of a line that is requested on the bus.
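A snooping arrangement in which a write by one processor invalidates copies held by other caches can be sketched as follows. This is a deliberately simplified illustration and not a complete cache-coherence protocol: it assumes a write-through policy so that shared memory is never stale, it tracks no per-line sharing status, and the class names Bus and SnoopingCache and their methods are hypothetical.

```python
class Bus:
    """Illustrative shared bus connecting caches to a common main memory."""

    def __init__(self):
        self.caches = []
        self.memory = {}          # line address -> value

    def broadcast_invalidate(self, line, source):
        # Every cache other than the writer snoops the invalidation.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(line)


class SnoopingCache:
    """Illustrative cache that invalidates its copy when another cache writes."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # line address -> cached value
        bus.caches.append(self)

    def read(self, line):
        if line not in self.lines:                     # miss: fetch from memory
            self.lines[line] = self.bus.memory.get(line, 0)
        return self.lines[line]

    def write(self, line, value):
        self.bus.broadcast_invalidate(line, source=self)
        self.lines[line] = value
        self.bus.memory[line] = value   # assumption: write-through to memory

    def snoop_invalidate(self, line):
        self.lines.pop(line, None)      # drop the local copy, if any
```

In this sketch, after one cache writes a line, any other cache that held the line no longer has a copy and obtains the new value on its next read. A protocol that permits stale copies in shared memory would additionally require the writing cache to supply the modified line in response to later requests.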
In a snooping based system, the cache system monitors transactions on a bus. Some of the transactions indicate that an item has been evicted from an upper level of the cache system. However, some transactions may only “hint” that an item has been evicted from an upper level of the cache system, and a lower level of the cache system does not know with complete certainty that the item is not still retained by a higher level. For example, some systems do not implement inclusion at the upper levels of the cache hierarchy. If the system does not implement inclusion at higher cache levels, then a third level cache may see that an item has been evicted from a second level cache, but the third level cache does not know whether a copy of the item is in the first level cache.