Most computer systems employ a multilevel hierarchy of memory systems, with fast but limited capacity memory at the highest level of the hierarchy and proceeding to slower but higher capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.
A memory hierarchy is useful only if a high percentage of items requested from memory are present in the highest levels of the hierarchy when requested. If a processor requests an item from a cache and the item is present in the cache, the event is called a cache hit. If a processor requests an item from a cache and the item is not present in the cache, the event is called a cache miss. In the event of a cache miss, the requested item is retrieved from a lower level of the memory hierarchy. This may have a significant impact on performance. In general, minimization of cache misses and minimization of the effects of cache misses are some of the most important design parameters for overall computer system performance.
The minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or sometimes a block. Typically, a memory is organized into words (for example, 32 bits per word) and a line is typically multiple words (for example, 16 words per line). Memory may also be divided into pages, with many lines per page.
If a cache stores an entire line address along with the data, any line can be placed anywhere in the cache. A space saving alternative is as follows. Assume that a cache holds 128 lines. For 128 lines, seven bits may be used to designate a line position within the cache. If the least significant seven bits of the line address are used to designate a line within the cache, then only the remaining set of more significant bits of each physical address must be stored along with the data. The number used to designate a line within a cache is commonly called an index and the remaining set of bits required to define a physical address for a line is commonly called a tag.
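The index and tag split described above can be illustrated with a short sketch. It assumes the 128-line cache from the example, so the index is the seven least significant bits of the line address. The function names `split_line_address` and `join_line_address` are hypothetical, chosen only for illustration.

```python
INDEX_BITS = 7  # 128 lines -> 7 index bits, as in the example above


def split_line_address(line_address, index_bits=INDEX_BITS):
    """Split a line address into (tag, index).

    The index is the low `index_bits` bits; the tag is everything above them.
    """
    index = line_address & ((1 << index_bits) - 1)
    tag = line_address >> index_bits
    return tag, index


def join_line_address(tag, index, index_bits=INDEX_BITS):
    """Rebuild the full line address from a stored tag and the slot index."""
    return (tag << index_bits) | index


if __name__ == "__main__":
    tag, index = split_line_address(0x12345)
    print(f"tag={tag:#x} index={index}")
    print(hex(join_line_address(tag, index)))
```

Only the tag needs to be stored alongside the data, because the index is implied by the position of the line within the cache.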
In a cache with indexing, an item with a particular address can be placed only at the one place within the cache designated by the index. In addition, every item within the address space having identical index bits will potentially require the same line space within the cache. Therefore, a new line may be fetched that requires the same space in the cache as an existing line and the existing line may need to stay in the cache. This condition is called a conflict and is discussed in more detail below.
If a line can appear at only one place in the cache, the cache is said to be direct mapped (and is said to have low associativity). In an alternative design, a cache may be organized into sets, each set containing two or more lines. If a line can be placed in only one of the sets, the cache is said to be set associative. If a line can be placed anywhere in the cache, the cache is said to be fully associative. In general, caches having low associativity are simpler, faster and require less space than caches having high associativity. However, direct mapped or other low associativity caches may have performance problems due to conflicts as discussed below.
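As a rough sketch of the three organizations, the function below lists the cache positions a given line may occupy. The name `candidate_slots`, the `ways` parameter, and the layout of slots within a set are assumptions made for illustration; with one line per set the cache is direct mapped, and with a single set holding every line it is fully associative.

```python
def candidate_slots(line_address, num_lines=128, ways=1):
    """Return the slot numbers where a line may be placed.

    ways == 1          -> direct mapped (exactly one candidate slot)
    1 < ways < lines   -> set associative (any slot within one set)
    ways == num_lines  -> fully associative (any slot in the cache)
    """
    if ways < 1 or num_lines % ways != 0:
        raise ValueError("ways must be positive and divide num_lines evenly")
    num_sets = num_lines // ways
    set_index = line_address % num_sets
    # Assumed layout: the slots of one set are stored contiguously.
    return [set_index * ways + way for way in range(ways)]


if __name__ == "__main__":
    print(candidate_slots(133, ways=1))         # direct mapped
    print(candidate_slots(133, ways=2))         # two-way set associative
    print(len(candidate_slots(133, ways=128)))  # fully associative
```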
For a direct mapped cache or other low associativity cache, a new line may require the same space as an existing line. That is, instead of displacing lines randomly, or displacing the least recently used line, the new line displaces the line having the same index within the cache. The displaced line may be useful and may need to stay in the cache. A miss resulting from a useful line being displaced by a line having the same index is called a conflict miss. In some software, a second line may displace a first line, only to have the first line soon displace the second line. This thrashing of a single cache line can result in low system performance, even though the cache size is adequate for the particular software. There is a need for the inherent speed and space advantages of low associativity caches while minimizing the negative effects on system performance due to conflict misses.
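The thrashing behavior can be reproduced with a minimal simulation, offered as a sketch rather than a model of any particular hardware. The class `DirectMappedCache` and its fields are hypothetical. It tracks only tags, not data, and assumes a 128-line cache indexed by the low bits of the line address.

```python
class DirectMappedCache:
    """Tag-only model of a direct mapped cache (no data is stored)."""

    def __init__(self, num_lines=128):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # one stored tag per slot
        self.hits = 0
        self.misses = 0

    def access(self, line_address):
        """Return True on a hit; on a miss, displace the resident line."""
        index = line_address % self.num_lines
        tag = line_address // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # the line with the same index is displaced
        self.misses += 1
        return False


if __name__ == "__main__":
    thrash = DirectMappedCache()
    for i in range(10):
        thrash.access(5 if i % 2 == 0 else 5 + 128)  # same index, different tags
    print(thrash.hits, thrash.misses)

    calm = DirectMappedCache()
    for i in range(10):
        calm.access(5 if i % 2 == 0 else 6)  # different indexes
    print(calm.hits, calm.misses)
```

In the first loop the two lines share an index, so every access misses even though 126 of the 128 slots are never used. In the second loop only the first access to each line misses.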
One approach to reducing the impact of conflict misses in direct mapped caches is to add a small fully associative secondary cache. For example, see Jouppi, N. P., "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", Proceedings of the 17th Annual International Symposium on Computer Architecture, May 1990, pp 364-373, and see also U.S. Pat. No. 5,261,066 (Jouppi et al). Jouppi proposes adding a small (2-5 lines) fully associative cache called a miss cache. In the event of a cache miss in the primary cache, the requested item is placed in both the primary cache and the miss cache. Items placed in the miss cache replace the least recently used item. As an improvement, Jouppi proposes loading the small secondary cache with the displaced victim of a miss instead of the requested line, calling the resulting secondary cache a victim cache. In an alternative design, a small additional cache with first-in first-out replacement, called an assist cache, is disclosed in Kurpanek, G. et al, "PA7200: A PA-RISC Processor with Integrated High Performance MP Bus Interface", Digest of Papers Spring COMPCON 94, 1994, pp 373-382.
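The victim cache idea can be sketched as follows. This is an illustrative model under stated assumptions, not the design in the cited references: the class name `VictimCachedDirectMapped`, the default of four victim entries, and the choice to swap a line back into the primary cache on a victim hit are all assumptions. Only line addresses are tracked, not data.

```python
from collections import OrderedDict


class VictimCachedDirectMapped:
    """Direct mapped primary cache backed by a small fully associative victim cache."""

    def __init__(self, num_lines=128, victim_entries=4):
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # resident line address per slot
        self.victims = OrderedDict()     # displaced lines, oldest first
        self.victim_entries = victim_entries
        self.primary_hits = 0
        self.victim_hits = 0
        self.misses = 0

    def _stash_victim(self, line_address):
        """Insert a displaced line, evicting the oldest entry when full."""
        self.victims[line_address] = True
        self.victims.move_to_end(line_address)
        while len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)

    def access(self, line_address):
        index = line_address % self.num_lines
        resident = self.lines[index]
        if resident == line_address:
            self.primary_hits += 1
            return "primary hit"
        if line_address in self.victims:
            # Assumption: a victim hit moves the line back into the primary cache.
            del self.victims[line_address]
            self.victim_hits += 1
            outcome = "victim hit"
        else:
            self.misses += 1  # must be fetched from a lower level
            outcome = "miss"
        self.lines[index] = line_address
        if resident is not None:
            self._stash_victim(resident)  # the displaced line becomes a victim
        return outcome


if __name__ == "__main__":
    cache = VictimCachedDirectMapped()
    for i in range(10):
        cache.access(5 if i % 2 == 0 else 5 + 128)
    print(cache.primary_hits, cache.victim_hits, cache.misses)
```

With the same alternating access pattern that thrashes a plain direct mapped cache, only the first access to each line goes to the lower level of the hierarchy. The remaining accesses are satisfied by the victim cache.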
If there are multiple levels of caches, a lower level cache is typically larger than upper level caches and a lower level cache typically includes all the information that is in higher level caches. If the lower level cache includes all the information contained in higher level caches, the lower level cache is said to have inclusion. The primary advantage of inclusion is that when a check is needed to determine whether items in cache memory are the same as items elsewhere in the hierarchy (called coherency checking), only the lowest level cache needs to be checked. In general, for the systems with victim or assist caches described above, when an item is displaced from a lower level cache into the victim or assist cache, any corresponding item in a higher level cache is removed from the higher level cache. This impacts performance if the victim item is requested again, resulting in a cache miss for the higher level cache.
An alternative approach to reducing the impact of conflict misses in direct mapped caches is to monitor conflict miss address distribution and to remap memory paging. If the operating system has information on the cache conflict distribution, it can remap pages that conflict with other pages into pages that have no such conflicts. This remapping can be static (determined before run time) or dynamic (changing based on conflict misses during software execution). The static approach assumes the operating system knows the number of pages mapped to a given cache line when deciding on the page's address translation (page mapping). For an example of the static approach, see Kessler, R. et al, "Page Placement Algorithms for Large Real-Indexed Caches", ACM Transactions on Computer Systems, Vol. 10, No. 4, Nov. 1992, pp 338-359. For an example of dynamic page remapping, see Bershad, B. et al, "Avoiding Conflict Misses Dynamically in Large Direct Mapped Caches", ASPLOS VI Proceedings, Oct. 1994, pp 158-170. In Bershad et al, additional hardware is provided (called a Cache Miss Lookaside buffer) that detects and records a history of cache misses. Cache misses are detected on a per-page basis. Pages with many misses are remapped to different physical addresses. The Cache Miss Lookaside buffer stores associatively indexed page number tags from cache misses. The buffer also includes counters. If a cache miss is detected and the page number tag is already in the buffer, the corresponding counter is incremented. If a cache miss is detected and the page number tag is not in the buffer, the least recently used address is displaced. When a counter exceeds a threshold, the buffer generates an interrupt and the operating system then remaps the page corresponding to the interrupting counter.
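The counting behavior of the Cache Miss Lookaside buffer can be approximated in software as below. This is a sketch based only on the description above, not on the hardware in Bershad et al. The class name, the `on_interrupt` callback standing in for the interrupt, the initial counter value of 1, and the removal of an entry once it has triggered are all assumptions.

```python
from collections import OrderedDict


class CacheMissLookasideBuffer:
    """Per-page miss counters with least recently used displacement."""

    def __init__(self, entries=4, threshold=3, on_interrupt=None):
        self.entries = entries
        self.threshold = threshold
        # Hypothetical stand-in for the interrupt delivered to the operating system.
        self.on_interrupt = on_interrupt or (lambda page_number: None)
        self.counters = OrderedDict()  # page number -> miss count, LRU first

    def record_miss(self, page_number):
        """Record one cache miss; return True if the threshold was exceeded."""
        if page_number in self.counters:
            self.counters[page_number] += 1
            self.counters.move_to_end(page_number)  # mark as most recently used
        else:
            if len(self.counters) >= self.entries:
                self.counters.popitem(last=False)   # displace the LRU entry
            self.counters[page_number] = 1          # assumed initial count
        if self.counters[page_number] > self.threshold:
            del self.counters[page_number]  # assumption: entry is cleared once handled
            self.on_interrupt(page_number)  # the operating system would remap this page
            return True
        return False


if __name__ == "__main__":
    remapped = []
    buffer = CacheMissLookasideBuffer(entries=2, threshold=3,
                                      on_interrupt=remapped.append)
    print([buffer.record_miss(7) for _ in range(4)])
    print(remapped)
```

Note that this sketch, like the description above, counts every miss on a page, whether or not the miss displaces a line from the cache.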
Victim caches and assist caches reduce conflict misses if the conflict misses occur within a relatively short time interval. For misses occurring at longer time intervals, the victim line may be displaced from the victim cache before it is required again by software. Dynamic page remapping reduces conflict misses but requires a relatively long time. The software context may change before remapping ever occurs. In addition, dynamic page remapping requires a large cache to provide many alternative target pages for remapping of high miss pages. Finally, in the specific implementation of Bershad et al, all cache misses are counted, even the ones that do not displace lines from the cache. There is a need for an improved cache system providing the speed and space benefits of a direct mapped cache, with reduction of short term effects of conflict misses as provided by small fully associative auxiliary caches, with the reduction of misses by dynamic page remapping and still other improvements.