This invention relates generally to the field of high performance processors that require a large bandwidth to communicate with a main memory system. To effectively increase the memory bandwidth, a cache memory system is typically placed between the processor and the main memory. The cache memory system stores frequently used instructions and data so that the processor can access them faster than it could from the main memory.
When a processor accesses memory, it checks the cache first. If the desired data is in the cache, a cache hit occurs, and the processor receives the data without further delay. If the data is not in the cache, a cache miss occurs, and the data must be retrieved from the main memory and stored in the cache for future use. Main memory accesses take longer than cache accesses, so the processor stalls on a cache miss, wasting a number of cycles. Thus, the goal for nearly all modern computer systems is to service as many memory references as possible from the cache and to minimize references that require accesses to the main memory.
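The cost of a miss can be made concrete with the standard average access time relation (hit time plus miss rate times miss penalty). The following is a minimal sketch; the function name and the cycle counts in the example are illustrative assumptions and do not come from this description.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per memory reference.

    hit_time:     cycles to service a reference from the cache
    miss_rate:    fraction of references that miss (0.0 to 1.0)
    miss_penalty: extra stall cycles spent fetching from main memory
    All values passed in below are hypothetical.
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 1-cycle hit, 40-cycle miss penalty.
low_miss = average_access_time(1, 0.125, 40)   # 6.0 cycles per reference
high_miss = average_access_time(1, 0.25, 40)   # 11.0 cycles per reference
```

Under these assumed figures, doubling the miss rate nearly doubles the average access time, which is why reducing misses dominates cache design.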
In a typical cache system, a portion of a main memory address is used to index a location or a set of locations in cache memory. In addition to storing a block (or line) of data at that indexed location, cache memory stores one or more tags, taken from another portion of the main memory address, which identify the location in main memory from which the block of data held in cache was taken.
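The address split described above can be sketched as follows. This is a minimal illustration that assumes a hypothetical cache with 16-byte blocks and 256 indexed locations; the bit widths and the function name are assumptions chosen for the example, not values taken from this description.

```python
# Illustrative parameters (assumptions, not taken from the description).
OFFSET_BITS = 4   # 16-byte blocks: low 4 bits select a byte within the block
INDEX_BITS = 8    # 256 indexed locations in the cache


def split_address(addr):
    """Split a main memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Two addresses that select the same cache location but carry different tags.
first = split_address(0x12345)    # (0x12, 0x34, 0x5)
second = split_address(0x22345)   # (0x22, 0x34, 0x5)
```

The index selects where in the cache to look, and the stored tag is compared against the tag field of the address to decide whether the block found there is the one requested.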
Caches are typically characterized by their size (i.e., amount of memory available for storage), their replacement algorithm (i.e., method of inserting blocks of data into a set and discarding blocks from it), their degree of associativity or set size (i.e., number of tags associated with an index and thus the number of cache locations where data may be located), and their block or line size (i.e., number of data words associated with a tag). These characteristics influence many performance parameters such as the amount of silicon required to implement the cache, the cache access time, and the cache miss rate.
One type of cache that is frequently used with modern processors is a direct-mapped cache. In a direct-mapped cache, each set contains only one data block and tag. Thus, only one address comparison is needed to determine whether the requested data is in the cache. The direct-mapped cache is simple, easy to design, and requires less chip area than an associative cache. However, the direct-mapped cache is not without drawbacks. Because the direct-mapped cache allows only one data block to reside in each cache set, its miss rate tends to be very high. This higher miss rate is partly offset by a small hit access time.
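A minimal sketch of a direct-mapped lookup follows. It tracks tags only (no data payload), and the class name, the cache geometry, and the two example addresses are hypothetical values chosen so that both addresses fall in the same set.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one tag per set, no data payload."""

    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets  # None marks an empty set

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        if self.tags[index] == tag:    # the single tag comparison
            return True
        self.tags[index] = tag         # replace the only resident block
        return False


# Hypothetical geometry: 4 sets of 16-byte blocks.
dm_cache = DirectMappedCache(num_sets=4, block_size=16)
addr_a, addr_b = 0x000, 0x040   # same index (0), different tags
dm_results = [dm_cache.access(a) for a in (addr_a, addr_b, addr_a, addr_b)]
# dm_results == [False, False, False, False]: every reference misses
```

Because the set holds only one block, the two alternating addresses keep displacing each other and no reference ever hits.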
Another type of cache that is frequently used is a d-way set associative cache. A d-way set associative cache contains S sets, each holding d distinct blocks of data that are accessed by addresses with common index fields but different tag fields. For each cache index, there are d block locations allowed, one in each way of the indexed set. Thus, a block of data arriving from the main memory can go into any of the d block locations of that set. The d-way set associative cache has a higher hit rate than the direct-mapped cache. However, its hit access time is also higher because an associative search is required during each reference, followed by a multiplexing of the data block to the processor.
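A d-way set associative lookup can be sketched in the same style. The text above does not name a replacement algorithm, so least-recently-used (LRU) replacement is assumed here purely for illustration; the class name, geometry, and addresses are likewise hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy d-way set associative cache (tags only), with assumed LRU replacement."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # Each set maps tag -> None, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:                # models the associative search over d tags
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used block
        lines[tag] = None
        return False


# Hypothetical geometry: 4 sets, 2 ways, 16-byte blocks.
sa_cache = SetAssociativeCache(num_sets=4, ways=2, block_size=16)
addr_a, addr_b = 0x000, 0x040   # same index, different tags
sa_results = [sa_cache.access(a) for a in (addr_a, addr_b, addr_a, addr_b)]
# sa_results == [False, False, True, True]: both blocks co-reside in one set
```

In hardware the d tag comparisons happen in parallel and a multiplexer selects the matching block; the dictionary lookup here only models the outcome, not the timing cost noted above.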
Currently, the trend among computer designers is to use direct-mapped caches rather than d-way set associative caches. However, as mentioned previously, a major problem associated with direct-mapped caches is the large number of misses that occur. One particular type of miss is a conflict miss. A conflict miss occurs when two addresses with identical index fields but different tags map into the same cache set, so that each reference displaces the block brought in by the other. A d-way set associative cache typically does not suffer from conflict misses because the data can co-reside in a set. Although other types of misses, such as compulsory misses (which occur when loading a working set into a cache) and capacity misses (which occur when the cache is full and the working set is larger than the cache size), do occur, they tend to be minimal as compared to conflict misses.
The problem of conflict misses has caused designers to reconsider using a direct-mapped cache and to begin designing cache memory systems that can incorporate the advantages of both the direct-mapped cache and the d-way associative cache. One approach has been to use a victim cache. A victim cache is a small, fully associative cache that provides some extra cache lines for data removed from the direct-mapped cache due to misses. Thus, for a reference stream of conflicting addresses a_i, a_j, a_i, a_j, ..., the second reference a_j misses and forces the data i indexed by a_i out of the set. The data i that is forced out is placed in the victim cache. Thus, the third reference address, a_i, does not require accessing main memory because the data is in the victim cache and can be accessed therefrom.
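The victim cache behavior on the alternating reference stream can be sketched as follows. This is a simplified model under stated assumptions: the victim cache discards its oldest entry when full, a victim hit moves the block back into the primary cache, and the class name, geometry, and addresses are hypothetical.

```python
from collections import OrderedDict


class VictimCachedDirectMapped:
    """Toy direct-mapped cache backed by a small fully associative victim cache."""

    def __init__(self, num_sets, block_size, victim_entries):
        self.num_sets = num_sets
        self.block_size = block_size
        self.victim_entries = victim_entries
        self.primary = [None] * num_sets   # block number resident in each set
        self.victim = OrderedDict()        # block number -> None, oldest first

    def access(self, addr):
        """Return 'hit', 'victim_hit', or 'miss' (main memory access)."""
        block = addr // self.block_size
        index = block % self.num_sets
        if self.primary[index] == block:
            return "hit"
        if block in self.victim:           # second lookup, in the victim cache
            del self.victim[block]
            outcome = "victim_hit"
        else:
            outcome = "miss"
        displaced = self.primary[index]
        self.primary[index] = block        # install the block in the primary cache
        if displaced is not None:
            self.victim[displaced] = None  # keep the displaced block nearby
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # assumed oldest-first eviction
        return outcome


# Hypothetical geometry: 4 sets, 16-byte blocks, 2 victim entries.
vc = VictimCachedDirectMapped(num_sets=4, block_size=16, victim_entries=2)
addr_i, addr_j = 0x000, 0x040   # conflicting addresses: same index, different tags
vc_results = [vc.access(a) for a in (addr_i, addr_j, addr_i, addr_j)]
# vc_results == ['miss', 'miss', 'victim_hit', 'victim_hit']
```

After the two initial misses, every later reference in the stream is served from the victim cache rather than from main memory, at the cost of a second lookup on each one.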
However, there are several drawbacks to the victim cache. For example, the victim cache must be very large to attain adequate performance because it must store all conflicting data blocks. Another problem with the victim cache is that it requires at least two access times to fetch a conflicting datum (one to check the primary cache, a second to check the victim cache, and possibly a third to store the datum in the primary cache). Still another drawback to the victim cache is that performance is degraded as the size of the cache memory is increased because the victim cache becomes smaller relative to the cache memory, thereby reducing the probability of resolving conflicts.
Consequently, there is a need for an improved cache memory system that incorporates the low conflict miss rate of the d-way set-associative cache, maintains the critical access path of the direct-mapped cache, and has better performance than the victim cache.