This invention relates in general to memory architectures for computer systems and, more particularly, to high performance cache memories for use with computer processors.
Processors often take several clock cycles to access data which is stored in a main memory located external to the processor. These external memory accesses not only require a significant amount of time but also consume a significant amount of power. Cache memories have often been used to enhance computer system performance by providing a processor with a relatively small, high speed memory (or cache) for storing instructions and data which have recently been accessed by the processor. These instructions and data are stored in the cache in the expectation that, since they have been accessed once, they will be accessed again relatively soon. The access time of the cache memory is substantially shorter than that of the external main memory. When a cache hit occurs, the instruction or data is retrieved from the cache rather than from the slower external main memory, and significant time is saved in the retrieval of the desired information.
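The time saved by a cache hit can be illustrated with the conventional average access time relationship, in which every access incurs the cache access time and only the missing fraction of accesses additionally incurs the main memory penalty. The following is a minimal sketch only; the function name and the cycle counts and miss rate are hypothetical values chosen for illustration and are not taken from any particular system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time in clock cycles.

    Every access pays hit_time; the fraction miss_rate of accesses
    additionally pays miss_penalty to reach external main memory.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: a 1-cycle cache, a 20-cycle main memory access,
# and 5% of accesses missing in the cache.
with_cache = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=20)

# Without a cache, every access goes to main memory.
without_cache = average_access_time(hit_time=20, miss_rate=0.0, miss_penalty=0)
```

Under these assumed figures the cached system averages about 2 cycles per access compared with 20 cycles for the uncached system.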
A recent trend has been to integrate a first level (L1) cache on the microprocessor chip together with the microprocessor core as shown in FIG. 1. In this particular example, the microprocessor chip has been provided with a level 1 cache (L1) located on the chip and a level 2 cache (L2) located external to the microprocessor chip. The on-chip L1 cache includes both an L1 instruction cache and an L1 data cache. The L1 caches and the L2 cache are coupled via physical address and physical data buses to the external main memory in this example. The off-chip L2 cache is typically orders of magnitude larger than the on-chip L1 caches. For example, 4 Kbyte on-chip L1 caches and 256 Kbyte-512 Kbyte off-chip L2 external caches are common.
In a typical cache arrangement, the contents of the first level L1 caches are a subset of the contents of the second level L2 cache. In other words, all of the entries of the first level L1 caches are also stored in the second level L2 cache. In this manner, accesses to the L2 cache do not have to inspect the L1 caches unless there is an indication that the requested instruction or data is also stored in the L1 caches.
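The subset relationship described above can be sketched as follows. This is an illustrative model only: the representation of the L2 cache as a mapping from block address to an "also in L1" flag, and the function names, are assumptions made for the example and are not a description of any particular hardware mechanism.

```python
def is_inclusive(l1_blocks, l2_entries):
    """True if every block held in L1 is also held in L2."""
    return all(block in l2_entries for block in l1_blocks)


def needs_l1_inspection(block, l2_entries):
    """Decide from L2 state alone whether L1 must be inspected.

    l2_entries maps a block address to a hypothetical flag indicating
    that the block is also stored in L1. A block absent from L2 cannot
    be in L1 when the subset relationship holds.
    """
    return l2_entries.get(block, False)


# Hypothetical contents: three blocks in L2, two of them also in L1.
l2_entries = {0x10: True, 0x20: False, 0x30: True}
l1_blocks = {0x10, 0x30}

inclusive = is_inclusive(l1_blocks, l2_entries)
inspect_present_flagged = needs_l1_inspection(0x10, l2_entries)
inspect_present_unflagged = needs_l1_inspection(0x20, l2_entries)
inspect_absent = needs_l1_inspection(0x40, l2_entries)
```

In this sketch only the access to block 0x10 requires the L1 caches to be inspected; the other two accesses are resolved from the L2 state alone.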
Both "direct mapped" and "associative" caches are known to increase memory performance. In a direct mapped cache, a particular block or line of information can only be stored in a single location in the cache, which is determined by the block-frame address of the block or line. In a "fully associative" cache, the block can be placed anywhere within the cache, whereas in a "set-associative" cache the block can be placed only within one particular set of storage locations, although it may occupy any location within that set. In a 2-way set associative cache, each set in the cache can store 2 blocks of information. In a 4-way set associative cache, each set in the cache can store 4 blocks of information. Cache hit rates generally improve with increased associativity. However, increased associativity tends to require caching circuits of increased complexity.
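The placement rules for the three organizations can be sketched with a single function in which the associativity is a parameter. The sketch assumes the common convention that the set is selected by the block address modulo the number of sets; the function name and the 8-block cache size are hypothetical and chosen only for illustration.

```python
def candidate_slots(block_address, num_blocks, ways):
    """Return the cache slots in which a block may be stored.

    num_blocks is the total cache capacity in blocks. ways is the
    associativity: 1 for direct mapped, num_blocks for fully associative.
    Assumes the set is selected by block_address modulo the number of sets.
    """
    if ways < 1 or num_blocks % ways != 0:
        raise ValueError("ways must be positive and divide num_blocks")
    num_sets = num_blocks // ways
    set_index = block_address % num_sets
    first = set_index * ways
    return list(range(first, first + ways))


# Hypothetical 8-block cache and block address 13.
direct_mapped = candidate_slots(13, num_blocks=8, ways=1)
two_way = candidate_slots(13, num_blocks=8, ways=2)
four_way = candidate_slots(13, num_blocks=8, ways=4)
fully_associative = candidate_slots(13, num_blocks=8, ways=8)
```

The number of candidate slots grows from one (direct mapped) to the whole cache (fully associative), which is why a lookup in a more associative cache must compare more tags and tends to require more complex circuitry.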
A "miss cache" is described by Norman P. Jouppi in his publication entitled "Improving Direct-Mapped Cache Performance By The Addition Of A Small Fully-Associative Cache And Prefetch Buffers", IEEE 17th Annual International Symposium On Computer Architecture, 1990. The described miss cache is a small, fully associative cache which is located between a first level direct-mapped cache and its refill path. If a miss occurs in the direct-mapped cache but a hit occurs in the miss cache, then significant time is saved by avoiding an access to main memory. Such miss caches are typically very small and, in one example, hold 2-5 entries or blocks.
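The behavior of a miss cache can be sketched as a small simulation. This is a simplified model under stated assumptions: the class and method names are hypothetical, block addresses stand in for full cache lines, the requested block is loaded into both the direct-mapped cache and the miss cache on a miss in both, and least recently used replacement within the miss cache is assumed.

```python
from collections import OrderedDict


class DirectMappedWithMissCache:
    """Direct-mapped L1 backed by a small fully associative miss cache."""

    def __init__(self, num_lines, miss_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block held by each L1 line
        self.miss_entries = miss_entries
        self.miss_cache = OrderedDict()   # keys are blocks, oldest first

    def access(self, block):
        index = block % self.num_lines
        if self.lines[index] == block:
            return "l1_hit"
        if block in self.miss_cache:
            # Refill L1 from the miss cache; main memory is not accessed.
            self.miss_cache.move_to_end(block)
            self.lines[index] = block
            return "miss_cache_hit"
        # Miss in both: fetch from main memory and fill both structures.
        self.lines[index] = block
        self.miss_cache[block] = None
        if len(self.miss_cache) > self.miss_entries:
            self.miss_cache.popitem(last=False)   # assumed LRU replacement
        return "memory_access"


# Blocks 0 and 4 conflict in a hypothetical 4-line direct-mapped cache.
cache = DirectMappedWithMissCache(num_lines=4, miss_entries=2)
results = [cache.access(block) for block in (0, 4, 0, 4)]
```

In this sketch the first two accesses go to main memory, while the later alternating accesses to the two conflicting blocks are served from the miss cache.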
Jouppi also describes an improvement to miss-caching, namely the "victim cache". A victim cache is a small, fully associative cache like the miss cache, except that it is loaded with the victim of the miss instead of the requested block. In other words, when a cache miss occurs in the direct mapped L1 cache, the block or "victim" that is discarded from the L1 cache is stored in the victim cache. The victim cache therefore contains only entries that have recently been evicted from the direct mapped first level cache. The victim caches described by Jouppi typically hold 1-5 entries. The goal of Jouppi's victim cache is to increase the performance of a direct mapped first level cache to a level approximating the performance of a set associative cache.
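The victim cache behavior described above can be sketched as a small simulation. This is a simplified model under stated assumptions: the class and method names are hypothetical, block addresses stand in for full cache lines, a block that hits in the victim cache is assumed to be swapped with the block it displaces from the direct-mapped cache, and the oldest victim is assumed to be discarded when the victim cache is full.

```python
from collections import OrderedDict


class DirectMappedWithVictimCache:
    """Direct-mapped L1 backed by a small fully associative victim cache."""

    def __init__(self, num_lines, victim_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block held by each L1 line
        self.victim_entries = victim_entries
        self.victims = OrderedDict()      # keys are blocks, oldest first

    def access(self, block):
        index = block % self.num_lines
        resident = self.lines[index]
        if resident == block:
            return "l1_hit"
        if block in self.victims:
            # Assumed swap: the block leaves the victim cache for L1.
            del self.victims[block]
            result = "victim_hit"
        else:
            result = "memory_access"
        self.lines[index] = block
        if resident is not None:
            self._store_victim(resident)  # the displaced block is the victim
        return result

    def _store_victim(self, block):
        self.victims[block] = None
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)   # discard the oldest victim


# Blocks 0 and 4 conflict in a hypothetical 4-line direct-mapped cache.
cache = DirectMappedWithVictimCache(num_lines=4, victim_entries=1)
results = [cache.access(block) for block in (0, 4, 0, 4)]
```

In this sketch a block is held either in the direct-mapped cache or in the victim cache but never in both, which is consistent with the victim cache holding only recently discarded blocks.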
Processors with higher clock speeds and larger appetites for instructions and data place an ever increasing demand on memory for faster access. Cache memory systems even faster than those presently available are therefore clearly desirable.