Cache memories are small, high-speed memory stores that are frequently included in the central processing unit ("CPU") architectures of data processing systems. A data processing system typically has two caches: a small level one ("L1") cache, usually integrated into the CPU design, and a comparatively larger level two ("L2") cache connected to the CPU via a memory bus. The L2 cache supplements the L1 cache.
The storage unit of a cache is called a line, which holds a consecutive segment of data from the memory. When a CPU uses a piece of data, the cache is searched for the line containing the data. If the line is already in the cache, the piece of data is sent immediately to the CPU; otherwise, the whole line is loaded from the main memory into the cache. By automatically maintaining recently used lines in the cache, the entire memory system of a data processing system can be made to appear as fast as the cache.
An important measure of the performance of a cache memory is the Buffer Hit Ratio ("BHR"): the percentage of memory accesses that are satisfied by the cache without having to access slower main memory. The higher the BHR, the better the cache performance. Cache performance depends on the application code being run. In particular, the better the code exhibits "spatial locality," that is, the more its references are to closely spaced elements of its address space, the higher the BHR that will be achieved.
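By way of illustration only, the effect of spatial locality on the BHR can be sketched with a small simulation. The line size, the number of lines, the least-recently-used replacement policy, and the name `hit_ratio` are assumptions chosen for the example and are not taken from any particular cache design.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_LINES = 8    # number of lines the cache can hold (assumed)


def hit_ratio(addresses, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Return the fraction of accesses satisfied by a small cache of lines.

    The cache is modeled as an ordered set of line numbers, with the least
    recently used line replaced on a miss (an assumed policy).
    """
    cache = OrderedDict()  # line number -> None, oldest first
    hits = 0
    for addr in addresses:
        line = addr // line_size          # line containing this byte address
        if line in cache:
            hits += 1
            cache.move_to_end(line)       # mark as most recently used
        else:
            if len(cache) >= num_lines:
                cache.popitem(last=False) # discard least recently used line
            cache[line] = None            # load the whole line
    return hits / len(addresses)


# 1024 accesses to consecutive 4-byte words: strong spatial locality.
sequential = list(range(0, 4096, 4))
# 1024 accesses spaced 4096 bytes apart: every access touches a new line.
scattered = list(range(0, 1024 * 4096, 4096))

print(hit_ratio(sequential))  # 0.9375
print(hit_ratio(scattered))   # 0.0
```

In the sequential case each 64-byte line is loaded once and then serves the next fifteen word accesses, so only one access in sixteen goes to main memory.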
Since a cache can contain thousands of lines, it is very often logically organized as a two-dimensional store of rows and columns in order to reduce search time. In such a case, cache accesses are memory mapped. That is, a consecutive segment of data from the memory that makes up a cache line is assigned uniquely to a row, and each row has its own independent logic for controlling line replacement. These rows, which are called congruence classes, allow any cache line to be accessed in a fixed amount of time.
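One common way to assign a line to a row is to take part of the memory address as the row number. The following sketch assumes a 64-byte line and 128 rows; these sizes and the name `split_address` are hypothetical and serve only to show the arithmetic.

```python
LINE_SIZE = 64   # bytes per line (assumed)
NUM_ROWS = 128   # number of rows, i.e. congruence classes (assumed)


def split_address(addr):
    """Split a byte address into (tag, row, offset).

    offset: position of the byte within its line
    row:    congruence class the line is assigned to
    tag:    remaining high-order part, identifying which line occupies the row
    """
    offset = addr % LINE_SIZE
    line_number = addr // LINE_SIZE
    row = line_number % NUM_ROWS
    tag = line_number // NUM_ROWS
    return tag, row, offset


print(split_address(0x12345))  # (9, 13, 5)
```

Because the row is computed directly from the address, only that one row needs to be examined on each access, regardless of how many lines the cache holds in total.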
There are two general types of caches: direct-mapped and associative. A direct-mapped cache has only one location where a given cache line may be stored. When a line maps to a location already holding cached data, it displaces its predecessor. A direct-mapped cache is the simplest and fastest, but it severely limits the number of cache locations where a particular line can reside. Thus, direct-mapped cache performance can be severely degraded if frequent thrashing occurs, that is, if lines that map to the same location repeatedly displace one another.
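The displacement behavior described above can be sketched as follows. The class name `DirectMappedCache` and the chosen sizes are illustrative assumptions; the model tracks only which line occupies each location, not the data itself.

```python
class DirectMappedCache:
    """Minimal model of a direct-mapped cache: one stored tag per row."""

    def __init__(self, num_rows=128, line_size=64):
        self.num_rows = num_rows
        self.line_size = line_size
        self.tags = [None] * num_rows  # tag of the line held in each row

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line_number = addr // self.line_size
        row = line_number % self.num_rows
        tag = line_number // self.num_rows
        if self.tags[row] == tag:
            return True
        self.tags[row] = tag  # the new line displaces its predecessor
        return False


cache = DirectMappedCache()
a = 0
b = 128 * 64  # different line that maps to the same row as a

# Alternating between the two lines: each access evicts the other line.
results = [cache.access(x) for x in (a, b, a, b, a, b)]
print(results)  # [False, False, False, False, False, False]
```

Although the cache in this sketch has 128 locations and only two lines are in use, every access misses because both lines compete for the single location available to them.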
An alternative to a direct-mapped cache is a set-associative cache. Set-associative caches provide two or more locations in the cache where a line having a given address may be stored. While such caches decrease the probability of thrashing, they are inherently slower in operation than direct-mapped caches because the cache logic must compare two or more lines to determine a hit.
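A minimal sketch of a two-way set-associative lookup is given below. The class name `SetAssociativeCache`, the sizes, and the least-recently-used replacement policy are assumptions made for the example. In this software model the tags in a row are compared one after another; hardware would typically compare them in parallel and then select the matching line.

```python
class SetAssociativeCache:
    """Minimal model of a set-associative cache with LRU replacement."""

    def __init__(self, num_rows=64, ways=2, line_size=64):
        self.num_rows = num_rows
        self.ways = ways
        self.line_size = line_size
        # Each row holds up to `ways` tags, least recently used first.
        self.rows = [[] for _ in range(num_rows)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line_number = addr // self.line_size
        row = self.rows[line_number % self.num_rows]
        tag = line_number // self.num_rows
        for i, stored in enumerate(row):   # one tag comparison per way
            if stored == tag:
                row.append(row.pop(i))     # mark as most recently used
                return True
        if len(row) >= self.ways:
            row.pop(0)                     # evict least recently used line
        row.append(tag)
        return False


cache = SetAssociativeCache()
a = 0
b = 64 * 64  # different line that maps to the same row as a

results = [cache.access(x) for x in (a, b, a, b, a, b)]
print(results)  # [False, False, True, True, True, True]
```

Two lines that map to the same row can now reside in the cache together, so the alternating pattern hits after the first two loads. The cost is the additional tag comparison per way on every access.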
In use, a direct-mapped cache can easily achieve single-cycle latency, but it causes more cache misses than a set-associative cache with the same capacity and line size. A set-associative cache, due to the late select caused by the added comparisons, usually requires more than one cycle of latency.
Intermediate schemes have been designed that attempt to improve the select time of set-associative caches. Examples include the most recently used ("MRU") lookup scheme discussed in J. H. Chang, H. H. Chao, and K. So, "One-cycle cache design," IBM TDB 12-88, pp. 444-447 and "Cache Design of a Sub-Micron CMOS System/370," Proceedings, The 14th Ann. Int'l Symp. on Computer Architecture, June 1987, pp. 208-213, which are hereby incorporated by reference. Another similar scheme is the content addressable memory ("CAM") scheme used in the POWERPC 620 L1 instruction and data caches. These schemes speed up cache access by adding complicated tables and controls to the cache that attempt to guess the set identifier of every cache access that might hit the cache. However, the added logic necessary to implement these schemes limits the ability of the cache to achieve single-cycle latency when the cycle time of the processor is in the range of only a few nanoseconds. In addition, the added logic increases the complexity and cost of the cache.
Therefore, there is a need in the art for a scheme enabling a set-associative cache to achieve a select time on par with that of a direct-mapped cache without unduly increasing the complexity or cost of the cache.