1. Technical Field of the Invention
This invention generally relates to set associative caches for computer systems, and more particularly to reducing the latency penalty for cache access.
2. Background Art
The use of caches for performance improvements in computing systems is well known and extensively used. See, for example, U.S. Pat. No. 5,418,922 by L. Liu for "History Table for Set Prediction for Accessing a Set Associative Cache", and U.S. Pat. No. 5,392,410 by L. Liu for "History Table for Prediction of Virtual Address Translation for Cache Access", the teachings of both of which are incorporated herein by reference.
A cache is a high-speed buffer which holds recently used memory data. Because programs exhibit locality of reference, most data accesses can be satisfied from the cache, in which case slower accesses to bulk memory are avoided.
In typical high performance processor designs, the cache access path forms a critical path. That is, the cycle time of the processor is affected by how fast cache accessing can be carried out.
In order to achieve increased performance, microprocessors are being designed with ever-faster clock rates. Keeping the microprocessor supplied with instructions and data from memory becomes more difficult as processor speeds increase, and it is becoming more common to implement Level 2 (L2) caches using SRAMs operatively coupled to the microprocessor. The least expensive SRAMs are the industry-standard, commodity-priced modules, which are typically 64K×18-bit or 256K×18-bit devices. Several of these SRAMs are usually used in parallel to create an external L2 cache. An example of a pipelined SRAM is the IBM 32K×36 & 64K×18 SRAM 03H9040, described in IBM publication SA 14-4659-03, revised 7/96 at page 3 of 21.
Until now, the vast majority of these L2 caches have been direct-mapped, or 1-way associative, due to the simplicity of such a design and, more importantly, because the limited number of signal pins on a typical microprocessor makes it difficult to implement a multi-way associative cache using standard SRAMs. There is, therefore, a need in the art for a circuit design which enables a multi-way off-chip cache to be implemented with standard SRAMs.
There are two common ways to implement a multiway cache.
The first way is to implement the cache as a set of caches operating in parallel, with the desired data being obtained from one of the caches based on information obtained from a directory which is usually accessed at the same time as the cache. A two-way cache, for example, would be implemented with two parallel arrays, and the output of one array would be selected based on matching an entry in one of the directories associated with the cache. This method usually results in the best performance, because the cache and directory accesses are done at the same time, resulting in the minimum latency for obtaining the desired data. The major disadvantage, especially as it relates to microprocessor external caches, is that a data bus from each array must be connected to the microprocessor, unless some sort of external multiplexer (which increases latency and adds cost) is used.
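By way of illustration only, the parallel-array approach may be modeled in software as follows. This is a minimal sketch, not a description of any particular circuit: the class name ParallelTwoWayCache, the constant NUM_ROWS, and the simple modulo split of an address into row and tag are all assumptions made for the example.

```python
# Hypothetical software model of a two-way cache built from two parallel arrays.
# Names, sizes, and the address split are illustrative assumptions.
NUM_ROWS = 4  # rows per array (assumed)


class ParallelTwoWayCache:
    def __init__(self):
        # One directory (tag store) and one data array per way.
        self.tags = [[None] * NUM_ROWS for _ in range(2)]
        self.data = [[None] * NUM_ROWS for _ in range(2)]

    def fill(self, way, address, value):
        row, tag = address % NUM_ROWS, address // NUM_ROWS
        self.tags[way][row] = tag
        self.data[way][row] = value

    def read(self, address):
        row, tag = address % NUM_ROWS, address // NUM_ROWS
        # Both data arrays are read without waiting for the directory result.
        candidates = [self.data[way][row] for way in range(2)]
        # The directory comparison only selects which array output is used.
        for way in range(2):
            if self.tags[way][row] == tag:
                return candidates[way]
        return None  # miss


cache = ParallelTwoWayCache()
cache.fill(0, 5, "a")   # address 5: row 1, tag 1
cache.fill(1, 9, "b")   # address 9: row 1, tag 2
hit_a = cache.read(5)
hit_b = cache.read(9)
miss = cache.read(13)   # address 13: row 1, tag 3, present in neither way
```

In this model the list named candidates stands in for the separate data buses: each way produces an output on every access, which is the source of both the low latency and the pin-count cost noted above.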
A second approach for implementing a multiway cache is to use a single array, and partition it to contain the various cache sets, or slots as they are sometimes called. However, this usually means that the directory must be searched before the array access can begin, because the slot must be known in order to generate the array address bit(s) which correspond(s) to the desired cache slot. The advantage of this method is that only one data bus need be connected to the SRAMs to access data. The disadvantage is that access latency is increased because the directory must be searched before beginning the cache access.
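The ordering constraint of this second approach may likewise be sketched in software. The following is an illustrative model only, under assumed names and sizes (PartitionedCache, NUM_ROWS, NUM_SLOTS, and a modulo split of the address into row and tag); it shows that the array address cannot be formed until the directory search has identified the slot.

```python
# Hypothetical software model of a two-slot cache held in a single data array.
# Names, sizes, and the address split are illustrative assumptions.
NUM_ROWS = 4   # rows per slot (assumed)
NUM_SLOTS = 2  # cache sets, or slots (assumed)


class PartitionedCache:
    def __init__(self):
        self.tags = [[None] * NUM_ROWS for _ in range(NUM_SLOTS)]
        # A single array holds every slot, so only one data bus is modeled.
        self.array = [None] * (NUM_SLOTS * NUM_ROWS)

    @staticmethod
    def array_address(slot, row):
        # The slot number supplies the high-order array address bit(s).
        return slot * NUM_ROWS + row

    def fill(self, slot, address, value):
        row, tag = address % NUM_ROWS, address // NUM_ROWS
        self.tags[slot][row] = tag
        self.array[self.array_address(slot, row)] = value

    def read(self, address):
        row, tag = address % NUM_ROWS, address // NUM_ROWS
        # Step 1: search the directory to learn which slot holds the line.
        slot = next(
            (s for s in range(NUM_SLOTS) if self.tags[s][row] == tag), None
        )
        if slot is None:
            return None  # miss
        # Step 2: only now can the array address be formed and the access begin.
        return self.array[self.array_address(slot, row)]


cache = PartitionedCache()
cache.fill(0, 5, "a")   # address 5: row 1, tag 1, stored at array index 1
cache.fill(1, 9, "b")   # address 9: row 1, tag 2, stored at array index 5
hit_a = cache.read(5)
hit_b = cache.read(9)
miss = cache.read(13)   # address 13: row 1, tag 3, in neither slot
```

The two steps in read are strictly sequential in this model, which corresponds to the added access latency described above.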
It is, therefore, an object of the invention to avoid increased access latency in multi-way cache accessing due to the need to search a directory before beginning the cache access.