1. Technical Field of the Invention
This invention generally relates to set associative caches for computer systems, and more particularly to responding to MRU misses and cache misses.
2. Background Art
Caches are well known and extensively used to improve performance in computing systems. See, for example, U.S. Pat. No. 5,418,922 by L. Liu for "History Table for Set Prediction for Accessing a Set Associative Cache", and U.S. Pat. No. 5,392,410 by L. Liu for "History Table for Prediction of Virtual Address Translation for Cache Access", the teachings of both of which are incorporated herein by reference.
A cache is a high speed buffer which holds recently used memory data. Because programs exhibit locality of reference, most data accesses can be satisfied from the cache, in which case slower accesses to bulk memory are avoided.
In typical high performance processor designs, the cache access path forms a critical path. That is, the cycle time of the processor is affected by how fast cache accessing can be carried out.
Referring to FIG. 7, set associative or multi-way cache designs are a common approach for implementing high performance access to storage. Read address registers 150, 160 are connected by lines 151, 161, respectively, to arrays 152, 162, the outputs of which appear on lines 153, 163. Typically, an N-way cache has N arrays 152, 162 which are accessed in parallel, along with N directory address entries which, via address compare logic, determine which array has the data of interest. Each of arrays 152, 162 comprises one or more array macros, the number of such macros included in each array 152, 162 being sufficient to provide the desired number of data bits.
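The parallel array access and directory compare described above can be sketched in software as follows. This is an illustrative model only, not the circuitry of FIG. 7: the names NUM_WAYS, NUM_SETS, LINE_BYTES, split_address, install and lookup are hypothetical, and the choice of 8 congruence classes is an assumption made for the example.

```python
NUM_WAYS = 4     # N arrays (slots); four-way is used as the example in the text
NUM_SETS = 8     # number of congruence classes (assumed for illustration)
LINE_BYTES = 16  # bytes addressed per array entry (assumed for illustration)


def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


def make_cache():
    """Return an empty directory and N empty data arrays."""
    directory = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
    arrays = [[None] * NUM_SETS for _ in range(NUM_WAYS)]
    return directory, arrays


def install(directory, arrays, addr, way, data):
    """Place data for addr into the given way and record its tag."""
    tag, index, _ = split_address(addr)
    directory[index][way] = tag
    arrays[way][index] = data


def lookup(directory, arrays, addr):
    """Read all N arrays, then let the directory compare pick one.

    Returns (way, data) on a hit and (None, None) on a miss.
    """
    tag, index, _ = split_address(addr)
    # All N arrays are read at the same set index (in hardware, in parallel).
    outputs = [arrays[way][index] for way in range(NUM_WAYS)]
    # The N directory entries are compared against the requested tag.
    for way in range(NUM_WAYS):
        if directory[index][way] == tag:
            return way, outputs[way]
    return None, None
```

In hardware the compare result arrives late in the cycle, which is what motivates guessing the slot rather than waiting for the compare.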
Referring to FIG. 8, determination of which array has the data of interest occurs late in the same cycle as the access of the multiple cache data arrays 152, 162. This determination is often late enough that it is not useful to gate correct data in the same cycle. A useful alternative is to guess at which array contains the correct data and gate its output accordingly. An MRU array is commonly employed to perform this function. It is referred to here as a "slot" MRU array, the term "slot" representing one of the four sets of a four way set associative cache. The correctness of the guess is determined by comparing it with the true indication from the address compares during cycle N+1 181. If the guess is wrong, a stall cycle occurs, and the correct array data is sent in the following cycle N+2 182. This correct cache array data must be available in the following cycle 182 in order to limit the stall penalty for an incorrect guess to one cycle.
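The guess-and-verify behavior described above can be modeled with a short sketch. It assumes one MRU entry per set index and assumes that the MRU entry is updated to the true slot after a wrong guess; the source states neither, and the function name read_with_slot_guess is hypothetical. The model counts only stall cycles and does not represent the cycle numbering of FIG. 8.

```python
def read_with_slot_guess(mru, index, true_slot, outputs):
    """Gate the output of the guessed slot and verify against the true slot.

    mru       -- list holding one guessed slot number per set index (assumed layout)
    index     -- set index being accessed
    true_slot -- slot indicated by the directory address compare
    outputs   -- data read from each of the N arrays for this access

    Returns (data, stall_cycles): zero stall cycles when the guess is right,
    one stall cycle when it is wrong.
    """
    guess = mru[index]
    if guess == true_slot:
        return outputs[guess], 0
    # Wrong guess: the correct array data is delivered one cycle later.
    mru[index] = true_slot  # assumed update policy, for illustration only
    return outputs[true_slot], 1
```

The one-cycle penalty in this model holds only if the data of the true slot is still available in the cycle after the wrong guess, which is the requirement the remainder of this section addresses.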
Referring again to FIG. 7, traditionally one of the following two alternatives is employed to make the correct cache array data available in the cycle following a wrong MRU guess. The first is to provide a hold signal to the read address registers 150, 160 of the cache arrays 152, 162, respectively, which hold signal is activated directly by comparing a guessed address with a true address. The second is to latch all cache array outputs 153, 163 at the end of the access cycle N+1 181 in data out registers 154, 164, respectively, and gate the appropriate register 154, 164 in the following cycle N+2 182.
The first implementation creates a critical path problem which limits the processor cycle time. The second implementation alleviates the critical path problem but at the expense of significant additional circuitry for the data registers 154, 164 and associated multiplexing. For instance, in a 4-way cache with a 16-byte data flow, it would add 432 latches to hold the data and parity in excess of the 144 latches needed to hold a selected output.
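The latch counts quoted above can be reproduced with a short calculation. The sketch assumes one parity bit per data byte, which is not stated in the text but is consistent with the figures given; the function name data_out_latches is hypothetical.

```python
def data_out_latches(ways, data_bytes, parity_bits_per_byte=1):
    """Return (latches for one selected output, extra latches to hold all ways).

    parity_bits_per_byte=1 is an assumption that matches the quoted figures.
    """
    bits_per_way = data_bytes * (8 + parity_bits_per_byte)  # data plus parity
    total_all_ways = ways * bits_per_way
    return bits_per_way, total_all_ways - bits_per_way


selected, extra = data_out_latches(ways=4, data_bytes=16)
print(selected, extra)  # 144 432
```

Each of the four array outputs needs 16 x 9 = 144 latches, so latching all four requires 576 latches, which is 432 more than the 144 needed for a single selected output.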
A similar problem exists with certain store-in or write-back cache designs. A store-in cache requires moving modified data out of the cache and storing it to another level of cache or main memory when the cache line replacement algorithm chooses a modified cache line for replacement. (A cache line is a block of data that is transferred as a unit between different levels of a cache or memory.) Typically, the movement of modified data out of cache (called a castout operation) is overlapped with the request of the new line (called a fill operation) from a lower level of the cache/memory subsystem. The castout operation should occur as quickly as possible so that as little interference as possible occurs with the delivery of new data for the fill.
Referring to FIG. 9, one means of minimizing the number of cycles to complete the castout is to save the data read during the original cache access cycle n+2 192. Another means is to hold the address registers in cycle 191 using a late indication of a cache miss requiring a castout. These solutions are subject to the same limitations described for the slot MRU wrong guess alternative.
A similar problem exists when a cache miss occurs and the slot that needs to be replaced has modified data that must be cast out (or copied back) from the cache. The least recently used (LRU) slot is selected for the castout, but the most recently used (MRU) slot was used in the initial access of the cache. Therefore, the data selected in the initial access is not the correct data to store into the copy back data register (also referred to as the unaligned data register).
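The mismatch between the slot gated in the initial access and the slot chosen for castout can be illustrated with a short sketch. It assumes that recency is tracked with a per-slot timestamp and that each slot carries a modified flag; both are modeling conveniences rather than details from the source, and the function name castout_mismatch is hypothetical.

```python
def castout_mismatch(last_used, modified):
    """Compare the slot gated by the MRU guess with the slot chosen for castout.

    last_used -- per-slot time of last use (larger means more recent); assumed model
    modified  -- per-slot flag, True if the slot holds modified data

    Returns (mru_slot, lru_slot, wrong_data_selected), where wrong_data_selected
    is True when a castout is needed and the initially gated data came from a
    different slot than the one being cast out.
    """
    slots = range(len(last_used))
    mru_slot = max(slots, key=lambda s: last_used[s])
    lru_slot = min(slots, key=lambda s: last_used[s])
    needs_castout = modified[lru_slot]
    return mru_slot, lru_slot, needs_castout and mru_slot != lru_slot


print(castout_mismatch([5, 9, 2, 7], [False, False, True, False]))  # (1, 2, True)
```

In this example the initial access gated slot 1 while the castout requires the data of slot 2, so the data captured in the initial access cannot be loaded into the copy back data register.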
Consequently, it is an object of the invention to provide a system for recovering from a slot MRU miss that avoids the need to reaccess the cache and that does not require the use of additional data out registers external to the array macro.
It is a further object of the invention to provide a system for completing a castout operation that avoids the need to reaccess the cache and that does not require the use of additional data out registers external to the array macro.