The present invention relates to cache memories, and more particularly to strategies for selecting data to be replaced in a cache memory.
Processing inside a computer system may be performed by a hardware element called a central processing unit (CPU). Instructions and data for the CPU may be stored in a large, main memory. The operating speed of a CPU (i.e., the time it takes to perform one instruction) is typically very much faster than the access speed of the main memory. Consequently, the CPU may be forced to idly wait for a requested instruction or data item while the main memory cycles through one memory access operation. This idle wait time seriously degrades the effective processing speed of the CPU.
In order to address this problem, a cache memory unit is often designed into the computer system. Cache memories are well-known in the computer arts as being auxiliary memories that provide a buffering capability between a CPU and a main memory. The cache memory is typically designed to run much faster than the main memory, and to be loaded out of the main memory.
Memory devices that run at the speed of the CPU are much more expensive and physically larger than the slower devices that make up the main memory. As a result, the size of a cache memory (as measured by the number of separately addressable storage cells contained within the memory) is much smaller than the size of a main memory. Because the cache memory cannot contain all of the instructions and data stored in the main memory, the CPU occasionally requests a particular instruction or data item that is not presently stored in the cache memory. Such an occurrence is called a "cache miss", and requires that the requested instruction or data item be retrieved from main memory, stored into the cache memory, and then supplied to the CPU. It can be seen, then, that each cache miss has the potential for making the CPU wait as long as (if not longer than) it would if the cache memory were not present.
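The miss-handling sequence described above (retrieve from main memory, store into the cache, supply to the CPU) might be sketched as follows. This is only an illustration: the cycle counts `CACHE_CYCLES` and `MEMORY_CYCLES` are hypothetical values chosen for the example, the assumption that a miss costs a cache lookup plus a main memory access is a simplification, and the cache is modeled as an unbounded dictionary so that no replacement is needed.

```python
CACHE_CYCLES = 1    # hypothetical cost of a cache access
MEMORY_CYCLES = 20  # hypothetical cost of a main memory access


def read(address, cache, main_memory):
    """Return (value, cycles the CPU waits) for one read request."""
    if address in cache:
        # Cache hit: the item is supplied at cache speed.
        return cache[address], CACHE_CYCLES
    # Cache miss: retrieve from main memory, store into the cache,
    # and only then supply the item to the CPU.
    value = main_memory[address]
    cache[address] = value
    # Assumed cost model: the failed cache lookup plus the memory access.
    return value, CACHE_CYCLES + MEMORY_CYCLES


main_memory = {0x10: 7}
cache = {}
first = read(0x10, cache, main_memory)   # miss: waits longer than memory alone
second = read(0x10, cache, main_memory)  # hit: waits only for the cache
```

Under these assumed numbers the first read waits longer than a direct main memory access would, which is the sense in which a miss can cost as much as, or more than, having no cache at all.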
A technique for reducing the processing speed penalty whenever a data read cache miss occurs is to execute instructions out of order. This means that instructions subsequent to the one that caused the cache miss will continue to execute while the CPU is waiting for the missing data. For this strategy to work, it is necessary that execution of these subsequent instructions not be dependent on the missing data. Execution of instructions that do depend on the missing data must be held in abeyance (e.g., in queues) until the missing data becomes available. When the data does become available, all of the instructions that were dependent on this data are then executed. Out-of-order instruction execution techniques are described in William Johnson, Superscalar Microprocessor Design, 1991 (ISBN 0-13-875634-1), which is incorporated herein by reference.
Even if the out-of-order execution strategy is adopted, there will likely be branch instructions in the program whose target location is in some way conditional on the missing data. One strategy for avoiding a long delay in the instruction fetching operation of the CPU is to guess which branch will be taken, and to tentatively continue fetching and executing instructions from the guessed branch. If, when the missing data becomes available, it is found that the guess was correct, then the results of the tentative execution can be made permanent (e.g., by storing results into target memory locations). However, if an incorrect guess was made, then all of the results from instructions executed after the conditional branch instruction must be flushed, and program execution restarted from the correct branch path. A wrong guess, therefore, causes a very high performance penalty.
This strategy can be improved by further including a branch prediction memory that stores statistics on the results of previous conditional branches in order to increase the probability of making a correct guess regarding which is the correct path for a pending conditional branch. Notwithstanding the use of this strategy, there will inevitably be branches that are mispredicted, thereby causing a high performance penalty.
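One simple way to picture a branch prediction memory is a table of small saturating counters, one per branch address. The sketch below assumes a two-bit counter scheme; the text above does not specify what statistics are stored, so the class name `TwoBitPredictor`, the initial counter value, and the branch address `0x40` are illustrative assumptions only.

```python
class TwoBitPredictor:
    """Illustrative branch prediction memory using 2-bit saturating counters."""

    def __init__(self):
        # branch address -> counter in 0..3 (0, 1 predict not taken; 2, 3 predict taken)
        self.counters = {}

    def predict(self, branch_address):
        # Unseen branches start at 1 (weakly not taken); this is an assumption.
        return self.counters.get(branch_address, 1) >= 2

    def update(self, branch_address, taken):
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        counter = self.counters.get(branch_address, 1)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[branch_address] = counter


predictor = TwoBitPredictor()
outcomes = [True, True, True, False, True, True]  # hypothetical branch history
correct = 0
for taken in outcomes:
    if predictor.predict(0x40) == taken:
        correct += 1
    predictor.update(0x40, taken)
```

For this hypothetical history, four of the six guesses are correct; the two wrong guesses correspond to the mispredictions that the text notes cannot be eliminated entirely.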
Another factor that influences the effective execution speed of the CPU is the fact that when a cache miss occurs, data (or one or more instructions for the case where data and instructions share the same cache memory) must be removed from the cache memory in order to make room for the missing data item. The strategy for selecting data to be removed from the cache (called a "cache replacement strategy") can also influence the effective execution speed of the CPU because the "cast out" data may be needed at a later time, thereby causing another cache miss.
Existing cache replacement strategies have been based on maximizing the probability that a requested instruction or data item will be successfully located in the cache (called a "cache hit"). One such strategy selects for removal the data item that has been least recently used (LRU) by the executing program. The basis for this approach is the concept of temporal locality: the notion that a recently accessed address is likely to be accessed again, and that this likelihood is higher the sooner the second access occurs with respect to the first.
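A minimal sketch of LRU replacement follows. It models a small, fully associative cache in software; the class name `LRUCache`, the capacity of two items, and the access sequence are assumptions made for illustration, and real hardware would typically track recency with dedicated status bits rather than an ordered dictionary.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # cast out the least recently used item
        self.lines[address] = main_memory[address]
        return self.lines[address]


main_memory = {addr: addr * 10 for addr in range(8)}
cache = LRUCache(capacity=2)
for addr in [0, 1, 0, 2, 1]:
    cache.read(addr, main_memory)
```

In this sequence the read of address 2 casts out address 1, because address 0 was used more recently; the following read of address 1 therefore misses again, illustrating how a cast-out item can be needed at a later time.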
Other cache replacement strategies are random replacement and first-in-first-out (FIFO).
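For comparison, FIFO replacement can be sketched as below. The item cast out is the one that was loaded earliest, regardless of how recently it was used. The class name `FIFOCache`, the capacity, and the access sequence are illustrative assumptions.

```python
from collections import deque


class FIFOCache:
    """Toy fully associative cache with first-in-first-out replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()  # addresses in the order they were loaded
        self.lines = {}       # address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1  # a hit does not change the load order
            return self.lines[address]
        self.misses += 1
        if len(self.lines) >= self.capacity:
            oldest = self.order.popleft()  # cast out the earliest-loaded item
            del self.lines[oldest]
        self.lines[address] = main_memory[address]
        self.order.append(address)
        return self.lines[address]


main_memory = {addr: addr * 10 for addr in range(8)}
cache = FIFOCache(capacity=2)
for addr in [0, 1, 0, 2, 0]:
    cache.read(addr, main_memory)
```

Here address 0 is cast out when address 2 is loaded, even though address 0 had just been read, so the final read of address 0 misses. Random replacement would differ only in choosing the cast-out address at random rather than by load order.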
All of the above cache replacement strategies have as a goal a high cache hit ratio, usually defined as the number of times an attempted cache read is successful at obtaining the data from the cache divided by the total number of attempted cache accesses. (A related measure is the cache miss ratio, usually defined as 1 minus the cache hit ratio.) However, these cache replacement strategies are deficient because they fail to take into account the effects of cache misses, which will inevitably occur.
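The two ratios defined above can be written out directly. The function names and the sample counts (3 hits in 4 accesses) are hypothetical and serve only to make the arithmetic concrete.

```python
def hit_ratio(hits, accesses):
    """Successful cache reads divided by total attempted cache accesses."""
    if accesses == 0:
        raise ValueError("no cache accesses recorded")
    return hits / accesses


def miss_ratio(hits, accesses):
    """Defined as 1 minus the cache hit ratio."""
    return 1 - hit_ratio(hits, accesses)


example_hit_ratio = hit_ratio(hits=3, accesses=4)
example_miss_ratio = miss_ratio(hits=3, accesses=4)
```

Note that both figures only count misses; they say nothing about how costly each individual miss turns out to be.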