A modern computer system includes a microprocessor. The microprocessor reads and writes data from and to a memory in the system that is external to the microprocessor. Transfers of data between the microprocessor and memory are relatively slow compared to the speed at which the microprocessor can perform operations internally on the data. Consequently, the microprocessor may spend time idle waiting for data from the memory or waiting for data to be written to the memory, resulting in reduced performance.
To address this problem, modern microprocessors include one or more cache memories. A cache memory, or cache, is a memory internal to the microprocessor, typically much smaller than the system memory, that stores a subset of the data in the system memory. The cache stores data in cache lines. A cache line is the smallest unit of data that can be transferred between the cache and the system memory. A common cache line size is 32 bytes. When the microprocessor executes an instruction that references data, the microprocessor first checks whether the cache line containing the data is present in the cache and is valid. If so, the instruction can be executed immediately since the data is already present in the cache. That is, in the case of a read, or load, the microprocessor does not have to wait while the data is fetched from the memory into the microprocessor. Similarly, in the case of a write, or store, the microprocessor can write the data to the cache and proceed instead of having to wait until the data is written to memory.
The condition where the microprocessor detects that the cache line containing the data is present in the cache and valid is commonly referred to as a cache hit, or hit. The condition where the microprocessor detects that the cache line is not present or is invalid is commonly referred to as a cache miss, or miss.
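The hit and miss conditions described above can be illustrated with a minimal sketch. The sketch assumes a direct-mapped organization, an eight-line capacity, and the names `DirectMappedCache`, `access`, and `invalidate`, none of which are specified in this description; only the 32-byte line size is taken from the text.

```python
LINE_SIZE = 32   # bytes per cache line, the common size mentioned above
NUM_LINES = 8    # hypothetical capacity: 8 lines (256 bytes)


class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags and valid bits."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def _split(self, address):
        line_number = address // LINE_SIZE   # which line of system memory
        index = line_number % NUM_LINES      # which cache entry the line maps to
        tag = line_number // NUM_LINES       # distinguishes lines sharing an entry
        return index, tag

    def access(self, address):
        """Return "hit" if the line is present and valid; otherwise fill it and return "miss"."""
        index, tag = self._split(address)
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # Miss: model fetching the missing line from system memory.
        self.valid[index] = True
        self.tags[index] = tag
        return "miss"

    def invalidate(self, address):
        """Mark the entry holding this address invalid, if it holds that line."""
        index, tag = self._split(address)
        if self.valid[index] and self.tags[index] == tag:
            self.valid[index] = False
```

In this sketch, a first access to address 0x100 misses, a following access to 0x104 hits because both addresses fall in the same 32-byte line, and an access to 0x100 after `invalidate(0x100)` misses again because the line is present but no longer valid.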
When a cache miss occurs, the cache must notify other functional blocks within the microprocessor that the miss has occurred so that the missing cache line can be fetched into the cache. In some cases, a conventional cache does not immediately notify the other functional blocks that the miss has occurred. Instead, the cache retries the transaction that caused the miss. In a retry, the cache causes the transaction to re-arbitrate with other transactions for access to the cache and re-sequence through the cache pipeline.
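The cost of a retry can be illustrated with a simple cycle count. The following is a sketch under stated assumptions: the one-cycle arbitration cost, the four-stage pipeline depth, and the function name `cycles_until_miss_notified` are hypothetical and are not taken from any particular cache design. The sketch also assumes the transaction wins arbitration on every attempt, so contention with other transactions would only lengthen the delay.

```python
ARBITRATION_CYCLES = 1   # hypothetical cost to win access to the cache
PIPELINE_CYCLES = 4      # hypothetical depth of the cache pipeline


def cycles_until_miss_notified(retries):
    """Cycles from first arbitration until other blocks are told of the miss.

    Every pass through the cache (the initial pass plus each retry)
    re-arbitrates and re-sequences through the whole pipeline.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    passes = 1 + retries
    return passes * (ARBITRATION_CYCLES + PIPELINE_CYCLES)
```

Under these assumed values, a miss reported on the first pass is known after 5 cycles, whereas a miss that is retried once is not reported until 10 cycles have elapsed.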
Most caches have a high hit rate. It is not uncommon for a cache to have a hit rate greater than 90%, depending upon the data set involved. Consequently, if the cache delays notifying the other functional blocks that a miss has occurred, the effect on performance is typically not great.
However, certain cache configurations can have much lower hit rates. For example, some microprocessors employ a hierarchical cache scheme of multiple caches, commonly referred to as a level-one (L1) cache and a level-two (L2) cache. The L1 cache is closer to the computation elements of the microprocessor than the L2 cache, and is capable of providing data to the computation elements faster than the L2 cache. Some L2 caches function as victim caches. With a victim cache configuration, when a cache line is discarded, or cast out, from the L1 cache, the cache line is written to the L2 cache rather than to system memory. The hit rate of some L2 victim caches, particularly where the size of the L2 cache is the same as or smaller than the size of the L1 cache, has been observed to be approximately 50%.
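The victim cache behavior described above can be sketched as follows. The sketch assumes fully associative caches with least-recently-used replacement, and assumes that a line found in the L2 cache is moved back into the L1 cache and removed from the L2 cache; these policies and the names `VictimHierarchy` and `access` are illustrative assumptions, not details given in this description.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy L1 cache backed by an L2 victim cache, tracked by line number."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()   # least recently used line first
        self.l2 = OrderedDict()   # oldest cast-out line first
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def _fill_l2(self, line):
        if len(self.l2) >= self.l2_lines:
            self.l2.popitem(last=False)   # oldest victim leaves the hierarchy
        self.l2[line] = True

    def _fill_l1(self, line):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            self._fill_l2(victim)         # cast-out goes to L2, not to memory
        self.l1[line] = True

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)     # mark as most recently used
            return "L1 hit"
        if line in self.l2:
            del self.l2[line]             # assumed: line moves back to L1
            self._fill_l1(line)
            return "L2 hit"
        self._fill_l1(line)               # fetched from system memory
        return "L2 miss"
```

With two lines in each cache, accessing lines 1, 2, 3, 1, 3 in order yields three L2 misses, then an L2 hit on line 1 (which was cast out of the L1 cache when line 3 was loaded), then an L1 hit on line 3. Because the L2 cache in this arrangement holds only lines the L1 cache has discarded, it is consulted only for accesses that already missed in the L1 cache.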
As the hit rate of a cache decreases, misses become more frequent, and any delay by the cache in notifying the other functional blocks that a miss has occurred has a correspondingly greater negative impact on performance. Therefore, what is needed is a cache that reduces the delay in notifying the other functional blocks that a miss has occurred.
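The relationship between hit rate and the cost of delayed miss notification can be illustrated with a short calculation. This is a simplified sketch: it assumes every miss incurs the same fixed notification delay, and the function name `expected_notification_delay` and the 5-cycle delay used in the example are hypothetical.

```python
def expected_notification_delay(hit_rate, delay_cycles):
    """Average miss-notification delay, in cycles, added per cache access.

    Only misses incur the delay, so the average cost per access is the
    miss rate (1 - hit_rate) multiplied by the delay per miss.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * delay_cycles
```

With an assumed delay of 5 cycles per miss, a cache with a 90% hit rate incurs about 0.5 cycles of added delay per access on average, whereas a victim cache with a 50% hit rate incurs about 2.5 cycles per access, five times as much.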