Computer systems employ cache memories because their access latency is significantly lower than that of main memory. These cache memories retain recently accessed data in the expectation that the data will be accessed again in the near future. Memory operations performed by the processor access the cache memory first; if the accessed data is not in the cache (termed a cache miss), the processor must wait for an extended period of time while that data is loaded into the cache from a more remote memory. Processor stalls caused by this wait can account for the majority of execution time in many applications. Consequently, reducing the frequency of cache misses can yield a significant performance improvement.
Cache memories are logically organized as multiple sets of cache blocks. When a cache miss occurs, the set in which the new block is to be placed is determined first. If that set is full, room must be created for the new block by evicting one of the blocks currently residing in the set. The evicted block is termed the victim. Much prior work in the literature addresses how to choose the victim so that the cache miss rate is minimized. Examples of such cache block replacement policies include least recently used (LRU) and first-in, first-out (FIFO). These replacement policies have been designed to minimize the frequency of cache misses, regardless of whether those misses are caused by load or store instructions.
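The set lookup and victim selection described above can be illustrated with a minimal sketch of a set-associative cache using LRU replacement. The class name `SetAssociativeCache`, its parameters, and the address-to-set mapping (block number modulo the number of sets) are assumptions chosen for illustration; they are not taken from any particular design.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with per-set LRU replacement (illustrative only)."""

    def __init__(self, num_sets=4, ways=2, block_size=64):
        self.num_sets = num_sets      # number of sets (assumed value)
        self.ways = ways              # blocks per set (assumed value)
        self.block_size = block_size  # bytes per block (assumed value)
        # One OrderedDict per set; keys are tags, ordered from least
        # recently used (front) to most recently used (back).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, address):
        """Access an address; return (hit, victim_tag).

        victim_tag is None unless a resident block had to be evicted.
        """
        block = address // self.block_size
        index = block % self.num_sets   # which set the block maps to
        tag = block // self.num_sets    # identifies the block within the set
        lines = self.sets[index]

        if tag in lines:
            lines.move_to_end(tag)      # hit: mark as most recently used
            return True, None

        self.misses += 1
        victim = None
        if len(lines) == self.ways:     # set is full: evict the LRU block
            victim, _ = lines.popitem(last=False)
        lines[tag] = True               # install the new block
        return False, victim
```

Note that this policy treats every miss identically: `access` has no notion of whether the request came from a load or a store.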
Computer systems sometimes employ write buffers to temporarily hold data written by a processor, so that when a store instruction misses in the cache, the processor may continue to execute instructions without stalling until the cache miss completes. Load misses are different: the processor must wait for a load miss to complete, because subsequent instructions that depend on the data returned by the miss cannot execute until that data is available. Consequently, the performance cost of a load miss is generally larger than the performance cost of a store miss.
Existing cache block replacement methods do not account for this difference in miss cost between loads and stores; they seek to minimize all misses, regardless of whether those misses are caused by loads or stores. Replacement policies that instead minimize load misses, even at the expense of increased store misses, may increase overall performance, given sufficient store buffering resources.
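The trade-off can be made concrete with a small arithmetic sketch. The function `stall_cycles`, the miss counts, and the per-miss penalties below are all hypothetical values chosen only to illustrate the reasoning; they are not measurements, and the sketch assumes store buffering is sufficient to keep the effective store-miss penalty low.

```python
def stall_cycles(load_misses, store_misses, load_penalty, store_penalty):
    """Total stall cycles under a simple linear cost model (illustrative only)."""
    return load_misses * load_penalty + store_misses * store_penalty


# Hypothetical penalties: a load miss stalls the processor for the full
# miss latency, while a buffered store miss costs far less.
LOAD_PENALTY = 200
STORE_PENALTY = 20

# Hypothetical policy A: minimizes total misses (200 misses in all).
miss_minimizing = stall_cycles(100, 100, LOAD_PENALTY, STORE_PENALTY)

# Hypothetical policy B: fewer load misses, more store misses (220 in all).
load_favoring = stall_cycles(80, 140, LOAD_PENALTY, STORE_PENALTY)

print(miss_minimizing)  # 22000
print(load_favoring)    # 18800
```

Under these assumed numbers, the second policy incurs more misses overall yet fewer stall cycles, because the misses it adds are the cheaper kind.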
Therefore, there is a need for a cache block replacement method to overcome the stated shortcomings of the prior art.