Modern computer systems include a microprocessor and a system memory for storing instructions to be executed by the microprocessor and data to be processed by the instructions. The time required to read data from the system memory is typically very large relative to the time the microprocessor spends executing one or more instructions to process the data, in some cases larger by one or two orders of magnitude. Consequently, the processor may sit idle while the data is loaded from the system memory, wasting processor cycles and degrading system performance.
To alleviate this problem, microprocessors include a cache memory. A cache memory is a memory within the processor, smaller than the system memory, that stores a subset of the system memory data. When the processor executes an instruction that references data, the processor first checks whether the data is already present in the cache from a previous load of the data. Finding the data in the cache is commonly referred to as a “cache hit.” If the load hits in the cache, the instruction can be executed immediately. Otherwise, if the load “misses” the cache, the instruction must wait while the data is fetched from the system memory into the processor.
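To make the hit/miss check concrete, the following is a minimal Python sketch of a cache lookup. It assumes a tiny direct-mapped cache with 64-byte lines and four sets, and models system memory as a byte string; the class name `DirectMappedCache` and this geometry are illustrative assumptions, not details taken from the text above. Real processor caches are typically set-associative and much larger.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_SETS = 4    # number of lines in this toy direct-mapped cache (assumed)

class DirectMappedCache:
    """Toy direct-mapped cache in front of a byte-addressed system memory."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.tags = [None] * NUM_SETS  # which memory line each set holds
        self.lines = [b""] * NUM_SETS  # cached copy of each line's bytes
        self.hits = 0
        self.misses = 0

    def load(self, addr: int) -> int:
        """Return the byte at addr, filling the cache from memory on a miss."""
        line_number = addr // LINE_SIZE
        index = line_number % NUM_SETS  # the one set this line may occupy
        tag = line_number // NUM_SETS   # distinguishes lines sharing that set
        if self.tags[index] == tag:
            self.hits += 1              # cache hit: data already present
        else:
            self.misses += 1            # cache miss: fetch the line from memory
            start = line_number * LINE_SIZE
            self.lines[index] = self.memory[start:start + LINE_SIZE]
            self.tags[index] = tag
        return self.lines[index][addr % LINE_SIZE]

memory = bytes(range(256)) * 4  # 1 KiB of sample data
cache = DirectMappedCache(memory)
print(cache.load(10), cache.hits, cache.misses)  # 10 0 1
print(cache.load(20), cache.hits, cache.misses)  # 20 1 1
```

The load of address 10 misses and brings in the line holding bytes 0 through 63, so the later load of address 20 hits in that same line.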
Microprocessor designers have recognized that software programs frequently access data and instructions sequentially. Hence, if a load misses in the cache, it is highly likely that the program will soon request the data at the memory addresses following the load miss address. Consequently, a microprocessor may speculatively begin loading the next chunk of data after the missing data into the cache before the program has requested it, in anticipation of a future need for that data. This is commonly referred to as a prefetch.
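The effect of this sequential-prefetch heuristic can be sketched by counting misses for a sequential scan, with and without fetching the next line on each miss. This is a simplified model under stated assumptions: lines are 64 bytes, the cache is treated as unbounded so nothing is ever evicted, and `count_misses` is a hypothetical helper written only for illustration.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)

def count_misses(addresses, prefetch_next_line: bool) -> int:
    """Count demand misses for an address trace in an unbounded cache (assumed)."""
    resident = set()  # line numbers present in the cache
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in resident:
            misses += 1
            resident.add(line)          # ordinary fill of the missing line
            if prefetch_next_line:
                resident.add(line + 1)  # speculatively fetch the following line
    return misses

scan = range(0, 1024, 8)  # sequential 8-byte loads covering 16 lines
print(count_misses(scan, prefetch_next_line=False))  # 16
print(count_misses(scan, prefetch_next_line=True))   # 8
```

Because each miss also brings in the line the scan touches next, only every other line misses in this model.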
The chunk of data prefetched is commonly the size of a cache line. Caches store data in units called cache lines; common cache line sizes are 32 bytes or 64 bytes. A cache line is the smallest unit of data that can be transferred between the cache and the system memory. That is, when a microprocessor wants to read a cacheable piece of data that is missing in the cache, it reads from memory the entire cache line containing the missing piece of data and stores the entire cache line in the cache. Similarly, when writing a new cache line into the cache causes a modified cache line to be replaced, the microprocessor writes the entire replaced line back to memory.
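The following Python sketch illustrates the two whole-line transfer rules, assuming 64-byte lines and a cache reduced to a single slot: the line containing an address is found by clearing the low-order offset bits, a miss reads the whole line, and replacing a modified line writes the whole line back. `line_base`, `line_offset`, and `WriteBackSlot` are hypothetical names, and the one-slot cache is a deliberate simplification.

```python
LINE_SIZE = 64  # bytes per cache line (assumed; must be a power of two)

def line_base(addr: int) -> int:
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)

def line_offset(addr: int) -> int:
    """Position of addr within its cache line."""
    return addr & (LINE_SIZE - 1)

class WriteBackSlot:
    """A single cache slot that transfers whole lines to and from memory."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.base = None    # base address of the cached line, if any
        self.data = None    # cached copy of the whole line
        self.dirty = False  # True once the cached copy has been modified

    def _ensure_line(self, addr: int) -> None:
        base = line_base(addr)
        if base == self.base:
            return  # hit: line already cached
        if self.dirty:
            # Replacing a modified line: write the entire line back, not one byte.
            self.memory[self.base:self.base + LINE_SIZE] = self.data
        # Miss: read the entire line containing addr.
        self.data = bytearray(self.memory[base:base + LINE_SIZE])
        self.base = base
        self.dirty = False

    def read(self, addr: int) -> int:
        self._ensure_line(addr)
        return self.data[line_offset(addr)]

    def write(self, addr: int, value: int) -> None:
        self._ensure_line(addr)
        self.data[line_offset(addr)] = value
        self.dirty = True

memory = bytearray(256)
slot = WriteBackSlot(memory)
slot.write(70, 7)
print(memory[70])  # 0: only the cached copy of the line at 64 changed
slot.read(200)     # the line at 192 replaces the modified line
print(memory[70])  # 7: the whole line was written back
```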
The conventional approach is to treat the prefetched cache line as an ordinary line fill. An ordinary line fill is a fetch of a cache line from system memory because an instruction accessed data in the cache line. With an ordinary line fill, the fetched cache line is unconditionally written, or retired, into the cache. A disadvantage of unconditionally retiring a speculatively prefetched cache line into the cache is that it may replace a line in the cache that is currently in use or likely to be used in the near future, potentially reducing cache efficiency. A solution to this problem is needed in order to improve cache efficiency.
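This disadvantage can be seen in a small simulation. The Python sketch below assumes a two-line, fully associative cache with least-recently-used (LRU) replacement and treats a prefetch exactly like an ordinary line fill; the class `LRUCache`, the helper `hot_line_hits`, and the line numbers are hypothetical. It models only the conventional behavior described above, not any particular remedy.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache of line numbers with LRU replacement (assumed policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used line first

    def fill(self, line: int):
        """Ordinary line fill: unconditionally retire `line`, evicting the
        LRU line if the cache is full. Returns the evicted line, or None."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[line] = True
        return victim

    def access(self, line: int) -> bool:
        """Demand access by an instruction; returns True on a hit."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        self.fill(line)
        return False

def hot_line_hits(prefetch: bool) -> bool:
    cache = LRUCache(capacity=2)
    cache.access(100)  # a line the program keeps reusing
    cache.access(5)    # demand miss on line 5
    if prefetch:
        cache.fill(6)  # speculative prefetch of line 6, retired like an ordinary fill
    return cache.access(100)  # does the reused line still hit?

print(hot_line_hits(False))  # True
print(hot_line_hits(True))   # False
```

Retiring the prefetched line 6 evicts line 100, the least recently used line, so the program's next access to its reused line misses even though line 6 may never be referenced.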