1. Field of the Invention
This invention relates to microprocessors and, more particularly, to cache prefetch mechanisms.
2. Description of the Related Art
Memory latency is frequently a large factor in determining the performance (e.g. instructions executed per second) of a processor in a given computer system. Over time, the operating frequencies of processors have increased dramatically, while the latency for access to dynamic random access memory (DRAM) in the typical computer system has not decreased as dramatically. Additionally, transmitting memory requests from the processor to the memory controller coupled to the memory system also requires time, which increases the memory latency. Accordingly, the number of processor clocks required to access the DRAM has increased from a few processor clocks, through tens of processor clocks, to over a hundred processor clocks in modern computer systems.
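By way of illustration only, the relationship between a fixed memory latency and the processor clock frequency may be sketched as follows. The function name `latency_in_clocks` and all latency and frequency values are hypothetical and are chosen solely to reproduce the "few, tens, over a hundred" progression described above; they are not measurements of any particular system.

```python
def latency_in_clocks(latency_ns: int, freq_mhz: int) -> int:
    """Convert a memory latency in nanoseconds to processor clocks.

    clocks = latency_ns * (clocks per ns) = latency_ns * freq_mhz / 1000,
    rounded up because a partial clock still stalls the processor.
    """
    return -(-(latency_ns * freq_mhz) // 1000)  # integer ceiling division


# Hypothetical generations: DRAM latency halves while frequency grows 120x.
few = latency_in_clocks(100, 25)        # 100 ns at 25 MHz  -> 3 clocks
tens = latency_in_clocks(70, 500)       # 70 ns at 500 MHz  -> 35 clocks
hundreds = latency_in_clocks(50, 3000)  # 50 ns at 3 GHz    -> 150 clocks
```

In this sketch the latency measured in nanoseconds improves, yet the latency measured in processor clocks still grows, because the clock frequency increases faster than the DRAM latency decreases.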
Processors have implemented caches to combat the effects of memory latency on processor performance. Caches are relatively small, low latency memories incorporated into the processor or coupled nearby. The caches store recently used instructions and/or data under the assumption that the recently used information may be accessed by the processor again. The caches may thus reduce the effective memory latency experienced by a processor by providing frequently accessed information more rapidly than if the information had to be retrieved from the memory system in response to each access.
If processor memory requests (e.g. instruction fetches and load and store memory operations) are cache hits (the requested information is stored in the processor's cache), then the memory requests are not transmitted to the memory system. Accordingly, memory bandwidth may be freed for other uses. However, the first time a particular memory location is accessed, a cache miss occurs (since the requested information is stored in the cache only after it has been accessed for the first time) and the information is transferred from the memory system to the processor (and may be stored in the cache). Additionally, since the caches are finite in size, information stored therein may be replaced by more recently accessed information. If the replaced information is accessed again, a cache miss will occur. Each cache miss then incurs the memory latency before the requested information arrives.
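The hit and miss behavior described above may be sketched with a small software model. The class name `TinyCache`, the fully associative organization, and the least-recently-used replacement policy are assumptions made for illustration; the description above states only that stored information may be replaced by more recently accessed information.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, addr) -> bool:
        """Return True on a cache hit, False on a cache miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # this request would go to the memory system
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # replace the least recently used line
        self.lines[addr] = True
        return False


cache = TinyCache(capacity=2)
# "A" misses on first access, then hits; "B" and "C" miss on first access,
# and "C" replaces "A"; the final access to "A" therefore misses again.
results = [cache.access(a) for a in ["A", "A", "B", "C", "A"]]
```

Here the first and fourth misses correspond to first-time accesses, while the final miss on "A" arises only because the finite cache replaced it.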
One way that the memory bandwidth may be effectively utilized is to predict the information that is to be accessed soon and to prefetch that information from the memory system into the cache. If the prediction is correct, the information may be a cache hit at the time of the actual request and thus the effective memory latency for actual requests may be decreased. Alternatively, the prefetch may be in progress at the time of the actual request, and thus the latency for the actual request may still be less than the memory latency even though a cache hit does not occur for the actual request. On the other hand, if the prediction is incorrect, the prefetched information may replace useful information in the cache, causing more cache misses to be experienced than if prefetching were not employed and thus increasing the effective memory latency. This is referred to as polluting the cache.
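The two outcomes described above, a correct prediction and cache pollution, may be illustrated with a minimal model. The function name `demand_misses`, the next-sequential-address prediction, the LRU replacement policy, and the address traces are all assumptions chosen for illustration. The model counts only misses on actual requests and does not model timing, so the case of a prefetch still in progress at the time of the actual request is not represented.

```python
from collections import OrderedDict


def demand_misses(trace, capacity, prefetch_next):
    """Count cache misses on actual (demand) requests for an address trace.

    Hypothetical model: fully associative LRU cache; when prefetch_next is
    True, every demand access to addr also prefetches addr + 1.
    """
    cache = OrderedDict()  # address -> True, least recently used first
    misses = 0

    def fill(addr):
        if len(cache) >= capacity:
            cache.popitem(last=False)  # replace the least recently used line
        cache[addr] = True

    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: refresh recency
        else:
            misses += 1
            fill(addr)
        if prefetch_next and addr + 1 not in cache:
            fill(addr + 1)  # prediction: the next address is needed soon
    return misses


sequential = list(range(8))          # prediction is correct
alternating = [0, 10, 0, 10, 0, 10]  # prediction is always wrong

seq_without = demand_misses(sequential, 4, False)   # 8 misses
seq_with = demand_misses(sequential, 4, True)       # 1 miss
alt_without = demand_misses(alternating, 2, False)  # 2 misses
alt_with = demand_misses(alternating, 2, True)      # 6 misses
```

For the sequential trace, prefetching reduces the misses on actual requests from eight to one. For the alternating trace, each incorrect prefetch replaces a line that is about to be reused, so the misses increase from two to six, which is the pollution effect described above.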