1. Field of the Invention
This invention is related to the field of processors and, more particularly, to prefetching in processors.
2. Description of the Related Art
Memory latency is frequently a large factor in determining the performance (e.g. instructions executed per second) of a processor in a given system. Over time, the operating frequencies of processors have increased dramatically, while the latency for access to dynamic random access memory (DRAM) in the typical system has not decreased as dramatically. Accordingly, the number of processor clocks required to access the external memory has increased, from latencies (as measured in processor clocks) of a few processor clocks, through tens of processor clocks, to over a hundred processor clocks in modern systems.
Processors have implemented caches to combat the effects of memory latency on processor performance. Caches are relatively small, low latency memories incorporated into the processor or coupled nearby. The caches store recently used instructions and/or data under the assumption that the recently used information may be accessed by the processor again. The caches thus reduce the effective memory latency experienced by a processor by providing frequently accessed information more rapidly than if the information had to be retrieved from the memory system in response to each access.
If processor memory requests (e.g. instruction fetches and load/store memory operations) are cache hits (the requested information is stored in the processor's cache), then the memory requests are not transmitted to the memory system. Accordingly, memory bandwidth may be freed for other uses. However, the first time a particular memory location is accessed, a cache miss occurs (since the requested information is stored in the cache after it has been accessed for the first time) and the information is transferred from the memory system to the processor (and may be stored in the cache). Additionally, since the caches are finite in size, information stored therein may be replaced by more recently accessed information. If the replaced information is accessed again, a cache miss will occur. The cache misses then experience the memory latency before the requested information arrives.
One way that the memory bandwidth may be effectively utilized is to predict the information that is to be accessed soon and to prefetch that information from the memory system into the cache. If the prediction is correct, the information may be a cache hit at the time of the actual request and thus the effective memory latency for actual requests may be decreased. Alternatively, the prefetch may be in progress at the time of the actual request, and thus the latency for the actual request may still be less than the memory latency even though a cache hit does not occur for the actual request. On the other hand, if the prediction is incorrect, the prefetched information may replace useful information in the cache, causing more cache misses to be experienced than if prefetching were not employed and thus increasing the effective memory latency.
Also, many instruction set architectures (ISAs) support prefetch instructions designed to permit software to prefetch data that it expects will be used in the near future. Processors often treat such instruction as loads. The prefetch instructions consume memory bandwidth, and can conflict with hardware-controlled prefetching, reducing overall performance.