1. Technical Field
The present invention relates generally to data processing systems and more particularly to fetching data for utilization during data processing. Still more particularly, the present invention relates to data prefetching operations in a data processing system.
2. Description of Related Art
Conventional computer systems are designed with a memory hierarchy comprising different memory devices, with access latency increasing the farther a device is from the processor. Processors typically operate at very high speed and execute instructions at such a fast rate that a sufficient number of cache lines of data must be prefetched from lower level cache (and/or system memory) to avoid the long latencies incurred when a cache miss occurs. Thus, prefetching provides an effective way to hide ever-increasing memory latency from the execution engine. When prefetching is effective, the data is ready and available when needed for utilization by the processor.
Conventional hardware-based prefetch operations involve a prefetch engine that monitors accesses to the L1 cache and, based on the observed patterns, issues requests for data that are likely to be referenced in the future. If the prefetch request succeeds, the processor's request for data will be resolved by loading the data from the L1 cache on demand, rather than the processor stalling while waiting for the data to be fetched/returned from lower level memory.
Typically, when prefetching data, the prefetch engines utilize a set sequence and a stride pattern to identify a stream of cache lines to be fetched. A “prefetch stream” may refer to a sequence of memory addresses (and specifically the associated data blocks) whose data are prefetched into the cache using the detected prefetch pattern.
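By way of illustration only, stride-based stream detection of the kind described above may be sketched as follows. This is a minimal software model, not a description of any particular hardware engine. The 128-byte cache line size, the function names detect_stride and prefetch_stream, and the requirement of two matching strides before a stream is confirmed are all assumptions introduced for the example.

```python
from typing import List, Optional

CACHE_LINE_BYTES = 128  # assumed line size, not taken from the source text


def detect_stride(addresses: List[int], confirmations: int = 2) -> Optional[int]:
    """Return the stride, in cache lines, if the most recent accesses repeat
    one nonzero stride `confirmations` times in a row; otherwise None."""
    lines = [addr // CACHE_LINE_BYTES for addr in addresses]
    needed = confirmations + 1  # N equal deltas require N + 1 accesses
    if len(lines) < needed:
        return None
    recent = lines[-needed:]
    deltas = {b - a for a, b in zip(recent, recent[1:])}
    if len(deltas) != 1:
        return None  # the recent accesses do not share a single stride
    stride = deltas.pop()
    return stride if stride != 0 else None


def prefetch_stream(last_address: int, stride: int, depth: int) -> List[int]:
    """Return the addresses of the next `depth` cache lines in the stream."""
    last_line = last_address // CACHE_LINE_BYTES
    return [(last_line + stride * k) * CACHE_LINE_BYTES
            for k in range(1, depth + 1)]
```

In this sketch, accesses to byte addresses 0, 256 and 512 yield a stride of two cache lines, and the resulting prefetch stream continues from the last observed line at that stride.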
To increase memory-level parallelism and ultimately exploit instruction-level parallelism, a prefetch engine is typically capable of detecting multiple concurrent streams, and the prefetch engine issues multiple prefetch requests at once to overlap the long fetch latencies of those requests. Different prefetch requests have different impacts on overall performance. However, conventional prefetch engines normally issue prefetch requests in a fixed order, which prevents the prefetch engine from realizing the full performance potential of the prefetch requests.
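The fixed-order behavior described above may be illustrated with the following sketch, which assumes a simple round-robin policy across the detected streams. The round-robin policy, the function name issue_fixed_order, and the max_in_flight limit are hypothetical choices made for the example; the text above states only that the order is fixed.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def issue_fixed_order(streams: Dict[str, List[int]],
                      max_in_flight: int) -> List[Tuple[str, int]]:
    """Issue up to `max_in_flight` prefetch requests, visiting the streams in
    a fixed round-robin order that ignores how useful each request is."""
    queues: Deque[Tuple[str, Deque[int]]] = deque(
        (name, deque(addrs)) for name, addrs in streams.items() if addrs
    )
    issued: List[Tuple[str, int]] = []
    while queues and len(issued) < max_in_flight:
        name, pending = queues.popleft()
        issued.append((name, pending.popleft()))
        if pending:
            # Streams with remaining requests go to the back of the rotation.
            queues.append((name, pending))
    return issued
```

Because the rotation depends only on the order in which the streams were registered, a request whose data the processor needs soonest receives no preference over any other request.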
Though many techniques have been proposed to improve prefetch accuracy, there has been little work on how to schedule prefetch requests in an optimal way. One proposed approach involves using the compiler to detect prefetch requests on the critical path and assign these prefetch requests a higher priority than other prefetch requests that are not on the critical path. This approach is limited to software-based prefetch mechanisms and uses only two priority levels (i.e., critical and non-critical).
As the speed gap between the processor and the memory increases, prefetch requests must be issued farther ahead to cover the increasing memory latency. However, prefetching farther ahead may bring useless data into the processor caches and pollute them. The processor-memory speed gap also makes it possible for the processor to issue memory requests at a rate faster than the memory system can handle.
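The relationship between memory latency and how far ahead a prefetch must be issued may be illustrated with a commonly used rule of thumb: the prefetch distance, in cache lines, is the memory latency divided by the time the processor takes to consume one line, rounded up. The function name prefetch_distance and the cycle counts in the example are hypothetical and are chosen only to show the arithmetic.

```python
def prefetch_distance(memory_latency_cycles: int,
                      cycles_per_line_consumed: int) -> int:
    """Return how many cache lines ahead a prefetch must be issued so the
    data arrives before the processor reaches it (rule-of-thumb estimate)."""
    if memory_latency_cycles < 0 or cycles_per_line_consumed <= 0:
        raise ValueError("latency must be >= 0 and consumption time must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-memory_latency_cycles // cycles_per_line_consumed)
```

Under these assumed numbers, a 300-cycle memory latency with one line consumed every 40 cycles gives a distance of 8 lines, and doubling the latency to 600 cycles raises the distance to 15 lines, which increases the amount of speculatively fetched data that may occupy the cache.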