1. Technical Field
The present invention relates generally to data processing and more particularly to fetching data for utilization during data processing. Still more particularly, the present invention relates to data prefetching operations during data processing.
2. Description of Related Art
Prefetching of data for utilization within data processing operations is well-known in the art. Conventional computer systems are designed with a memory hierarchy comprising different memory devices, with access latency increasing the further the device is away from the processor. These conventionally designed processors typically operate at a very high speed and are capable of processing data at such a fast rate that it is necessary to prefetch a sufficient number of cache lines of data from lower level caches (and/or system memory). This prefetching ensures that the data is ready and available for utilization by the processor.
Data prefetching is a proven, effective way to hide increasing memory latency from the processor's execution units. On these processors, data prefetch requests are issued as early as possible in order to “hide” the memory access latencies and thus allow ensuing dependent data operations (load requests) to execute with minimal delay in the return of data to the execution units.
If a data prefetch operation does not complete by the time the processor demands (i.e., issues a load request for) the corresponding data/cache line, the processor operations may stall as the processor waits for the data to be fetched from lower level memory. If, on the other hand, the data prefetch completes long before the processor requires the data, a window of vulnerability (between the prefetch completion and the demand for the data/cache line) exists, during which time the prefetched cache line may be replaced in the cache/prefetch buffer before the fetched cache line is demanded by the processor. This results in a stall of processor operations as the cache line has to be refetched when the demand is eventually issued.
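The three timing outcomes described above (too late, too early, and timely) can be illustrated with a minimal sketch. The function name `classify_prefetch`, its parameters, and the cycle-count timing model are hypothetical and are introduced here only for illustration; they do not describe any particular processor or prefetch engine.

```python
def classify_prefetch(complete_cycle, demand_cycle, evict_cycle=None):
    """Classify one prefetch by its timing relative to the demand load.

    All arguments are hypothetical cycle counts:
      complete_cycle: cycle at which the prefetched line arrives in the cache
      demand_cycle:   cycle at which the processor issues the load request
      evict_cycle:    cycle at which the line is replaced, or None if it is
                      never replaced before the demand
    """
    if complete_cycle > demand_cycle:
        # Data has not arrived when demanded: the processor stalls.
        return "late"
    if evict_cycle is not None and complete_cycle <= evict_cycle < demand_cycle:
        # Line was replaced inside the window of vulnerability and
        # must be refetched when the demand is issued.
        return "early"
    return "timely"
```

In this simplified model, a prefetch that completes at cycle 50 for a demand at cycle 100 is "early" only if the line is actually replaced in between; the longer that window, the more likely such a replacement becomes.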
How far ahead a prefetch request is issued is called “prefetch distance”. A majority of existing/conventional prefetch algorithms utilize a fixed prefetch distance to decide when to issue data prefetches. However, in processor execution/operations, the timeliness of a data prefetch depends not only on this static prefetch distance, but also on two dynamic factors, namely (1) how soon the prefetch operation completes under current operating conditions and (2) how fast the processor consumes data. As a result, using a static prefetch distance typically will not exploit the full benefits to be gained from data prefetching and leads to the above described inefficiencies.
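The interaction between a fixed prefetch distance and the two dynamic factors can be sketched as follows. This is a simplified model under stated assumptions: a steady sequential stream, a prefetch distance measured in cache lines, and hypothetical parameters `prefetch_latency` (cycles for a prefetch to complete) and `cycles_per_line` (cycles the processor takes to consume one cache line). The function names are illustrative only.

```python
import math


def required_distance(prefetch_latency, cycles_per_line):
    """Smallest distance (in cache lines) that covers the prefetch latency.

    Assumes the processor consumes one line every `cycles_per_line` cycles.
    """
    return math.ceil(prefetch_latency / cycles_per_line)


def timeliness(fixed_distance, prefetch_latency, cycles_per_line):
    """Compare a static distance against what current conditions require."""
    needed = required_distance(prefetch_latency, cycles_per_line)
    if fixed_distance < needed:
        return "late"            # data arrives after it is demanded
    if fixed_distance > needed:
        return "possibly early"  # line sits in the cache, exposed to replacement
    return "just in time"
```

Under these assumptions, a fixed distance of 6 lines matches a 300-cycle latency at 50 cycles per line, but the same distance is too short if the latency rises to 600 cycles and longer than necessary if it falls to 100 cycles.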
Researchers in the industry have proposed to augment each cache line with extra states and extra bits to dynamically increase the prefetch distance. These proposed mechanisms all share the drawbacks of introducing significant hardware overhead and not being able to detect “too early” prefetch requests. These drawbacks have prevented/hampered the proposed mechanisms from being integrated in real systems. Thus, while significant research has gone into improving the coverage and accuracy of data prefetch algorithms, to date there has been little success in providing a data prefetch mechanism/method within a processing system that produces “just-in-time” prefetches.