1. Field of the Invention
The present invention relates generally to enhancing performance of processors, and more particularly to methods for enhancing memory-level parallelism (MLP) and tolerating latency to reduce the overall time the processor spends waiting for data to be loaded.
2. Description of Related Art
To enhance the performance of modern processors, various techniques are used to enhance the number of instructions executed in a given time period. One of these techniques is prefetching data that the processor needs in the future.
Prefetching data, in general, refers to mechanisms that predict data that will be needed in the near future and issuing transactions to bring that data as close to the processor as possible. Bringing data closer to the processor reduces the latency to access that data when, and if, the data is needed.
Many forms of data prefetching have been proposed to increase memory-level parallelism (MLP). One form of data prefetching uses hardware mechanisms that prefetch data based on various heuristics. Another form of data prefetching uses traditional software prefetches where prefetch instructions are inserted in the instruction stream to initiate the data prefetching.
Most instruction set architectures have a prefetch instruction that lets the software inform the hardware that the software is likely to need data at a given location, specified in the instruction, in the near future. Hardware then responds to these prefetch instructions by potentially moving data to close caches in the processor.
To use prefetch instructions, software must also include code sequences to compute addresses. These code sequences add an overhead to the overall execution of the program as well as requiring the allocation of some hardware resources, such as registers, to be dedicated to the prefetch work for periods of time. The potential benefit of data prefetching to reduce the time the processor spends waiting for data can compensate for the overhead of data prefetching, but not always. This is especially complicated because software has at best imperfect knowledge ahead of time of what data will already be close to the processor and what data needs to be prefetched. For dynamic instances of a load that does not miss the cache, the overhead associated with the prefetch instruction likely degrades performance.