1. Field of the Invention
The present invention relates to a pipeline in a central processing unit (CPU). More particularly, the present invention relates to a method for preloading data in a CPU pipeline.
2. Description of the Related Art
As CPUs become faster and pipelines become deeper, memory becomes slower relative to the CPU. As a result, the CPU has to wait an increasing number of cycles whenever it loads data from memory. This memory load latency stalls the pipeline and decreases pipeline throughput. In some cases, for example in instruction loops, the CPU is idle 30% of the time because of memory load latency. This waste of CPU resources is undesirable.
A conventional solution to this problem is to hide the memory load latency. The hiding can be implemented in software or in hardware.
The software approach is to unroll-and-jam the loop and to hide the latency by rescheduling the core loop. However, unroll-and-jam increases the footprint of the loop in the instruction cache, occupying precious cache space. Rescheduling the core loop means moving load instructions forward so that the memory load latency overlaps with other work. The moved load instructions need registers to hold the loaded data, and sometimes the CPU does not have enough registers to hold all of it. Moreover, not all loops are suitable for unroll-and-jam.
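For illustration only, the software approach described above can be sketched in C as follows. The matrix-vector product, the function names matvec_plain and matvec_unroll_jam, and the unroll factor of 2 are assumptions chosen for this sketch and are not taken from any particular prior-art implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline loop nest: each multiply-add waits for its own loads. */
void matvec_plain(size_t n, const int *a, const int *x, int *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = 0;
        for (size_t j = 0; j < n; j++)
            y[i] += a[i * n + j] * x[j];
    }
}

/* Outer loop unrolled by 2, with the two inner loops jammed into one.
 * The loads are moved to the top of the jammed body (rescheduling), so
 * each loaded value occupies its own register until it is consumed. */
void matvec_unroll_jam(size_t n, const int *a, const int *x, int *y)
{
    assert(n % 2 == 0); /* assumption: no remainder loop in this sketch */
    for (size_t i = 0; i < n; i += 2) {
        int s0 = 0, s1 = 0;
        for (size_t j = 0; j < n; j++) {
            int xj  = x[j];               /* shared by both rows */
            int a0j = a[i * n + j];       /* row i */
            int a1j = a[(i + 1) * n + j]; /* row i + 1 */
            s0 += a0j * xj;
            s1 += a1j * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
}
```

The jammed body is larger than the original one and keeps three loaded values live at once, which reflects the instruction cache and register costs noted above; a larger unroll factor increases both.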
The hardware approach uses specialized hardware to perform data speculation based on the addressing patterns of load instructions. However, the cost of speculation hardware is still high, and the speculation is not accurate enough. In addition, the speculation hardware cannot handle general instruction loops.
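As a rough software model of speculation based on addressing patterns, the following C sketch shows a simple stride predictor. This is a generic technique chosen for illustration; the table size, the structure layout, and the names StridePredictor and stride_observe are hypothetical and do not describe any specific speculation hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u /* hypothetical number of table entries */

typedef struct {
    uint64_t pc;        /* address of the load instruction */
    uint64_t last_addr; /* data address of its most recent execution */
    int64_t  stride;    /* last observed address difference (0 at first) */
    bool     valid;
    bool     confident; /* newest stride equals the stored stride */
} StrideEntry;

/* Must be zero-initialized before use, e.g. StridePredictor p = {0}; */
typedef struct {
    StrideEntry e[TABLE_SIZE];
} StridePredictor;

/* Records one executed load. Returns true and writes *predicted when the
 * entry speculates the next data address of the same load instruction. */
bool stride_observe(StridePredictor *p, uint64_t pc, uint64_t addr,
                    uint64_t *predicted)
{
    assert(p != NULL && predicted != NULL);
    StrideEntry *e = &p->e[pc % TABLE_SIZE];

    if (!e->valid || e->pc != pc) {
        /* First sighting, or another load evicted this entry. */
        *e = (StrideEntry){ .pc = pc, .last_addr = addr, .valid = true };
        return false;
    }

    int64_t stride = (int64_t)(addr - e->last_addr);
    e->confident = (stride == e->stride);
    e->stride = stride;
    e->last_addr = addr;

    if (!e->confident)
        return false;
    *predicted = addr + (uint64_t)stride;
    return true;
}
```

Such a model only predicts loads whose addresses advance by a constant stride. Loads with irregular addressing produce no prediction or a wrong one, which is consistent with the accuracy and generality limitations noted above.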