1. Field of the Invention
The present invention relates to the design of processors within computer systems. More specifically, the present invention relates to a method and an apparatus for buffering instructions from a processor pipeline during speculative execution in order to facilitate a fast restart after speculative execution without a pipeline filling delay.
2. Related Art
Recent increases in microprocessor clock speeds have not been matched by corresponding increases in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent, not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that microprocessors spend a large fraction of time stalled waiting for memory references to complete instead of performing computational operations.
As more processor cycles are required to perform a memory access, even processors that support “out-of-order execution” are unable to effectively hide memory latency. Consequently, a processor will frequently stall while waiting for requested data to be returned from memory. Instead of waiting for the stall condition to be resolved, it is possible to checkpoint the state of the processor and then speculatively execute instructions past the stall point in an attempt to prefetch subsequent loads (see related U.S. patent application Ser. No. 10/741,944 listed above). This technique can dramatically improve performance if the speculative execution is successful in prefetching subsequent loads.
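The checkpoint-and-speculate technique described above can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the implementation of the related application: the class name `ToyCore`, the method `run_ahead`, the two-opcode instruction format, and the `window` parameter are all hypothetical and chosen only for illustration. Load data values are not modeled, so any register written by a load during speculation is treated as unknown.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical toy core: registers, a set of cached addresses, a prefetch log."""
    regs: dict = field(default_factory=dict)
    cache: set = field(default_factory=set)
    prefetched: list = field(default_factory=list)

    def run_ahead(self, program, stall_pc, window):
        """Checkpoint state, speculate past the stalled load, then restore.

        Each instruction is a tuple (op, dest, src, imm):
          ("load", dest, addr_reg, 0)  loads from the address held in addr_reg
          ("addi", dest, src_reg, imm) sets dest = src_reg + imm
        """
        checkpoint = dict(self.regs)        # save architectural register state
        not_ready = {program[stall_pc][1]}  # result of the stalled load is unknown

        for op, dest, src, imm in program[stall_pc + 1: stall_pc + 1 + window]:
            if src in not_ready:
                # Operand depends on missing data: result is unknown as well.
                not_ready.add(dest)
            elif op == "load":
                addr = self.regs[src]
                if addr not in self.cache:
                    self.prefetched.append(addr)  # issue a prefetch for the miss
                    self.cache.add(addr)
                not_ready.add(dest)               # loaded value is not modeled
            elif op == "addi":
                self.regs[dest] = self.regs[src] + imm
                not_ready.discard(dest)

        self.regs = checkpoint              # discard all speculative results
        return self.prefetched


# Usage: the load at index 0 misses; the load at index 3 has a computable address.
program = [
    ("load", "r1", "r0", 0),   # stalled load
    ("load", "r2", "r1", 0),   # depends on r1, cannot be prefetched
    ("addi", "r3", "r0", 64),
    ("load", "r4", "r3", 0),   # address is r0 + 64, can be prefetched
]
core = ToyCore(regs={"r0": 1000})
core.run_ahead(program, stall_pc=0, window=3)
```

In this sketch only the prefetches survive the speculative episode; the register state is restored from the checkpoint, which mirrors the idea that speculative results are discarded once the stall is resolved.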
However, this type of speculative execution can cause performance problems because the contents of the pipeline are overwritten during speculative execution. Consequently, when the stall condition is eventually resolved and normal non-speculative execution recommences, the processor must first refill the pipeline by fetching and decoding the instructions immediately following the stall point. Since pipelines are getting deeper as clock frequencies continue to increase, the effective latency associated with the fetch and decode stages is becoming longer. Because of this increased latency, execution units are forced to sit idle for more clock cycles following speculative execution. This wastes valuable cycles, and can thereby adversely affect processor performance. Hence, what is needed is a method and an apparatus that allows the execution unit of the processor to do useful work while the initial instructions are fetched and decoded following a restart after speculative execution.
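The refill cost described above can be made concrete with a rough back-of-the-envelope model. The following sketch is illustrative only and rests on simplifying assumptions: each fetch or decode stage takes one cycle, the execution unit consumes `issue_width` instructions per cycle, and any instructions already decoded and held in a buffer at restart can be issued immediately. The function name and all parameters are hypothetical.

```python
def idle_cycles_after_restart(fetch_stages, decode_stages,
                              buffered_instructions=0, issue_width=1):
    """Estimate cycles the execution unit sits idle after a restart.

    Assumes one cycle per front-end stage and that buffered, already-decoded
    instructions keep the execution unit busy while the front end refills.
    """
    refill_latency = fetch_stages + decode_stages
    # Ceiling division: cycles during which buffered instructions are issuing.
    busy_cycles = -(-buffered_instructions // issue_width)
    return max(0, refill_latency - busy_cycles)


# With nothing buffered, the idle time equals the full front-end depth.
no_buffer = idle_cycles_after_restart(fetch_stages=4, decode_stages=3)
# With eight buffered instructions and two issued per cycle, part of it is hidden.
with_buffer = idle_cycles_after_restart(4, 3, buffered_instructions=8, issue_width=2)
```

Under these assumptions, a deeper front end increases the idle time linearly, which is why the problem grows as pipeline depth increases.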