1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to a method and apparatus that supports interleaved execution of a non-speculative thread and related speculative threads within a single processor pipeline.
2. Related Art
As microprocessor clock speeds continue to increase at an exponential rate, it is becoming progressively harder to design processor pipelines to keep pace with these higher clock speeds, because less time is available at each pipeline stage to perform required computational operations. In order to deal with this problem, some designers have begun to investigate the possibility of statically interleaving the execution of unrelated processor threads in round-robin fashion within a single processor pipeline. In this way, if N unrelated threads are interleaved, instructions for a given thread appear only once for every N consecutive pipeline stages. Hence, the N threads each run at 1/Nth of the native clock rate of the processor. For example, four threads, each running at three GHz, can collectively run on a 12 GHz processor.
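The static round-robin interleaving described above can be illustrated with a minimal sketch. The function names `interleave_schedule` and `per_thread_rate_ghz` are hypothetical and chosen only for illustration; the sketch assumes one instruction issue slot per clock cycle and does not model any particular pipeline.

```python
def interleave_schedule(num_threads, num_cycles):
    """Return the thread id that issues in each cycle under static round-robin.

    Assumption: exactly one issue slot per cycle, assigned in fixed rotation.
    """
    return [cycle % num_threads for cycle in range(num_cycles)]


def per_thread_rate_ghz(pipeline_clock_ghz, num_threads):
    """Effective clock rate seen by each of N interleaved threads (1/N of native)."""
    return pipeline_clock_ghz / num_threads


# Four threads interleaved over eight cycles: each thread issues every fourth cycle.
schedule = interleave_schedule(4, 8)
print(schedule)

# Four threads on a 12 GHz pipeline each run at an effective 3 GHz.
print(per_thread_rate_ghz(12.0, 4))
```

In this sketch, thread 0 issues in cycles 0 and 4, thread 1 in cycles 1 and 5, and so on, which corresponds to the four-thread, 12 GHz example above.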
This interleaving technique relaxes latency requirements, which makes it significantly easier to design a high-speed processor pipeline. For example, if four unrelated threads are interleaved, a data cache access (or an addition operation) can take up to four pipeline stages without adversely affecting the performance of a given thread.
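The relaxed latency requirement can be sketched with a simplified model. The helper `extra_stages_needed` is hypothetical, and the model assumes that consecutive instructions of one thread are issued exactly N pipeline stages apart, so that an operation finishing within N stages is complete before that thread's next instruction needs its result.

```python
def extra_stages_needed(op_latency_stages, num_threads):
    """Stages by which an operation overruns the slack provided by interleaving.

    Assumption: with N interleaved threads, a thread's next instruction issues
    N stages after the current one, so latencies up to N stages are hidden.
    Returns 0 when the operation fits entirely within that slack.
    """
    return max(0, op_latency_stages - num_threads)


# With four interleaved threads, a four-stage data cache access is fully hidden.
print(extra_stages_needed(4, 4))

# Without interleaving (one thread), the same access overruns by three stages.
print(extra_stages_needed(4, 1))
```

Under these assumptions, the same four-stage operation that would delay a dependent instruction in a non-interleaved pipeline imposes no delay on any individual thread when four threads are interleaved.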
Interleaving the execution of multiple threads within a single pipeline has a number of advantages. It saves power and area in comparison to executing the threads in separate pipelines. It also provides a large aggregate throughput for the single pipeline.
However, an application or benchmark that cannot be multi-threaded will not benefit from this interleaving technique. This is a problem because single-threaded performance is important to a large number of customers who buy computer systems. Consequently, benchmarks that customers use to compare computer system performance generally measure single-threaded performance.
Hence, what is needed is a method and an apparatus that provide the advantages of static time-multiplexed execution of multiple threads to a single-threaded application.