1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to a method and an apparatus for implementing simultaneous multi-threading (SMT) in an asynchronous manner to improve single-threaded performance within a computer system.
2. Related Art
As microprocessor clock speeds continue to increase at an exponential rate, it is becoming progressively harder to design processor pipelines to keep pace with these higher clock speeds, because less time is available at each pipeline stage to perform required computational operations. In order to deal with this problem, some designers have begun to investigate simultaneous multi-threading (SMT) techniques that operate by interleaving the execution of unrelated processor threads (for example, in round-robin fashion) within a single processor pipeline. In this way, if N unrelated threads are interleaved, instructions from a given thread appear only once in every N consecutive pipeline stages. Hence, the N threads each run at 1/Nth of the native clock rate of the processor. For example, four threads, each running at 3 GHz, can collectively run on a 12 GHz processor.
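The round-robin interleaving and the resulting per-thread clock rate can be illustrated with a short model. This is a simplified sketch under assumed conditions (strict round-robin issue, exactly one instruction issued per clock cycle, no stalls); the function names are hypothetical and do not describe any particular apparatus.

```python
# Simplified model (assumption): N unrelated threads share one pipeline and
# are issued in strict round-robin order, one instruction per clock cycle.


def round_robin_schedule(num_threads: int, num_cycles: int) -> list[int]:
    """Return the (0-based) thread ID issued on each clock cycle."""
    return [cycle % num_threads for cycle in range(num_cycles)]


def per_thread_clock_ghz(core_clock_ghz: float, num_threads: int) -> float:
    """Each thread issues once every num_threads cycles, so it effectively
    runs at 1/num_threads of the core clock rate."""
    return core_clock_ghz / num_threads


if __name__ == "__main__":
    # Four threads on a 12 GHz pipeline: each thread reappears every 4 cycles.
    print(round_robin_schedule(num_threads=4, num_cycles=8))  # [0, 1, 2, 3, 0, 1, 2, 3]
    print(per_thread_clock_ghz(12.0, 4))                      # 3.0
```

With four threads, each thread occupies every fourth issue slot, which matches the 3 GHz per-thread figure for a 12 GHz pipeline given above.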
SMT relaxes latency requirements, which makes it significantly easier to design a high-speed processor pipeline. For example, if four unrelated threads are interleaved, a data cache access (or an addition operation) can take up to four pipeline stages without adversely affecting the performance of a given thread.
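The latency slack described above can be expressed as a simple stall calculation. This sketch assumes strict round-robin issue and a dependent next instruction that must wait for the full result; it ignores bypassing and other pipeline details, and the function name is hypothetical.

```python
# Simplified model (assumption): with N threads interleaved round-robin, a
# thread's next instruction issues N cycles after its previous one. If that
# instruction depends on an operation taking `op_latency_stages` stages, it
# stalls only for the cycles by which the latency exceeds N.


def stall_cycles(op_latency_stages: int, num_threads: int) -> int:
    """Extra cycles a dependent instruction waits beyond its issue slot."""
    return max(0, op_latency_stages - num_threads)


if __name__ == "__main__":
    # A 4-stage cache access is fully hidden when four threads are interleaved.
    print(stall_cycles(op_latency_stages=4, num_threads=4))  # 0
    # The same access stalls a lone thread for 3 cycles.
    print(stall_cycles(op_latency_stages=4, num_threads=1))  # 3
```

The second case reflects why interleaving does not help a single thread: with only one thread in the pipeline, the multi-stage latency is exposed directly.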
Interleaving the execution of multiple threads within a single pipeline has other advantages. It saves power and area in comparison to executing the threads in separate pipelines. It also provides high aggregate throughput for the single pipeline.
However, SMT does not improve performance for single-threaded applications. Single-threaded performance is important for a general-purpose processor because some applications inherently rely on single-threaded execution. Additionally, while legacy applications can benefit from single-threaded performance improvements, they cannot readily benefit from thread-level parallelism (TLP) improvements.
Hence, what is needed is a method and an apparatus for improving the performance of single-threaded applications in a computer system that supports simultaneous multi-threading.