1. Technical Field
This invention generally relates to data processing, and more specifically relates to switching between threads in a multithreaded processor.
2. Background Art
In modern computer systems, multithreading has been used to keep high-frequency processors from sitting idle for much of the time. In general, this is accomplished by allowing multiple threads to be active at once on a single physical processor. In a two-threaded system, when a first thread stalls (e.g., after encountering a cache miss), the processor switches context to the second thread, and execution continues with the second thread.
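The following minimal Python sketch illustrates the thread switch described above. It models a single processor that issues at most one instruction per cycle and stays on the current thread until that thread stalls on a cache miss, then switches to another ready thread. The one-letter instruction encoding ("h" for a cache hit, "m" for a miss), the 10-cycle MISS_LATENCY, the zero-cycle switch cost, and the names Thread and run are all hypothetical and exist only for illustration. They are not part of the described processor.

```python
from dataclasses import dataclass

MISS_LATENCY = 10  # hypothetical cache-miss penalty, in cycles


@dataclass
class Thread:
    ops: list[str]     # hypothetical encoding: "h" = cache hit, "m" = cache miss
    pc: int = 0        # index of the next instruction to issue
    ready_at: int = 0  # first cycle in which this thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.ops)


def run(threads: list[Thread]) -> int:
    """Return the cycle in which the last instruction completes, issuing at
    most one instruction per cycle and switching to another ready thread
    (with no switch penalty) whenever the current thread is stalled."""
    cycle = 0
    current = 0
    while not all(t.done() for t in threads):
        # Prefer the current thread; otherwise fall back to any other ready thread.
        order = [current] + [i for i in range(len(threads)) if i != current]
        ready = [i for i in order
                 if not threads[i].done() and threads[i].ready_at <= cycle]
        if ready:
            current = ready[0]
            t = threads[current]
            op = t.ops[t.pc]
            t.pc += 1
            # A miss keeps the thread stalled for MISS_LATENCY extra cycles.
            t.ready_at = cycle + 1 + (MISS_LATENCY if op == "m" else 0)
        # If no thread is ready, the processor idles for this cycle.
        cycle += 1
    return max(t.ready_at for t in threads)


print(run([Thread(list("hmh"))]))                       # prints 13
print(run([Thread(list("hmh")), Thread(list("hmh"))]))  # prints 15
```

Under these assumptions, one thread running "hmh" finishes in 13 cycles. Two such threads finish together in 15 cycles rather than the 26 needed to run them back to back, because the second thread issues while the first waits on its miss.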
Different types of multithreading are known in the art. Hardware multithreading, also known as coarse-grain multithreading, allows only one thread to issue instructions at a time. Because multiple threads are present, the effect of cache miss latencies may be minimized by performing a thread switch whenever a cache miss occurs. However, because there is only a single instruction pipeline, hardware multithreading does not benefit from overlapping latencies in the instruction pipeline.
Simultaneous multithreading, also known as fine-grain multithreading, allows multiple threads to issue instructions at the same time. Simultaneous multithreading requires separate resources for each active thread; each thread typically has its own instruction buffer, register file, etc. As a result, simultaneous multithreading not only hides cache miss latencies but also overlaps latencies across the separate instruction pipelines of the threads. Note, however, that this increased performance comes at a significant hardware cost because of the separate resources required for each thread.
Providing two threads in a simultaneously multithreaded processor is relatively straightforward: two sets of general purpose registers are provided, two instruction buffers are provided, etc., and when execution of one thread stalls, the other thread is executed. However, providing more than two threads significantly complicates a processor with simultaneous multithreading. With four threads, for example, four sets of general purpose registers, four instruction buffers, etc. are required. Simultaneously issuing instructions from three or more threads is an extremely complicated problem that would also require several additional pipeline issue stages. Moreover, when execution of one thread stalls, it is unclear which of the three remaining threads should execute next, and any policy for making that choice is complex to implement in hardware.
As a result, there have been limited efforts in the prior art to extend simultaneous multithreading beyond two threads.
A prior art processor 100 that has two threads in a simultaneous multithreading configuration is shown in FIG. 1. Each thread 110, 120 has its own instruction buffer 112, 122, respectively. The issue/dispatch logic 150 receives instructions from the instruction buffers 112 and 122 via respective access selectors 130 and 140, and issues the instructions to a plurality of functional units 160. If one of the threads 110, 120 stalls, execution of the non-stalled thread can continue.
Threads 110 and 120 are simultaneously multithreaded, which means that each of these threads preferably has its own instruction buffer and register state. Issue/dispatch logic 150 may thus issue instructions from both threads 110 and 120 to the functional units 160 at the same time.
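The issue behavior of FIG. 1 can be sketched as a per-cycle selection from the two instruction buffers. In the minimal Python model below, each thread's instruction buffer is a deque, a stalled thread contributes no instructions, and the remaining threads are drained round-robin up to a fixed issue width. The round-robin policy, the issue width, and the function name issue_cycle are hypothetical assumptions for illustration. The text does not specify how issue/dispatch logic 150 arbitrates between threads 110 and 120.

```python
from collections import deque


def issue_cycle(buffers: list[deque], stalled: list[bool],
                width: int) -> list[tuple[int, str]]:
    """Simulate one cycle of a hypothetical issue/dispatch stage.

    Instructions are taken round-robin from the buffers of non-stalled
    threads until `width` instructions have issued or those buffers are
    empty. Each issued entry is (thread index, instruction).
    """
    issued: list[tuple[int, str]] = []
    active = [i for i, s in enumerate(stalled) if not s]
    while len(issued) < width and any(buffers[i] for i in active):
        for i in active:
            if buffers[i] and len(issued) < width:
                issued.append((i, buffers[i].popleft()))
    return issued


# Both threads ready: instructions from both threads issue in the same cycle.
bufs = [deque(["a1", "a2", "a3"]), deque(["b1", "b2"])]
print(issue_cycle(bufs, [False, False], 4))
# prints [(0, 'a1'), (1, 'b1'), (0, 'a2'), (1, 'b2')]

# Thread 1 stalled: only thread 0 issues.
bufs = [deque(["a1", "a2", "a3"]), deque(["b1", "b2"])]
print(issue_cycle(bufs, [False, True], 2))
# prints [(0, 'a1'), (0, 'a2')]
```

With two threads this selection stays simple. Extending it to more threads, and choosing fairly among them when several are stalled, is where the added complexity noted in the background arises.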
As the clock frequency of modern processors increases, cache and memory latencies are becoming longer relative to the processor cycle. As a result, in a typical simultaneous two-threaded system as shown in FIG. 1, both threads are too often stalled at the same time, leaving the processor idle. New multithreading schemes have been proposed with four or more threads extant at one time. Implementing more simultaneous threads can theoretically provide more gains by overlapping the latencies. However, as discussed above, adding simultaneous threads greatly increases the complexity of the design. In addition, the number of required registers is proportional to the number of simultaneous threads. As a result, known simultaneous multithreading techniques make handling more than two simultaneous threads very difficult and costly. Without an improved multithreading approach that supports more than two threads, the computer industry will continue to rely on excessively expensive ways of providing more than two threads of execution in a processor.
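The trade-off described above can be illustrated with a back-of-the-envelope Python calculation. The calculation makes two hypothetical assumptions. First, each thread stalls independently with the same probability p_stall in any given cycle, so all n threads are stalled with probability p_stall**n. Second, every simultaneous thread needs its own full set of general purpose registers, taken here as 32 (GPRS_PER_THREAD). Real stall behavior need not be independent, so these figures are only indicative.

```python
GPRS_PER_THREAD = 32  # hypothetical number of architected GPRs per thread


def all_stalled_fraction(p_stall: float, threads: int) -> float:
    """Fraction of cycles in which every thread is stalled, assuming each
    thread stalls independently with probability p_stall."""
    return p_stall ** threads


def gprs_required(threads: int) -> int:
    """Each simultaneous thread needs its own full set of GPRs, so the
    register count grows linearly with the number of threads."""
    return threads * GPRS_PER_THREAD


for n in (1, 2, 4, 8):
    print(f"{n} threads: all stalled {all_stalled_fraction(0.5, n):.4f} "
          f"of cycles, {gprs_required(n)} GPRs")
```

With p_stall = 0.5, going from two to four threads cuts the all-stalled fraction from 0.25 to 0.0625 while doubling the required general purpose registers from 64 to 128.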