1. Field of the Invention
This invention generally relates to processors and, more particularly, to using a multi-threaded virtual state mechanism in multi-threaded processors.
2. Description of Related Art
Typically, a hyperthreaded or multi-threaded processor is capable of processing multiple instruction sequences concurrently. A primary motivation for executing multiple instruction streams within a single processor is the resulting improvement in processor utilization. Multi-threaded processors allow multiple instruction streams to execute concurrently in different execution resources in an attempt to better utilize those resources. Furthermore, multi-threaded processors can be used for programs that encounter high-latency delays or that often wait for events to occur.
Typically, although two or more threads may be executed concurrently on the same hardware, each thread maintains its own architectural state, and the state is referenced by the executing hardware depending on which thread is active at that particular time in a given pipestage having a latch and a multiplexer, commonly referred to as the “latch and mux” paradigm or mechanism. Using the conventional latch and mux paradigm, the hardware for a single-threaded processor may be expanded to handle two or more threads by adding latches for the state of each additional thread along with a multiplexer to select which thread's state is to be accessed in any given cycle. However, the conventional latch and mux paradigm may be fairly complex and is often the source of timing problems in critical speedpaths on the processor. For example, timing problems can arise with the conventional latch and mux mechanism in sections of logic that contain tight feedback loops which continually update the architectural state based on the previous value of that state. The problems can be further compounded when this architectural state needs to be restored due to, for example, mis-speculation, such as a branch misprediction.
FIG. 1 is a block diagram illustrating conventional prior art multi-threading functionality. As illustrated, a multi-threaded processor 100 may include multiple threads, such as thread 0 102, thread 1 104, thread 2 106, and thread 3 108. A current thread multiplexer (CT multiplexer) 110 may be used to detect which thread of the threads 102-108 is active in a particular state of the pipeline. Typically, the size of the CT multiplexer 110 may be directly proportional to the number of threads; for example, as illustrated here, the four threads 0-3 102-108 may require a 4:1 CT multiplexer 110.
Using stack pointer logic as an example, although the stack itself and the process to update the top-of-stack may be shared by all threads 0-3 102-108, the stack pointer may still have a different and separate value corresponding to the active thread, such as thread 0 102, of the threads 0-3 102-108. The CT multiplexer 110 may be used to choose the active thread 0 102 and forward the information regarding the active thread 0 102 to logic to process register stack reference 112 and logic to update top-of-stack (TOS) 114. The result of the logic to update TOS 114 may then be looped back through a feedback loop 116 to update the thread that was active by writing the result into the thread 0 102 TOS. Typically, the thread TOS may be updated using the logic to update TOS 114 to reflect the TOS changes indicated by instructions such as pushes and pops. The updated thread TOS may then be used by the next group of instructions to be processed in the next cycle. However, such TOS updates may have to happen every cycle, requiring a 1-cycle feedback loop 116 to update the thread TOS, and due to the limitations of the clock speed and processor logic, the CT multiplexer 110 may be required to select the active thread of the threads 0-3 102-108 in every cycle. Requiring the CT multiplexer 110 to perform such a selection in every cycle before forwarding active thread state information to logic 112-114 may significantly slow down the processor 100.
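By way of illustration only, the conventional 1-cycle loop described above may be sketched as a small behavioral model. The sketch is not taken from FIG. 1; the function names, the eight-entry stack depth, and the convention that a push increments the TOS are assumptions made purely for the example. It is intended to show that the CT multiplexer selection, the TOS update, and the write-back all occur within the same cycle.

```python
# Behavioral sketch (not hardware RTL) of the conventional latch-and-mux
# TOS loop. All names, the stack depth, and the push/pop direction are
# illustrative assumptions.

NUM_THREADS = 4   # threads 0-3, hence a 4:1 CT multiplexer
STACK_DEPTH = 8   # assumed register stack depth


def ct_mux(thread_tos, active_thread):
    """4:1 current-thread multiplexer: select the active thread's TOS."""
    return thread_tos[active_thread]


def update_tos(tos, pushes, pops):
    """Logic to update TOS: apply this cycle's pushes and pops, with wraparound."""
    return (tos + pushes - pops) % STACK_DEPTH


def run_cycle(thread_tos, active_thread, pushes, pops):
    """One clock cycle: mux select, update, then write back (feedback loop)."""
    selected = ct_mux(thread_tos, active_thread)   # on the critical path
    new_tos = update_tos(selected, pushes, pops)   # on the critical path
    thread_tos[active_thread] = new_tos            # feedback into the latch
    return new_tos


# One TOS latch per thread; only the active thread's latch changes per cycle.
thread_tos = [0] * NUM_THREADS
run_cycle(thread_tos, 0, pushes=2, pops=0)
run_cycle(thread_tos, 1, pushes=1, pops=0)
run_cycle(thread_tos, 0, pushes=0, pops=1)
print(thread_tos)
```

In this model, every call to run_cycle passes through ct_mux before update_tos, which corresponds to the per-cycle selection that the conventional arrangement places in the feedback loop.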
Furthermore, the update logic, such as logic to update TOS 114, may have to be expanded to update and/or access the TOS of any of the threads 0-3 102-108, depending on which thread of the threads 0-3 102-108 was active in a given cycle. Each thread 0-3 102-108 may also require logic to provide for state restoration (SR) 118-124 using SR multiplexers 126-132, should the TOS be corrupted due to an occurrence or event, such as a mispredicted branch. Conventional methods, apparatus, and systems require all components, such as the CT multiplexer 110 and SR multiplexers 126-132, to remain a part of the critical loop, such as the feedback loop 116, further lowering the machine frequency.
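As a further illustration, the conventional state restoration path may be modeled as a per-thread latch fed by a 2:1 SR multiplexer that chooses between the updated TOS and a previously saved value. The class name, the checkpoint mechanism, the signed delta argument, and the eight-entry stack depth below are hypothetical and are assumed only for the purpose of the sketch; the source does not specify how the restored value is saved.

```python
# Behavioral sketch of per-thread state restoration (SR) in the conventional
# arrangement. The checkpoint scheme and all names are assumptions.

STACK_DEPTH = 8   # assumed register stack depth


class ThreadTosLatch:
    """Per-thread TOS latch with a saved copy and a 2:1 SR multiplexer."""

    def __init__(self):
        self.tos = 0
        self.checkpoint = 0

    def take_checkpoint(self):
        # Save the current TOS, e.g. when a branch is predicted.
        self.checkpoint = self.tos

    def clock(self, updated_tos, restore):
        # SR multiplexer: the restored value overrides the updated value.
        self.tos = self.checkpoint if restore else updated_tos


def step(latches, active, delta, mispredicted=None):
    """One cycle: CT mux select, TOS update, then SR mux and write-back."""
    selected = latches[active].tos                  # CT multiplexer
    updated = (selected + delta) % STACK_DEPTH      # logic to update TOS
    for tid, latch in enumerate(latches):
        restore = tid == mispredicted
        if tid == active or restore:
            latch.clock(updated, restore)           # inactive threads hold


latches = [ThreadTosLatch() for _ in range(4)]
step(latches, active=0, delta=2)                    # thread 0 TOS becomes 2
latches[0].take_checkpoint()                        # save before speculation
step(latches, active=0, delta=3)                    # speculative update to 5
step(latches, active=1, delta=1, mispredicted=0)    # restore thread 0
print([latch.tos for latch in latches])
```

In this model the SR multiplexer sits between the update logic and each latch, so both the CT multiplexer and the SR multiplexer lie on the same single-cycle path, consistent with the critical loop described above.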