Multithreaded processor design has recently been considered as an increasingly attractive option for increasing the performance of processors. Multithreading within a processor, inter alia, provides the potential for more effective utilization of various processor resources, and particularly for more effective utilization of the execution logic within a processor. Specifically, by feeding multiple threads to the execution logic of a processor, clock cycles that would otherwise have been idle due to a stall or other delay in the processing of a particular thread may be utilized to service a further thread. A stall in the processing of a particular thread may result from a number of occurrences within a processor pipeline. For example, a cache miss or a branch missprediction (i.e., a long-latency operation) for an instruction included within a thread typically results in the processing of the relevant thread stalling. The negative effect of long-latency operations on execution logic efficiencies is exacerbated by the recent increases in execution logic throughput that have outstripped advances in memory access and retrieval rates.
Multithreaded computer applications are also becoming increasingly common in view of the support provided to such multithreaded applications by a number of popular operating systems, such as the Windows NT® and Unix operating systems. Multithreaded computer applications are particularly efficient in the multi-media arena.
Multithreaded processors may broadly be classified into two categories (i.e., fine or coarse designs) according to the thread interleaving or switching scheme employed within the relevant processor. Fine multithreaded designs support multiple active threads within a processor and typically interleave two different threads on a cycle-by-cycle basis. Coarse multithreaded designs typically interleave the instructions of different threads on the occurrence of some long-latency event, such as a cache miss. A coarse multithreaded design is discussed in Eickemayer, R.; Johnson, R.; et al., “Evaluation of Multithreaded Uniprocessors for Commercial Application Environments”, The 23rd Annual International Symposium on Computer Architecture, pp. 203–212, May 1996. The distinctions between fine and coarse designs are further discussed in Laudon, J; Gupta, A, “Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors”, Multithreaded Computer Architectures: A Summary of the State of the Art, edited by R. A. Iannuci et al., pp. 167–200, Kluwer Academic Publishers, Norwell, Mass., 1994. Laudon further proposes an interleaving scheme that combines the cycle-by-cycle switching of a fine design with the full pipeline interlocks of a coarse design (or blocked scheme). To this end, Laudon proposes a “back off” instruction that makes a specific thread (or context) unavailable for a specific number of cycles. Such a “back off” instruction may be issued upon the occurrence of predetermined events, such as a cache miss. In this way, Laudon avoids having to perform an actual thread switch by simply making one of the threads unavailable.