The present invention relates generally to the field of multithreaded processors and, more specifically, to a method and apparatus for processing an event occurrence within a multithreaded (MT) processor.
Multithreaded (MT) processor design has recently been considered as an increasingly attractive option for increasing the performance of processors. Multithreading within a processor, inter alia, provides the potential for more effective utilization of various processor resources, and particularly for more effective utilization of the execution logic within a processor. Specifically, by feeding multiple threads to the execution logic of a processor, clock cycles that would otherwise have been idle due to a stall or other delay in the processing of a particular thread may be utilized to service a further thread. A stall in the processing of a particular thread may result from a number of occurrences within a processor pipeline. For example, a cache miss or a branch misprediction (i.e., a long-latency operation) for an instruction included within a thread typically results in the processing of the relevant thread stalling. The negative effect of long-latency operations on execution logic efficiencies is exacerbated by the recent increases in execution logic throughput that have outstripped advances in memory access and retrieval rates.
Multithreaded computer applications are also becoming increasingly common in view of the support provided to such multithreaded applications by a number of popular operating systems, such as the Windows NT® and Unix operating systems. Multithreaded computer applications are particularly efficient in the multi-media arena.
Multithreaded processors may broadly be classified into two categories (i.e., fine or coarse designs) according to the thread interleaving or switching scheme employed within the relevant processor. Fine multithreaded designs support multiple active threads within a processor and typically interleave two different threads on a cycle-by-cycle basis. Coarse multithreaded designs typically interleave the instructions of different threads on the occurrence of some long-latency event, such as a cache miss. A coarse multithreaded design is discussed in Eickemeyer, R.; Johnson, R.; et al., "Evaluation of Multithreaded Uniprocessors for Commercial Application Environments", The 23rd Annual International Symposium on Computer Architecture, pp. 203-212, May 1996. The distinctions between fine and coarse designs are further discussed in Laudon, J.; Gupta, A., "Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors", Multithreaded Computer Architectures: A Summary of the State of the Art, edited by R. A. Iannucci et al., pp. 167-200, Kluwer Academic Publishers, Norwell, Mass., 1994. Laudon further proposes an interleaving scheme that combines the cycle-by-cycle switching of a fine design with the full pipeline interlocks of a coarse design (or blocked scheme). To this end, Laudon proposes a "back off" instruction that makes a specific thread (or context) unavailable for a specific number of cycles. Such a "back off" instruction may be issued upon the occurrence of predetermined events, such as a cache miss. In this way, Laudon avoids having to perform an actual thread switch by simply making one of the threads unavailable.
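By way of illustration only, the "back off" style of interleaving described above may be modeled in software as follows. This is a minimal sketch and not a description of the cited design: the function name `interleave`, its parameters, the round-robin selection order, and the representation of cache misses as a set of (thread, cycle) pairs are all assumptions introduced for this example.

```python
def interleave(num_threads, cycles, miss_cycles, backoff):
    """Return the thread issued in each cycle (None if no thread is available).

    Hypothetical model: threads are selected round-robin on a cycle-by-cycle
    basis. When the issuing thread suffers a miss (an entry in miss_cycles,
    given as (thread, cycle) pairs), it is made unavailable for `backoff`
    subsequent cycles rather than being switched out.
    """
    available_at = [0] * num_threads  # first cycle in which each thread may issue
    schedule = []
    last = -1  # most recently issued thread
    for cycle in range(cycles):
        chosen = None
        # Round-robin search starting with the thread after the last one issued.
        for offset in range(1, num_threads + 1):
            candidate = (last + offset) % num_threads
            if available_at[candidate] <= cycle:
                chosen = candidate
                break
        schedule.append(chosen)
        if chosen is not None:
            last = chosen
            if (chosen, cycle) in miss_cycles:
                # "Back off": the thread is unavailable for `backoff` cycles.
                available_at[chosen] = cycle + 1 + backoff
    return schedule


if __name__ == "__main__":
    # Thread 0 misses in cycle 0 and backs off for three cycles.
    print(interleave(2, 6, {(0, 0)}, 3))
```

In this model, the remaining thread simply continues to issue while the backed-off thread is unavailable, so that no explicit thread switch is performed.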
A multithreaded architecture for a processor presents a number of further challenges in the context of an out-of-order, speculative execution processor architecture. More specifically, the handling of events (e.g., branch instructions, exceptions or interrupts) that may result in an unexpected change in the flow of an instruction stream is complicated when multiple threads are considered. In a processor where resource sharing between multiple threads is implemented (i.e., there is limited or no duplication of functional units for each thread supported by the processor), the handling of event occurrences pertaining to a specific thread is complicated in that further threads must be considered in the handling of such events.
Where resource sharing is implemented within a multithreaded processor it is further desirable to attempt increased utilization of the shared resources responsive to changes in the state of threads being serviced within the multithreaded processor.
According to the invention, there is provided a method including detecting a first event occurrence for a first thread being processed within a multithreaded processor. Responsive to the detection of the first event occurrence, a second thread being processed within the multithreaded processor is monitored to detect a clearing point for the second thread. Responsive to the detection of the clearing point for the second thread, a functional unit within the multithreaded processor is cleared of data for both the first and second threads.
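The sequence of operations recited above may be sketched, purely for purposes of illustration, as the following software model. The sketch is not the claimed apparatus: the `FunctionalUnit` class, the `handle_event` function, and the representation of the second thread as a per-cycle sequence of flags indicating whether a clearing point has been reached are assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalUnit:
    """Hypothetical shared functional unit holding data per thread."""
    entries: dict = field(default_factory=dict)  # thread id -> list of data items

    def clear(self, thread_ids):
        # Discard the data held for each of the named threads.
        for tid in thread_ids:
            self.entries.get(tid, []).clear()


def handle_event(unit, first, second, second_thread_trace):
    """Model the handling of an event already detected for thread `first`.

    `second_thread_trace` is an assumed stand-in for monitoring the second
    thread: one boolean per cycle, True when that thread reaches a clearing
    point. Returns the cycle at which the unit was cleared, or None if no
    clearing point was observed (in which case nothing is cleared).
    """
    for cycle, at_clearing_point in enumerate(second_thread_trace):
        if at_clearing_point:
            # Clear the unit of data for both the first and second threads.
            unit.clear([first, second])
            return cycle
    return None


if __name__ == "__main__":
    unit = FunctionalUnit({0: ["a", "b"], 1: ["c"]})
    print(handle_event(unit, 0, 1, [False, False, True]), unit.entries)
```

In this model the clearing of the functional unit is deferred until the second thread reaches its clearing point, at which time the data for both threads is discarded together.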
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.