Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of managing thread contexts in a multithreading processor.
Description of the Related Art
Today's high performance computer systems use multiple processors to carry out various computer programs such as software applications and operating systems. In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. Each processing unit may further include multiple processor cores which actually execute the program instructions to operate the computer. The processor cores may function according to reduced instruction set computing (RISC) techniques, and may employ both pipelining and out-of-order execution of instructions to further improve the performance of the superscalar architecture.
In a superscalar architecture, instructions may be completed in-order or out-of-order. In-order completion means no instruction can complete before all instructions dispatched ahead of it have been completed. Out-of-order completion means that an instruction is allowed to complete before all instructions ahead of it have been completed, as long as predefined rules are satisfied. Within a pipelined superscalar processor, instructions are first fetched, decoded and then buffered. Instructions can be dispatched to execution units as resources and operands become available. Additionally, instructions can be fetched and dispatched speculatively based on predictions about branches taken. The result is a pool of instructions in varying stages of execution, none of which has completed by writing final results to the system memory hierarchy. As resources become available and branches are resolved, the instructions are retired in program order, thus preserving the appearance of a machine that executes the instructions in program order. Overall instruction throughput can be further improved by modifying the hardware within the processor, for example, by having multiple execution units within a single processor core.
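The relationship between out-of-order execution and in-order retirement described above can be illustrated with a minimal sketch. It is a toy software model, not a description of any particular processor: the class name ReorderBuffer, its methods, and the instruction tags are all hypothetical and chosen only for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Toy model: entries are allocated and retired in program order."""

    def __init__(self):
        self._entries = deque()  # instruction tags in dispatch (program) order
        self._done = set()       # tags whose execution has finished

    def dispatch(self, tag):
        # Dispatch order defines program order in this model.
        self._entries.append(tag)

    def complete(self, tag):
        # Execution may finish in any order.
        self._done.add(tag)

    def retire(self):
        """Retire the longest finished prefix, oldest instruction first."""
        retired = []
        while self._entries and self._entries[0] in self._done:
            tag = self._entries.popleft()
            self._done.discard(tag)
            retired.append(tag)
        return retired


def demo_retirement():
    rob = ReorderBuffer()
    for tag in ("i0", "i1", "i2"):
        rob.dispatch(tag)
    rob.complete("i2")     # the youngest instruction finishes first
    first = rob.retire()   # nothing retires: i0 is still pending
    rob.complete("i0")
    second = rob.retire()  # only i0 retires; unfinished i1 blocks i2
    rob.complete("i1")
    third = rob.retire()   # i1 and i2 retire together, in program order
    return first, second, third
```

In this sketch a finished instruction waits behind any older unfinished instruction, which is what preserves the appearance of in-order execution.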
Modern computer systems also use a computing technique known as hardware multithreading to independently execute smaller sequences of instructions called threads or contexts. When a processor, for any of a number of reasons, stalls and cannot continue processing or executing one of these threads, the processor can switch to another thread. The term “multithreading” as used by those skilled in the art of computer processor architecture is not the same as the software use of the term in which a process is subdivided into multiple related threads. Software multithreading requires substantial involvement by the operating system, which manipulates and saves data from registers to main memory and maintains the program order of related and dependent instructions before a thread switch can occur. Software multithreading neither requires nor is concerned with hardware multithreading, and vice versa. Hardware multithreading manipulates hardware-architected registers, execution units and pipelined processors to maintain the state of one or more independently executing sets of instructions (threads) in the processor hardware. Hardware threads could be derived from, for example, different tasks in a multitasking system, different threads compiled from a software multithreading system, or from different input/output processors. In each of these examples of hardware multithreading, more than one thread can be independently maintained in a processor's registers. FIG. 1 illustrates a simplified example of multithreading. Three task contexts 2 each have associated thread contexts 4 which are intermittently swapped out for execution among four processors (processor cores) 6. Multiple threads from the same task need not be carried out on a single processor but rather can be distributed among all of the available processors. When the set of instructions comprising a thread has been completed, the thread and its context are retired from the processor.
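The switch-on-stall behavior described above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the mechanism of any specific processor: the function name run_switch_on_stall, the representation of a thread as a list of stall latencies, and the round-robin choice of the next thread are all hypothetical.

```python
def run_switch_on_stall(programs, cycles):
    """Simulate one core that switches threads only when the active one stalls.

    programs maps a thread name to a list of integers, one per instruction.
    Each integer is the number of cycles the thread stalls after issuing
    that instruction (0 means no stall). Returns, per cycle, the name of
    the thread that issued, or None if no thread could issue.
    """
    names = list(programs)
    pc = {name: 0 for name in names}        # next instruction per thread
    ready_at = {name: 0 for name in names}  # first cycle the thread may issue
    active = 0
    trace = []
    for cycle in range(cycles):
        chosen = None
        # Prefer the active thread; otherwise scan the others round-robin.
        for offset in range(len(names)):
            idx = (active + offset) % len(names)
            name = names[idx]
            if pc[name] < len(programs[name]) and ready_at[name] <= cycle:
                chosen = idx
                break
        if chosen is None:
            trace.append(None)
            continue
        active = chosen
        name = names[active]
        latency = programs[name][pc[name]]
        pc[name] += 1
        ready_at[name] = cycle + 1 + latency
        trace.append(name)
    return trace
```

For example, with thread A stalling for two cycles after its second instruction, thread B issues during the stall and A resumes once it is ready again.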
Simultaneous multithreading (SMT) is a processor design feature that combines hardware multithreading with superscalar processor technology to allow multiple threads to issue instructions each cycle. Unlike other hardware multithreaded architectures in which only a single hardware context (i.e., thread) is active on any given cycle, SMT permits all thread contexts to simultaneously compete for and share processor resources. Unlike conventional superscalar processors, which suffer from a lack of per-thread instruction-level parallelism (ILP), simultaneous multithreading uses multiple threads to compensate for low single-thread ILP. The performance consequence is significantly higher instruction throughput and program speedups on a variety of workloads that include commercial databases, web servers and scientific applications in both multi-programmed and parallel environments.
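The throughput argument above can be made concrete with a deliberately simplified sketch of how issue slots are filled in one cycle. The function name issue_slots_used and its parameters are hypothetical, and the fixed-priority allocation shown here is an assumption; real SMT fetch and issue policies are considerably more involved.

```python
def issue_slots_used(issue_width, ready_per_thread):
    """Return how many issue slots are filled in one cycle.

    issue_width is the number of instructions the core can issue per cycle.
    ready_per_thread lists, for each hardware thread, how many independent
    instructions it has ready this cycle (its available ILP).
    """
    used = 0
    for ready in ready_per_thread:
        # Each thread takes as many of the remaining slots as it can fill.
        take = min(ready, issue_width - used)
        used += take
    return used
```

With a four-wide core, a single thread that has only one ready instruction fills one slot, whereas three threads with one, two, and one ready instructions fill all four.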
The POWER7 processing unit designed by International Business Machines Corporation has eight cores and can select among three threading modes using a single thread, two threads, or four threads per core, for a maximum of 32 possible threads being simultaneously executed in the processing unit. In the POWER processor architecture, the software-visible machine state (machine registers/context) is divided among fixed point or general purpose registers (GPRs), floating point registers (FPRs), vector registers (VRs), and vector-scalar registers (VSRs). The processor hardware includes bits in a machine status register that enable or disable access to the FPRs, VRs and/or VSRs for context switching. This feature enables an operating system to manage access to those facilities and implement schemes such as deferred (“lazy”) state management. That is, when dispatching a thread, the operating system restores only its fixed point state (GPRs), and access to the other facilities (FPRs, VRs, VSRs) is disabled. If the thread thereafter attempts to use one of those facilities, an interrupt results, and the operating system can then restore the needed state and enable access to the requested facility.
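The deferred ("lazy") state management scheme described above can be sketched in software as follows. This is a minimal model under stated assumptions, not operating system or hardware code: the names Core, LazyOS, FacilityUnavailable, and fp_enabled are hypothetical, a Python exception stands in for the interrupt, only the FPR facility is modeled, and saving the outgoing thread's state is omitted.

```python
class FacilityUnavailable(Exception):
    """Stands in for the interrupt raised on access to a disabled facility."""


class Core:
    def __init__(self):
        self.gprs = None
        self.fprs = None
        self.fp_enabled = False  # models one enable bit in the status register

    def read_fpr(self, index):
        if not self.fp_enabled:
            raise FacilityUnavailable("FPR")
        return self.fprs[index]


class LazyOS:
    def __init__(self, core):
        self.core = core
        self.current = None
        self.fp_restores = 0  # counts how often FPR state was actually restored

    def dispatch(self, thread):
        # Restore only the fixed point state and leave FPR access disabled.
        self.current = thread
        self.core.gprs = list(thread["gprs"])
        self.core.fp_enabled = False

    def on_facility_unavailable(self):
        # Interrupt handler: restore the needed state, then enable access.
        self.core.fprs = list(self.current["fprs"])
        self.core.fp_enabled = True
        self.fp_restores += 1

    def read_fpr(self, index):
        # Models a thread's FPR access, retried after the handler runs.
        try:
            return self.core.read_fpr(index)
        except FacilityUnavailable:
            self.on_facility_unavailable()
            return self.core.read_fpr(index)
```

In this sketch a thread that never touches the FPRs never pays for restoring them, and a thread that does use them pays the cost once per dispatch, on first use.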