Most modern processors embody several pipelined functional units. Typical such units include integer units capable of performing integer arithmetic between register operands, and floating point units capable of performing floating point arithmetic between register operands. There may be dedicated functional units for performing address arithmetic, or, in some machines, integer units may perform these operations. Other functional units may include fetch and store units that operate to retrieve operands from, or store results into, memory. These functional units are referred to as resources.
Many modern processors are capable of commanding operations in more than one functional unit simultaneously. The process of commanding operations in functional units is instruction decode and dispatch.
Superscalar machines have sufficient resources, and sufficiently complex control, that it is possible to dispatch operations from more than one instruction simultaneously. It is known, however, that such machines can only keep all their functional units busy for only a small percentage of time. Most of the time only a subset of functional units are actually performing useful work, in effect the load factor on these functional units is typically low.
Much modern software is written to take advantage of multiple processor machines. This software typically is written to use multiple threads. Software is also frequently able to prioritize those threads, determining which thread should receive the most resources at a particular time.
Multithreaded processors are those that have more than one instruction counter, typically have more than one register set, and are capable of executing more than one instruction stream. For example, machines are known wherein a single pipelined execution unit is timeshared among several instruction streams. Since the execution unit is timeshared, each instruction stream tends to execute somewhat slowly. These machines appear to software as multiple, independent, typically slow, processors.
Machines of superscalar performance having multiple processors on single integrated circuits are known. Machines of this type include the IBM Power-4 and the PA 8800. Typically, each processor on these integrated circuits has its own set of execution unit pipelines. Their die area, and therefore cost, for execution units is therefore typically much greater than with a timeshared multithreaded machine.
It is also known that that the power consumed by large logic circuits, such as processors, is a function of the number of gates switching in each clock cycle, the capacitance on each gate, and the power supply voltage. There are many advantages to reducing the power consumed by a processor, ranging from increased battery life in portable or mobile applications to lessening air conditioning load of computer rooms containing multiple large machines.
While single-integrated-circuit multiprocessor machines offer good performance, they make inefficient use of their resources and consume considerable power.