One measure of performance for a computer processor is the power-efficiency ratio, which expresses the performance of the processor per watt of power consumed. As computing devices become smaller and more powerful, demand is increasing for higher performance and lower power consumption in processors.
One factor in achieving better performance in a processor is parallelism, particularly instruction-level parallelism (ILP). Unlike a dedicated hardware accelerator such as an application-specific integrated circuit (ASIC), a processor is instruction-driven and programmed with corresponding software. A typical computer program is a list of instructions which, when compiled or assembled, generates a sequence of machine instructions or operations that a processor executes. The operations have a program order defined by the logic of the computer program and are generally intended for sequential execution in the program order. Scalar processors execute the operations in the program order, which limits a scalar processor to completing one operation before beginning the next operation.
A superscalar processor architecture implements ILP within a single processor. Due to the parallelism, a superscalar processor allows faster processor throughput than would otherwise be possible at a given clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different function units on the processor. Each function unit is not a separate core, but is instead an execution unit such as an arithmetic logic unit, a bit shifter, or a multiplier, among other options, within a single processor.
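The difference between scalar and superscalar execution can be illustrated with a toy cycle count. The following is a minimal sketch under simplifying assumptions that are not taken from the text above: every operation completes in one cycle, the processor issues in program order, each function unit accepts one operation per cycle, and an operation cannot issue in the same cycle as an earlier operation that writes one of its source registers or its destination register. The names `Op`, `scalar_cycles`, and `superscalar_cycles` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One machine operation in a toy model (hypothetical, for illustration)."""
    dest: str    # destination register
    srcs: tuple  # source registers
    unit: str    # function unit the operation needs, e.g. "alu"


def scalar_cycles(program):
    # A scalar processor completes one operation before starting the next,
    # so with one-cycle operations the cycle count equals the operation count.
    return len(program)


def superscalar_cycles(program, units=("alu", "shift", "mul")):
    # Reject operations that name a function unit the model does not have.
    for op in program:
        if op.unit not in units:
            raise ValueError(f"unknown function unit: {op.unit}")

    cycles = 0
    i = 0
    while i < len(program):
        free = set(units)  # function units still available this cycle
        written = set()    # registers written by operations issued this cycle
        # Issue operations in program order until one is blocked.
        while i < len(program):
            op = program[i]
            if op.unit not in free:
                break  # structural hazard: the unit is already busy
            if any(src in written for src in op.srcs):
                break  # data dependency on an operation issued this cycle
            if op.dest in written:
                break  # two writes to the same register in one cycle
            free.discard(op.unit)
            written.add(op.dest)
            i += 1
        cycles += 1
    return cycles


# Three independent operations followed by one that depends on the first two.
program = [
    Op("r1", ("r0",), "alu"),        # r1 = r0 + 1
    Op("r2", ("r3",), "shift"),      # r2 = r3 << 2
    Op("r4", ("r5", "r6"), "mul"),   # r4 = r5 * r6
    Op("r7", ("r1", "r2"), "alu"),   # r7 = r1 + r2
]

print(scalar_cycles(program))        # 4
print(superscalar_cycles(program))   # 2
```

In this sketch the first three operations issue together because they use different function units and share no registers, while the fourth must wait for a later cycle. A real superscalar design involves pipelining, multi-cycle latencies, and more elaborate scheduling that this model leaves out.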
One factor affecting power consumption in any processor is the global clock tree. The global clock tree is usually deployed throughout the processor to synchronize and drive function units, such as instruction decoders, schedulers, execution units, register files, buffers, and the like. Larger processors have a correspondingly larger number of function units, specifically execution units and buffers, which require a larger global clock tree to synchronize these resources. The larger global clock tree results in higher power consumption. It is estimated that a global clock tree consumes about 20% to 30% of the total power of a processor. Another problem with the global clock tree is that when function units are not in use at a particular moment, their clocks are still toggled, thus consuming power unnecessarily.
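To give a sense of scale for the estimate above, the following is a rough arithmetic sketch. The 10 W total power and the 16 function units are hypothetical figures, and the even split of clock-tree power across function units is a simplifying assumption introduced here, not a claim from the text. The function names are likewise illustrative only.

```python
def clock_tree_power(total_watts, clock_fraction):
    """Power drawn by the global clock tree, given its share of total power."""
    if not 0.0 <= clock_fraction <= 1.0:
        raise ValueError("clock_fraction must be between 0 and 1")
    return total_watts * clock_fraction


def wasted_clock_power(total_watts, clock_fraction, idle_units, total_units):
    """Clock power spent toggling idle function units.

    Assumption (not from the source): clock-tree power is divided evenly
    among all function units.
    """
    if total_units <= 0 or not 0 <= idle_units <= total_units:
        raise ValueError("idle_units must be between 0 and total_units")
    per_unit = clock_tree_power(total_watts, clock_fraction) / total_units
    return per_unit * idle_units


TOTAL_WATTS = 10.0  # hypothetical processor power budget

# Range implied by the 20% to 30% estimate.
low = clock_tree_power(TOTAL_WATTS, 0.20)
high = clock_tree_power(TOTAL_WATTS, 0.30)
print(f"clock tree: {low:.2f} W to {high:.2f} W")

# Hypothetical case: 4 of 16 function units idle, clock share of 25%.
wasted = wasted_clock_power(TOTAL_WATTS, 0.25, idle_units=4, total_units=16)
print(f"wasted on idle units: {wasted:.3f} W")
```

Under these assumptions, the power spent clocking idle units scales directly with both the clock tree's share of total power and the fraction of units that are idle.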