Current processor architecture design involves “pipelining,” in which instructions are broken into smaller steps referred to as “stages” and performed in a manner reminiscent of an assembly line. Generally, each stage is allotted a certain amount of time in which to complete, referred to as its “delay”. Pipelined processors are popular because once all the stages of the pipeline are filled (i.e., each stage is executing its assigned step), one instruction completes every cycle. Because every stage must finish within the same cycle, the processor's cycle time is set by the delay of the slowest stage in the pipeline.
It is generally understood that the throughput of a processor pipeline is maximized when the total latency of the pipeline is divided evenly among all the stages. Thus, balancing the pipeline so that each microarchitectural pipeline stage has an equal delay has been a primary design objective for maximizing instruction throughput.
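The relationship between stage delays, cycle time, and throughput can be illustrated with a minimal sketch. The function names and the delay values below are hypothetical, chosen only so that both pipelines have the same total latency; the model assumes an idealized pipeline with no stalls and no latch overhead.

```python
def cycle_time(stage_delays):
    """Cycle time is set by the slowest stage in the pipeline."""
    return max(stage_delays)


def throughput(stage_delays):
    """Instructions completed per unit time once the pipeline is full."""
    return 1.0 / cycle_time(stage_delays)


# Two hypothetical 4-stage pipelines with the same total latency (8 units).
balanced = [2, 2, 2, 2]
unbalanced = [1, 3, 2, 2]

print(sum(balanced), cycle_time(balanced), throughput(balanced))
print(sum(unbalanced), cycle_time(unbalanced), throughput(unbalanced))
```

Under these assumptions the balanced pipeline has the shorter cycle time, and therefore the higher throughput, even though both pipelines take the same total time to process a single instruction from start to finish.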
“Instruction throughput” refers to the number of instructions that can be executed in a unit of time. In particular, while a particular instruction may physically require a certain amount of time (or clock cycles) to be performed, once the pipeline is filled, each instruction can appear to be completed (or “retired”) in a single unit of time or “cycle”.
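As a rough illustration of why each instruction appears to complete in a single cycle once the pipeline is filled, the sketch below counts cycles for an idealized pipeline. The function names are hypothetical, and the model assumes no stalls, hazards, or flushes, which real pipelines do incur.

```python
def total_cycles(num_instructions, num_stages):
    """Cycles to finish all instructions in an ideal, stall-free pipeline.

    The first instruction needs num_stages cycles to pass through the
    pipeline; each later instruction completes one cycle after the
    previous one.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1)


def cycles_per_instruction(num_instructions, num_stages):
    """Average cycles per completed instruction, including the fill time."""
    return total_cycles(num_instructions, num_stages) / num_instructions


# Hypothetical 5-stage pipeline: the average approaches one cycle per
# instruction as the number of instructions grows.
for n in (1, 10, 100, 10000):
    print(n, total_cycles(n, 5), cycles_per_instruction(n, 5))
```

In this model a single instruction still takes five cycles from start to finish, but the average cost per instruction tends toward one cycle as more instructions flow through the filled pipeline.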
Although balancing delay across pipeline stages increases instruction throughput, a delay-balanced approach can cause significant energy inefficiency in processors because each microarchitectural pipeline stage gets the same amount of time to complete, irrespective of its size or complexity. For power-optimized processors (i.e., processors where circuit and design-level optimizations reclaim timing slack to save power), the inefficiency manifests itself as a significant imbalance in power consumption of different microarchitectural pipeline stages.
Accordingly, with escalating power density, ongoing research is being directed toward energy efficiency.