1. Field of the Invention
The present invention generally relates to synchronous integrated circuits and more particularly to reducing power consumption in a synchronous pipeline circuit.
2. Background Description
Semiconductor technology and chip manufacturing advances have resulted in a steady increase of on-chip clock frequencies, the number of transistors on a single chip and the die size itself, accompanied by a corresponding decrease in chip supply voltage. Generally, the power consumed by a given clocked unit (e.g., latch, register, register file, functional unit, etc.) increases linearly with the frequency of switching within the unit. Thus, notwithstanding the decrease of chip supply voltage, chip power consumption has increased as well.
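The linear dependence on switching frequency can be illustrated with the commonly used first-order approximation for CMOS dynamic power. This is a textbook model offered only as a sketch; the symbols (activity factor, switched capacitance, supply voltage and clock frequency) are assumptions introduced here for illustration and do not appear in the description above.

```latex
% First-order dynamic (switching) power of a clocked CMOS unit.
% \alpha   : fraction of the unit's capacitance switched per cycle (assumed symbol)
% C        : total switched capacitance of the unit (assumed symbol)
% V_{dd}   : chip supply voltage
% f        : clock (switching) frequency
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
```

Under this approximation, power is linear in f and quadratic in the supply voltage, so growth in clock frequency and in switched capacitance (more transistors, larger die) can outweigh the savings from a lower supply voltage.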
Chip and system level cooling and packaging costs have escalated as a natural result of this increase in chip power. It is crucial for low-end systems (e.g., handhelds, portable and mobile systems) to reduce net energy consumption to extend battery life, but without degrading performance to unacceptable levels. In current microprocessor designs, over 70% of the power consumed is attributable to the clock alone. In a typical synchronous design, over 90% of this power is consumed in local clock splitters/drivers and latches.
Basically, a synchronous design includes multiple register stages in what is commonly referred to as a pipeline. A register stage or latch is normally referred to as transparent when it instantaneously passes the data value at its input to its output; the same stage or latch is normally referred to as opaque when data is latched in it, i.e., the opaque latch is holding its output constant regardless of its input, such that its input is not passed to its output. Thus, in a typical pipeline based on master/slave latches, clocked by an ungated clock, stages are normally opaque and alternate stages are pulsed transparent in alternate clock states, e.g., even stages held opaque and odd stages pulsed transparent when the clock is high and vice versa when the clock is low. Clock gating, selectively turning the clock on and off, has been used to reduce power dissipation in synchronous designs such as microprocessors. While master and slave latches are really separate latch stages of a pipeline, they are typically paired and collectively referred to as a stage.
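The transparent/opaque behavior described above can be sketched with a small behavioral model. This is a minimal illustration, not a circuit from the description: the names `Latch` and `master_slave_step` are hypothetical, and the polarity (master transparent while the clock is low, slave transparent while it is high) is an assumption chosen for the example.

```python
class Latch:
    """Level-sensitive latch: transparent when enabled, opaque otherwise."""

    def __init__(self, value=0):
        self.value = value  # the held (latched) output value

    def update(self, data_in, enable):
        # Transparent: the input passes straight through to the output.
        if enable:
            self.value = data_in
        # Opaque: the output stays constant regardless of the input.
        return self.value


def master_slave_step(master, slave, data_in, clock):
    """Evaluate one clock state of a master/slave latch pair.

    Assumed polarity: master transparent when clock == 0,
    slave transparent when clock == 1.
    """
    master_out = master.update(data_in, enable=(clock == 0))
    return slave.update(master_out, enable=(clock == 1))


if __name__ == "__main__":
    master, slave = Latch(), Latch()
    # Clock low: master captures 5, slave is opaque and still holds 0.
    print(master_slave_step(master, slave, 5, clock=0))
    # Clock high: master is opaque (ignores 7), slave passes the held 5.
    print(master_slave_step(master, slave, 7, clock=1))
```

Because the two latches are never transparent in the same clock state, a new input value cannot race through both in a single clock state, which is why the pair behaves as one pipeline stage.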
A simple example of a pipeline is a first-in first-out (FIFO) register. In a more complicated pipeline example, logic may separate some or all of the stages, e.g., a Multiply/Add-Accumulate (MAAC) unit or other state-of-the-art pipelined microprocessor functional unit. A FIFO is an M stage by N bit register file, with each of the M stages including an N-latch register, at least one latch for each data bit. Normally, all of the stages are simultaneously clocked by a single global clock, passing data items from one stage to the next with each clock. An N-bit data item from an input environment enters a first stage on one clock cycle, and substantially the same N-bit word exits the last stage unchanged at an output environment M clock cycles later. Thus, a FIFO may be used as an M-clock cycle delay. On each clock cycle (e.g., every other rising or falling clock edge) each N-bit word in the FIFO advances one stage. Without clock gating, every FIFO stage is clocked at every cycle. With coarse clock gating, the clock may be gated off when the FIFO is empty to reduce/eliminate FIFO power consumption during that time. With finer grained clock gating, individual FIFO stages may be gated off when valid data is not in the particular stage, e.g., to save power even when the FIFO is not empty.
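The three clocking policies described above can be compared with a short cycle-level sketch that counts how many stage clock pulses each policy issues. This is a simplified behavioral model under stated assumptions, not an implementation from the description: the function name `simulate_fifo`, the use of `None` to mark a stage without valid data, and the rule that a finely gated stage is pulsed only in cycles where valid data arrives at it are all hypothetical choices made for illustration.

```python
def simulate_fifo(items, stages, gating="none"):
    """Shift items through an M-stage FIFO and count stage clock pulses.

    items  : per-cycle inputs; None marks a cycle with no valid data
    stages : number of FIFO stages (M)
    gating : "none", "coarse" (whole FIFO gated when idle), or
             "fine" (each stage gated unless valid data arrives at it)
    Returns (outputs, pulses).
    """
    fifo = [None] * stages                    # None marks an invalid stage
    outputs, pulses = [], 0
    stream = list(items) + [None] * stages    # extra cycles drain the FIFO
    for incoming in stream:
        # Value each stage would latch on this cycle's clock.
        arriving = [incoming] + fifo[:-1]
        if gating == "none":
            pulses += stages                  # every stage, every cycle
        elif gating == "coarse":
            if any(v is not None for v in arriving):
                pulses += stages              # all stages, only when not idle
        elif gating == "fine":
            pulses += sum(v is not None for v in arriving)
        else:
            raise ValueError(f"unknown gating mode: {gating}")
        if fifo[-1] is not None:
            outputs.append(fifo[-1])          # item leaves the last stage
        fifo = arriving
    return outputs, pulses


if __name__ == "__main__":
    for mode in ("none", "coarse", "fine"):
        print(mode, simulate_fifo([1, None, 2], stages=3, gating=mode))
```

In this model every policy delivers the same items in the same order; only the pulse count differs. With fine gating the count reduces to one pulse per stage per valid data item, which is the lower bound for this style of gating.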
Fine grained clock gating techniques selectively stop functional unit clocks by selectively gating local clocks off within functional blocks, e.g., to stages within the pipeline. See, e.g., U.S. application Ser. No. 10/262,769 entitled “INTERLOCKED SYNCHRONOUS PIPELINE CLOCK GATING” to Hans M. Jacobson et al., filed Oct. 2, 2002, and assigned to the assignee of the present invention and incorporated herein by reference. While these clock gating techniques can reduce the number of clock pulses generated in the pipeline, the local clock is still pulsed for each stage, at least once for each data item propagating through the pipeline, to minimize the risk of data races through the latches of adjacent pipeline stages.
Thus, there exists a need for dynamically selected latch stage clocking for synchronous pipelines that adapts to the current state of the pipeline, on a cycle-by-cycle basis, without reducing the operation frequency of the pipeline.