1. Field of the Invention
The present invention generally relates to synchronous integrated circuits and more particularly to reducing power consumption in a synchronous pipeline circuit.
2. Background Description
Advances in semiconductor technology and chip manufacturing have resulted in a steady increase in on-chip clock frequencies, the number of transistors on a single chip, and the die size itself, accompanied by a corresponding decrease in chip supply voltage. Generally, the power consumed by a given clocked unit (e.g., latch, register, register file, functional unit, etc.) increases linearly with the frequency of switching within the unit. Thus, notwithstanding the decrease in chip supply voltage, chip power consumption has increased as well.
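The linear dependence on switching frequency can be illustrated with the standard first-order CMOS switching-power estimate, P = a * C * V^2 * f. The following is a minimal sketch only; the function name and all numeric values are hypothetical and are chosen merely to show that a frequency increase can outweigh a supply voltage decrease.

```python
def dynamic_power_w(activity, cap_farads, vdd_volts, freq_hz):
    """First-order switching power estimate: P = a * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical operating points (assumed values, not taken from any real chip).
base = dynamic_power_w(activity=0.5, cap_farads=1e-9, vdd_volts=1.2, freq_hz=1e9)
faster = dynamic_power_w(activity=0.5, cap_farads=1e-9, vdd_volts=1.0, freq_hz=2e9)

# Power is linear in frequency: doubling f at a fixed voltage doubles P.
doubled = dynamic_power_w(activity=0.5, cap_farads=1e-9, vdd_volts=1.2, freq_hz=2e9)
```

In this sketch the second operating point runs at a lower supply voltage but twice the frequency, and still dissipates more power than the first.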
Chip and system level cooling and packaging costs have escalated as a natural result of this increase in chip power. It is crucial for low end systems (e.g., handhelds, portable and mobile systems) to reduce net energy consumption to extend battery life, but without degrading performance to unacceptable levels. In current microprocessor designs, over 70% of the power consumed is attributable to the clock alone. In a typical synchronous design, over 90% of this power is consumed in local clock splitters/drivers and latches.
A typical pipeline is a complex logic function arranged as a series of logic functions or elements in a logic path, with pipeline data traversing each element. Since data has a determinable delay in each element, multiple data items may traverse the pipeline simultaneously, one after another. Pipelines may be buffered or unbuffered. In buffered pipelines, pipeline logic is interrupted by registers that form boundaries to segment the logic into short paths, each no longer than a single clock cycle. Unbuffered pipelines, also known as wave pipelines, are several clock cycles long, i.e., the propagation delay through the entire pipeline takes multiple cycles and data items propagate freely through from one end to the other.
A first-in first-out (FIFO) register is a simple example of a sequential/buffered pipeline. A FIFO is an M stage by N bit register file with each of the M stages including an N-latch register, i.e., at least one latch for each data bit. Normally, all of the stages are simultaneously clocked by a single global clock, passing data items from one stage to the next with each clock. An N-bit data item from an input environment enters a first stage on one clock cycle and substantially the same N-bit word exits the last stage unchanged at an output environment M clock cycles later. Thus, a FIFO may be used as an M-clock cycle delay, for example. On each clock cycle (e.g., every rising or falling clock edge) each N-bit word in the FIFO advances one stage. In a typical more complicated pipeline example, logic separates some or all of the stages, e.g., a Multiply/Add-Accumulate (MAAC) unit or other state of the art pipelined microprocessor functional unit.
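The M-cycle delay behavior of such a FIFO can be sketched behaviorally as follows. This is an illustrative model only; the class name Fifo and its clock method are hypothetical, and each stage is modeled as a single word rather than as N individual latches.

```python
class Fifo:
    """Behavioral model of an M-stage FIFO clocked by one global clock."""

    def __init__(self, stages):
        # One register per stage; None marks a stage that holds no data yet.
        self.regs = [None] * stages

    def clock(self, data_in):
        """Apply one clock: return the word leaving the last stage and
        advance every stored word by one stage."""
        data_out = self.regs[-1]
        self.regs = [data_in] + self.regs[:-1]
        return data_out

# A 4-stage FIFO fed with the words 0..9, one per clock cycle.
fifo = Fifo(stages=4)
outputs = [fifo.clock(word) for word in range(10)]
```

In this model a word presented on one clock is returned M clocks later, so the first four outputs of the 4-stage example are empty and word 0 appears on the fifth clock.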
For a 1 gigahertz (1 GHz) clock, for example, each clock cycle is 1 nanosecond (1 ns) long. Thus in this example, logic in each segment must have a propagation delay shorter than 1 ns. A register stage is (or the latches in the stage are) normally referred to as transparent when the stage passes data from its input to its output. The same stage is normally referred to as opaque when data is latched in it, i.e., regardless of input the opaque latch is holding its output constant, such that the input does not pass to its output. So for example, in a typical pipeline based on master/slave latches, clocked by an ungated clock, stages are normally opaque and alternate stages are pulsed transparent in alternate clock states, e.g., even stages held opaque and odd stages pulsed transparent when the clock is high and vice versa when the clock is low. While master and slave latches are really separate latch stages of a pipeline, they are typically paired and collectively referred to as a single stage.
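The transparent and opaque states described above can be sketched with a simple level-sensitive latch model. The names Latch and MasterSlaveStage are hypothetical, and the clock polarity (master transparent while the clock is low, slave transparent while it is high) is an assumption made only for illustration.

```python
class Latch:
    """Level-sensitive latch: passes its input when transparent,
    holds its last value when opaque."""

    def __init__(self):
        self.q = None

    def update(self, d, transparent):
        if transparent:
            self.q = d      # transparent: output follows input
        return self.q       # opaque: output held constant


class MasterSlaveStage:
    """Master and slave latches paired as one pipeline stage.
    Assumed polarity: master transparent on clock low, slave on clock high."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def apply(self, d, clk_high):
        m = self.master.update(d, transparent=not clk_high)
        return self.slave.update(m, transparent=clk_high)


stage = MasterSlaveStage()
trace = [
    stage.apply("a", clk_high=False),  # master captures "a", slave opaque
    stage.apply("b", clk_high=True),   # master opaque, slave passes "a"
    stage.apply("b", clk_high=False),  # master captures "b", slave holds "a"
    stage.apply("c", clk_high=True),   # master opaque, slave passes "b"
]
```

Because the two latches are never transparent in the same clock state, an input change cannot pass through both in a single clock phase, which is what lets the pair be treated as one stage.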
Clock gating techniques, selectively turning the clock on and off, have been used to reduce the number of pipeline clock pulses in synchronous designs such as microprocessors, thereby reducing clock related power consumption. However, the local clock is still pulsed for each stage, at least once for each data item propagating through the pipeline, to minimize the risk of data races from data items passing through the latches of adjacent pipeline stages.
For the same 1 GHz clock example, an unbuffered pipeline is an n nanosecond long path, i.e., n clock cycles long. In an ideal design where the logic is well behaved and the path is free from race conditions, each datum or data item (i.e., all bits) traversing the path (a wave) arrives at the same point at the end of each of the n clock cycles. Wave pipelines allow multiple temporally spaced data (waves) to traverse the entire pipeline simultaneously, uninterrupted by latches, avoiding clock related power consumption. Ideally, n data items may be simultaneously traversing the path, each entering the path at the beginning of a clock cycle and, n cycles later, each exiting at the end of a clock cycle. In practice, however, logic is seldom well behaved and race conditions always exist to some extent because some bits have longer logic paths than others.
Consequently, wave pipelines have required strict control of short and long path delays in data path logic to avoid data races, i.e., to prevent leading edges of one wave from catching trailing edges of another, downstream wave. Further, prior art wave pipelines have been precluded from using Dynamic Voltage and Frequency Scaling (DVFS) because short and long path delays scale differently. Also, DVFS may enhance effects of manufacturing variations, skew, jitter, and switching current (dI/dt) noise. Further, without path latches, functionally testing path logic is difficult if not impossible. Likewise, without path latches, the pipeline may not be stopped without inserting additional costly buffers/muxes.
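The short path/long path constraint can be illustrated with a simplified timing check. This is a sketch under stated assumptions: waves are launched once per clock period, skew, jitter, and latch setup/hold times are lumped into a single optional margin, and the function name and delay values are hypothetical.

```python
def waves_collide(t_short_ns, t_long_ns, period_ns, margin_ns=0.0):
    """Return True if the fastest bit of a wave can catch the slowest bit
    of the wave launched one clock period earlier.

    The earlier wave's slowest bit arrives at t_long_ns; the later wave's
    fastest bit arrives at period_ns + t_short_ns. The later wave must
    arrive after the earlier one by at least margin_ns.
    """
    return period_ns + t_short_ns <= t_long_ns + margin_ns

# 1 GHz clock (1 ns period) with hypothetical path delays.
balanced = waves_collide(t_short_ns=2.6, t_long_ns=3.4, period_ns=1.0)
unbalanced = waves_collide(t_short_ns=2.2, t_long_ns=3.4, period_ns=1.0)
```

In this model the spread between the long and short path delays, rather than the total path delay, is what limits how closely waves may be spaced, which is why the two delays must be controlled together.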
U.S. Pat. No. 7,076,682, “Synchronous Pipeline With Normally Transparent Pipeline Stages” to Hans M. Jacobson, issued Jun. 11, 2006, assigned to the assignee of the present invention and incorporated herein by reference, describes another pipeline approach. Jacobson teaches gating pipeline stages normally transparent. Internal stages are gated opaque only when necessary to separate data items and avoid race conditions from closely (temporally) spaced pipeline data, e.g., data items entering in two successive clock cycles. However, race conditions seldom occur at every internal stage, even for adjacent pipeline data items. So, even with Jacobson, some stage clocking may be eliminated.
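The difference between clocking every occupied stage and gating stages opaque only to separate closely spaced data items can be illustrated with a toy model. This sketch is not the control logic of the cited patent; the occupancy list, the adjacency rule, and both function names are simplifying assumptions introduced here.

```python
def conventionally_clocked(valid):
    """Conventional clock gating: every stage holding a data item is clocked."""
    return [i for i, has_item in enumerate(valid) if has_item]

def gated_opaque(valid):
    """Toy normally-transparent rule (assumed): a stage is gated opaque only
    when it holds an item and the stage directly upstream also holds one."""
    return [i for i in range(1, len(valid)) if valid[i] and valid[i - 1]]

# Hypothetical occupancy of a 5-stage pipeline in one clock cycle.
valid = [True, False, True, True, False]
clocked = conventionally_clocked(valid)
opaque = gated_opaque(valid)
```

In this example three stages hold data, but only one of them has an item directly behind it, so the toy rule gates a single stage opaque where conventional gating would clock three.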
Thus, there exists a need for dynamically selected latch stage clocking for synchronous pipelines that allows data items to propagate as data waves in a wave pipeline until each wave reaches a point beyond which a race condition is likely to exist.