1. Field of the Invention
The present invention is related to integrated circuit (IC) clock systems and more particularly to optimizing power consumption in synchronous ICs.
2. Background Description
Semiconductor technology and chip manufacturing advances have resulted in a steady increase in on-chip clock frequencies, in the number of transistors on a single chip, and in the die size itself. These increases have been accompanied by a corresponding decrease in chip supply voltage. Generally, the power consumed by a given clocked unit (e.g., a latch, register, register file, or functional unit) or clock driver increases linearly with the switching frequency within the unit. Thus, notwithstanding the decrease in chip supply voltage, chip power consumption has increased as well. Chip- and system-level cooling and packaging costs have escalated as a natural result of this increase in chip power. For low-end systems (e.g., handheld, portable, and mobile systems), it is especially crucial to reduce net energy consumption to extend battery life. However, it is equally crucial that this be done without degrading performance to unacceptable levels.
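The linear dependence on frequency, and why a lower supply voltage does not by itself hold power down, can be illustrated with the standard first-order CMOS switching-power relation P = alpha * C * Vdd^2 * f. That relation and all of the numbers below are assumptions added for illustration, not figures from this document; the function name dynamic_power is hypothetical.

```python
def dynamic_power(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power P = alpha * C * Vdd**2 * f, in watts.

    activity      -- fraction of cycles the node makes a 0->1 transition (1.0 for a clock net)
    capacitance_f -- switched capacitance in farads
    vdd_v         -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical clock net with 1 nF of switched capacitance and activity 1.0.
older = dynamic_power(1.0, 1e-9, 1.2, 1e9)  # 1.2 V supply, 1 GHz clock
newer = dynamic_power(1.0, 1e-9, 1.0, 2e9)  # 1.0 V supply, 2 GHz clock
print(f"older: {older:.2f} W, newer: {newer:.2f} W")  # older: 1.44 W, newer: 2.00 W
```

In this sketch, lowering the supply from 1.2 V to 1.0 V scales power by only (1.0/1.2)^2, roughly 0.69, while doubling the frequency doubles it, so net power still rises; any growth in switched capacitance from more transistors and larger dies would push it higher still.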
A basic high-performance synchronous IC chip design, e.g., a state-of-the-art microprocessor, includes multiple register stages interspersed throughout the chip logic in what is commonly referred to as a pipeline. Typically, each register stage, or pipeline stage, includes a number of latches that are clocked together and operate in parallel. Frequently, these pipeline latches are master/slave latch pairs, in which the master and slave latches are actually separately clocked latch stages within the pipeline stage. Typically, a pair of local clocks derived from a global clock, a capture clock and a launch clock, separately gate, or clock, the master and slave latches, respectively.
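The capture/launch arrangement of a master/slave latch pair can be modeled behaviorally as two level-sensitive latches in series. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, each latch is transparent while its clock is high, and the two local clocks are assumed never to be high at the same time.

```python
class Latch:
    """Level-sensitive latch: transparent while its clock is high, holding otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, clock: int, d: int) -> int:
        if clock:
            self.q = d  # transparent: output follows input
        return self.q   # opaque: output holds the last captured value


class MasterSlaveStage:
    """One bit of a pipeline stage (hypothetical model): the master latch is
    gated by the capture clock, the slave latch by the launch clock."""

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, capture_clk: int, launch_clk: int, d: int) -> int:
        # Assumption: the local clocks never overlap; if both were high,
        # data could race straight through the stage.
        if capture_clk and launch_clk:
            raise ValueError("capture and launch clocks must not overlap")
        m = self.master.update(capture_clk, d)
        return self.slave.update(launch_clk, m)


def run_cycles(stage: MasterSlaveStage, inputs: list[int]) -> list[int]:
    """Apply one capture phase and then one launch phase per clock cycle."""
    outputs = []
    for d in inputs:
        stage.step(1, 0, d)  # capture phase: the master samples d
        # Launch phase: the input is flipped to mimic a late-changing signal;
        # the closed master holds, so the flipped value never reaches the slave.
        outputs.append(stage.step(0, 1, d ^ 1))
    return outputs


print(run_cycles(MasterSlaveStage(), [1, 0, 1, 1]))  # [1, 0, 1, 1]
```

Because the master is opaque while the launch clock drives the slave, the stage output reflects only the value captured during the capture phase.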
A first-in first-out (FIFO) register is a simple example of a pipeline. A FIFO is an M-stage by N-bit register file in which each of the M stages includes an N-latch register, with at least one latch for each data bit. Normally, all of the stages are clocked simultaneously by a single global clock, and data items pass from one stage to the next with each clock cycle or clock edge. On each clock cycle (e.g., every other rising or falling clock edge), each N-bit word in the FIFO advances one stage. An N-bit data item from an input environment (e.g., random logic connected together in some higher order logic function) enters the first stage on one clock cycle, and the same N-bit word exits the last stage, substantially unchanged, to an output environment (e.g., a local memory macro or some other higher order logic function) M clock cycles later. In a more complicated pipeline, logic may separate some or all of the stages, e.g., in a functional unit of a state-of-the-art pipelined microprocessor. One example is a Multiply/Add-Accumulate (MAAC) unit, in which partial results (e.g., from a previous add) are rotated back from the accumulator to be added again.
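The M-stage by N-bit FIFO behavior described above can be sketched as a bounded queue that shifts once per global clock. This is a behavioral model only, assuming one word advances per clock call; the class name Fifo and its parameters m_stages and n_bits are hypothetical.

```python
from collections import deque
from typing import Optional


class Fifo:
    """M-stage by N-bit FIFO: all stages are clocked together, and each clock
    advances every word one stage (a behavioral sketch, not a circuit)."""

    def __init__(self, m_stages: int, n_bits: int) -> None:
        self.mask = (1 << n_bits) - 1  # keep each word to N bits
        # stages[0] is the first stage, stages[-1] the last; None marks an empty stage
        self.stages = deque([None] * m_stages, maxlen=m_stages)

    def clock(self, word: Optional[int]) -> Optional[int]:
        """One global clock: return the word leaving the last stage, shift
        every word down one stage, and load `word` into the first stage."""
        out = self.stages[-1]
        # appendleft on a full bounded deque discards the rightmost (last-stage) entry
        self.stages.appendleft(None if word is None else word & self.mask)
        return out


fifo = Fifo(m_stages=3, n_bits=8)
print([fifo.clock(w) for w in [0x11, 0x22, 0x33, None, None, None]])
# [None, None, None, 17, 34, 51]
```

With M = 3, the word 0x11 (17) enters on the first clock and exits on the fourth, three clocks later, unchanged; every stage is clocked on every cycle whether or not it holds useful data.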
In current microprocessor designs, over 70% of the power consumed is attributable to the clock alone. In a typical synchronous design, over 90% of this power is consumed in local clock splitters/drivers or buffers (LCBs) and latches. Consequently, reducing LCB power, a primary contributor to chip power consumption, significantly reduces total chip power.
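Multiplying the two shares gives a lower bound on the fraction of total chip power spent in LCBs and latches. The sketch below works that arithmetic through; the 20% reduction is a hypothetical figure chosen only for illustration, and the function names are likewise hypothetical.

```python
def share_of_chip_power(clock_share: float, lcb_latch_share: float) -> float:
    """Fraction of total chip power spent in LCBs and latches."""
    return clock_share * lcb_latch_share


def chip_savings(clock_share: float, lcb_latch_share: float, reduction: float) -> float:
    """Fraction of total chip power saved when LCB-plus-latch power drops by `reduction`."""
    return share_of_chip_power(clock_share, lcb_latch_share) * reduction


share = share_of_chip_power(0.70, 0.90)
print(f"LCBs and latches: at least {share:.0%} of chip power")  # 63%
print(f"a hypothetical 20% cut there saves at least {chip_savings(0.70, 0.90, 0.20):.1%}")  # 12.6%
```

Because both source figures are lower bounds ("over 70%", "over 90%"), the computed 63% is a floor; each percentage point saved in LCBs and latches removes at least 0.63 points of total chip power.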
One prior approach to reducing LCB power has been to de-tune the LCBs, reducing drive current at the expense of slower local clock edge rates. However, de-tuning also produces shallower, less well-defined clock edges, and these slower edges increase timing uncertainty. If the de-tuning is very aggressive, the slower clock edges ripple through subsequently clocked circuits and offset some of the power reduction, because the clock spends longer periods between its up and down levels. While the clock is between levels, subsequently clocked gates draw more “flush current,” e.g., current through a CMOS inverter while both of its devices are on. So, where these shallower clock edges are unacceptable (shallower primarily because less current drives the capacitive load, and secondarily because the LCBs are themselves driven with shallower edges, which also contribute more timing uncertainty than faster edges), reduced drive current is not a viable solution.
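Why slower edges increase flush current can be illustrated with Veendrick's first-order estimate of short-circuit power for an unloaded, symmetric CMOS inverter driven by a ramp, P_sc = (beta / 12) * (Vdd - 2*Vt)^3 * tau / T. This model is swapped in for illustration and is not taken from this document; it assumes equal NMOS and PMOS thresholds and equal rise and fall times, and every parameter value below is hypothetical.

```python
def short_circuit_power(beta: float, vdd: float, vt: float, edge_time: float, period: float) -> float:
    """Veendrick's first-order short-circuit ("flush") power estimate for an
    unloaded, symmetric CMOS inverter driven by a ramp input:

        P_sc = (beta / 12) * (Vdd - 2*Vt)**3 * edge_time / period

    beta: device gain factor (A/V^2); vdd, vt: volts; edge_time, period: seconds.
    """
    overdrive = vdd - 2 * vt
    if overdrive <= 0:
        return 0.0  # both devices can never conduct at once, so no flush current
    return beta / 12 * overdrive ** 3 * edge_time / period


# Hypothetical gate clocked at 2 GHz (500 ps period) by a sharp vs. a de-tuned clock edge.
sharp = short_circuit_power(1e-4, 1.0, 0.3, 20e-12, 500e-12)
slow = short_circuit_power(1e-4, 1.0, 0.3, 40e-12, 500e-12)
print(f"{slow / sharp:.1f}x")  # 2.0x
```

Under this model, doubling the clock edge time doubles the flush power in every gate the edge reaches, which is how aggressive de-tuning can claw back part of the LCB savings.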
Thus, there exists a need to reduce power consumption in chip registers and LCBs and especially in synchronous chip registers and LCBs.