The present invention relates to clock distribution circuitry and more particularly to methods for improving the efficiency of clock gating within low power clock trees.
In typical microprocessor designs, the clock distribution network or “tree” can consume from 20% to 50% of a microprocessor's total active power. The clock net is usually the single largest power-consuming signal within most microprocessor systems. One important technique for reducing power consumption in microprocessor designs is therefore to reduce the power of the microprocessor's clock distribution tree by breaking up the clock into several separate clocks that can be individually controlled or “gated off” when some portions of the microprocessor do not need to be clocked.
This process, known as “clock gating”, disables the clocks fed to logic blocks of the microprocessor when the logic blocks are not currently in use by the microprocessor. Power consumption due to the clocking of logic blocks that are not directly involved with the current operation of the microprocessor is thereby minimized. The clock gating strategy of defining logic blocks that can be clock gated and creating the clock gating control signals that perform the clock gating is typically a manual process that provides little information about the power reduction efficiency of the clock gating.
A problem with clock gating is that it requires additional logic (e.g., clock gating logic) within a microprocessor's instruction decode and control unit to manage the clock gating control signals. In order to have a net power savings, the clock gating logic must consume less power than is saved by gating the clocks off.
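The net-savings condition above can be illustrated with a minimal sketch. The function name, its parameters and the simple linear model (power saved equals branch power times the fraction of idle cycles) are assumptions made for illustration only; they are not taken from the specification.

```python
def net_gating_savings_mw(clock_power_mw, idle_fraction, gating_logic_power_mw):
    """Return the net power saved (in mW) by gating one clock branch.

    clock_power_mw        -- power of the branch when it toggles every cycle
    idle_fraction         -- share of cycles in which the branch can be gated off (0..1)
    gating_logic_power_mw -- power drawn by the added gating and control logic

    Hypothetical linear model: a gated-off cycle is assumed to save the
    full per-cycle power of the branch.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    saved = clock_power_mw * idle_fraction
    return saved - gating_logic_power_mw


# Gating pays off only when the result is positive.
print(net_gating_savings_mw(10.0, 0.5, 2.0))  # branch idle half the time
print(net_gating_savings_mw(10.0, 0.1, 2.0))  # branch rarely idle
```

Under this model, a branch that is rarely idle yields a negative result, meaning the gating logic costs more power than it saves.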
The ideal clock distribution tree has the smallest number of clock gates that yield the maximum amount of clock gating power savings when running typical application code. However, analyzing the efficiency of a clock gating strategy on a microprocessor design and modifying the clock gating strategy to reduce clock distribution tree power consumption remains a challenge. Further, typical clock gating strategies ignore the physical design and location of logic blocks that are gated. In certain clock distribution arrangements, ignoring the physical design and layout of gated logic blocks can generate a wiring overhead that consumes more power than is gained by an optimized clock gating strategy. Accordingly, a need exists for methods for improving the efficiency of clock gating within low power clock trees.
To address the needs left unmet by the prior art, methods are provided for improving the efficiency of clock gating within low power clock trees. In a first aspect of the invention, a correlation level between a plurality of clock gating signals and their corresponding gates which gate a source clock is determined. The plurality of clock gating signals and their corresponding gates are combined into a single clock gating signal and a single corresponding gate if a preselected level of correlation exists between the plurality of clock gating signals. Preferably a level of usefulness of the plurality of clock gating signals and their corresponding gates also is determined, and the clock source is “ungated” by removing at least one of the corresponding gates if a preselected low level of usefulness exists.
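One way to picture the first aspect is the following sketch, which works on per-cycle traces of two gating (enable) signals recorded from a simulation. The agreement metric, the OR-based merge, the definition of usefulness as the fraction of gated-off cycles, and both threshold values are assumptions chosen for illustration; the specification does not fix any of them at this point.

```python
def agreement(trace_a, trace_b):
    """Fraction of cycles in which two enable traces have the same value."""
    if len(trace_a) != len(trace_b) or not trace_a:
        raise ValueError("traces must be non-empty and of equal length")
    same = sum(1 for a, b in zip(trace_a, trace_b) if a == b)
    return same / len(trace_a)


def merge_enables(trace_a, trace_b):
    """Merged enable: the clock runs whenever either original enable is 1."""
    return [a | b for a, b in zip(trace_a, trace_b)]


def usefulness(trace):
    """Fraction of cycles in which the clock is actually gated off (enable == 0)."""
    return trace.count(0) / len(trace)


def plan(trace_a, trace_b, merge_threshold=0.9, min_usefulness=0.05):
    """Return ("merge", merged_trace), ("ungate", None) or ("keep", None).

    Both thresholds are hypothetical stand-ins for the "preselected" levels.
    """
    if agreement(trace_a, trace_b) < merge_threshold:
        return "keep", None
    merged = merge_enables(trace_a, trace_b)
    if usefulness(merged) < min_usefulness:
        # The combined gate would almost never switch the clock off.
        return "ungate", None
    return "merge", merged


a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
decision, merged = plan(a, b)
print(decision, merged)
```

The merge uses a logical OR so that neither block loses a clock edge it needs; the price is that the merged gate is off slightly less often than either original gate.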
In a second aspect of the invention, an area overlap is determined for a plurality of sinks, each driven by one of at least two gated drivers (which, in turn, are driven by at least a portion of a plurality of clock “driven” gating signals and their corresponding gates), and one of the gated drivers is removed. The sinks of the removed gated driver then are connected to the remaining gated driver driven by a single clock gating signal and a single corresponding gate.
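The second aspect can be sketched as a bounding-box comparison. The overlap measure (intersection area divided by the smaller box area), the threshold and the dictionary layout are illustrative assumptions, and the sketch presumes that the two drivers already share a single gating signal.

```python
def bounding_box(points):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a list of (x, y) sinks."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_fraction(sinks_a, sinks_b):
    """Intersection area of the two bounding boxes over the smaller box area."""
    ax0, ay0, ax1, ay1 = bounding_box(sinks_a)
    bx0, by0, bx1, by1 = bounding_box(sinks_b)
    width = min(ax1, bx1) - max(ax0, bx0)
    height = min(ay1, by1) - max(ay0, by0)
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min((ax1 - ax0) * (ay1 - ay0), (bx1 - bx0) * (by1 - by0))
    return (width * height) / smaller if smaller > 0 else 0.0


def merge_drivers(drivers, keep, remove, threshold=0.5):
    """Return a new driver map; `remove` is folded into `keep` if areas overlap enough.

    drivers   -- dict mapping a driver name to its list of (x, y) sink positions
    threshold -- hypothetical minimum overlap fraction required for a merge
    """
    if overlap_fraction(drivers[keep], drivers[remove]) < threshold:
        return {name: list(sinks) for name, sinks in drivers.items()}
    merged = {name: list(sinks) for name, sinks in drivers.items() if name != remove}
    merged[keep] = list(drivers[keep]) + list(drivers[remove])
    return merged


drivers = {
    "g0": [(0, 0), (4, 4)],
    "g1": [(1, 1), (3, 3)],
    "g2": [(10, 10), (12, 12)],
}
print(merge_drivers(drivers, "g0", "g1"))  # g1 lies inside g0's area
print(merge_drivers(drivers, "g0", "g2"))  # no overlap, nothing changes
```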
In a third aspect of the invention, the location of sinks and sink clusters within the clock network are identified and physically proximate sink clusters are examined for “common” sinks (e.g., sinks belonging to the same clock gating group or domain). The physically proximate sink clusters then are rewired to generate a pure clock gating group within each sink cluster if re-wiring the clusters increases wiring length by less than a predetermined amount.
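A minimal sketch of the third aspect, limited to two neighboring clusters, follows. The star-shaped wirelength estimate (Manhattan distance from each sink to the cluster centroid), the tuple layout `(x, y, group)` and the `max_extra_length` parameter are assumptions made for illustration; a real flow would use routed wire lengths.

```python
from collections import defaultdict


def wirelength(cluster):
    """Estimate wiring as the Manhattan distance from each sink to the centroid."""
    cx = sum(s[0] for s in cluster) / len(cluster)
    cy = sum(s[1] for s in cluster) / len(cluster)
    return sum(abs(s[0] - cx) + abs(s[1] - cy) for s in cluster)


def purify(cluster_a, cluster_b, max_extra_length=1.0):
    """Regroup two nearby clusters so each holds sinks of one gating group only.

    Each sink is a tuple (x, y, group). The regrouping is accepted only if the
    estimated wiring grows by less than max_extra_length (a hypothetical
    stand-in for the predetermined amount); otherwise the original clusters
    are returned unchanged.
    """
    by_group = defaultdict(list)
    for sink in cluster_a + cluster_b:
        by_group[sink[2]].append(sink)
    pure_clusters = list(by_group.values())

    old_length = wirelength(cluster_a) + wirelength(cluster_b)
    new_length = sum(wirelength(c) for c in pure_clusters)
    if new_length - old_length < max_extra_length:
        return pure_clusters
    return [cluster_a, cluster_b]


near_a = [(0, 0, "A"), (1, 0, "A"), (2, 0, "B")]
near_b = [(3, 0, "B"), (4, 0, "B"), (2, 1, "A")]
print(purify(near_a, near_b))  # regrouped into one cluster per gating group

far_a = [(0, 0, "A"), (1, 0, "B")]
far_b = [(100, 0, "A"), (101, 0, "B")]
print(purify(far_a, far_b))    # too much extra wire, clusters left as they were
```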
In a fourth aspect of the invention, a clock gating group of the clock network is selected and the power dissipation is computed for all sinks within the selected clock gating group assuming all the sinks therein are wired without clock gating. The power dissipation also is computed for all sinks within the selected clock gating group assuming all the sinks therein are gated. If the power dissipation for all sinks within the selected clock gating group is reduced by individually wiring the sinks within the clock gating group, the clock gating group is ungated (e.g., is partitioned into subgroups). Preferably a similar power dissipation analysis/ungating procedure is performed for all clock gating groups within the clock network. The first, second, third and fourth aspects of the invention may be combined or performed separately and/or individually.
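The comparison in the fourth aspect can be sketched with the standard dynamic power relation P = a·C·V²·f. The capacitance breakdown, the default supply voltage and frequency, and the assumption that the gate itself toggles every cycle are illustrative choices, not values from the specification.

```python
def dynamic_power_uw(cap_pf, vdd, freq_mhz, activity=1.0):
    """P = activity * C * Vdd^2 * f; with pF and MHz the result is in microwatts."""
    return activity * cap_pf * vdd ** 2 * freq_mhz


def should_ungate(sink_cap_pf, gated_wire_cap_pf, ungated_wire_cap_pf,
                  gate_cap_pf, enable_fraction, vdd=1.8, freq_mhz=500.0):
    """Return True if wiring the group without clock gating dissipates less power.

    gated_wire_cap_pf   -- wiring needed to reach every sink from the group's gate
    ungated_wire_cap_pf -- wiring needed if each sink taps the nearest ungated clock
    enable_fraction     -- share of cycles in which the gated clock is running
    All parameters are hypothetical inputs to this sketch.
    """
    gated = (
        dynamic_power_uw(sink_cap_pf + gated_wire_cap_pf, vdd, freq_mhz, enable_fraction)
        + dynamic_power_uw(gate_cap_pf, vdd, freq_mhz)  # gate input assumed to toggle every cycle
    )
    ungated = dynamic_power_uw(sink_cap_pf + ungated_wire_cap_pf, vdd, freq_mhz)
    return ungated < gated


# Scattered group: long dedicated wiring and a clock that is almost always on.
print(should_ungate(1.0, 5.0, 0.5, 0.2, enable_fraction=0.9))
# Compact group that is idle most of the time.
print(should_ungate(1.0, 0.6, 0.5, 0.2, enable_fraction=0.2))
```

Repeating the check for every clock gating group in the network mirrors the preferred procedure described above.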
A computer program product also is provided for use in designing a clock network. The inventive program product is carried by a medium readable by a computer (e.g., a carrier wave signal, a floppy disc, a hard drive, a random access memory, etc.). The computer readable medium comprises means for performing the first, second, third and/or fourth aspects of the invention.
Other objects, features and advantages of the present invention will become more fully apparent from the following detailed description of the preferred embodiments, the appended claims and the accompanying drawings.