Conventional systems supply a free-running clock to a number of storage elements. When an enable signal is asserted, new data is captured at the edge of the clock. If a free-running clock drives the clock input of a storage element, the storage element consumes power at every clock edge, even if the other inputs are inactive or unchanged.
Other conventional systems use local clock gating units to lower the power consumption. In such a system, when the enable signal is asserted, a clock gating unit generates a single pulse to capture the new data.
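The difference between the two conventional approaches can be illustrated by counting the clock edges that reach a storage element. The following is a minimal sketch only; the function name `clock_edges_seen` and the example enable pattern are hypothetical and are not taken from any figure.

```python
def clock_edges_seen(enable_per_cycle, gated):
    """Count the active clock edges delivered to a storage element.

    enable_per_cycle: one 0/1 enable value per clock cycle (illustrative data).
    gated: True if a local clock gating unit drives the clock input,
           False if the free-running clock drives it directly.
    """
    if gated:
        # A gated clock delivers one pulse only in cycles where enable is asserted.
        return sum(1 for en in enable_per_cycle if en)
    # A free-running clock delivers an edge every cycle, regardless of enable.
    return len(enable_per_cycle)


# Hypothetical enable pattern: data is captured in 2 of 8 cycles.
enable = [0, 1, 0, 0, 1, 0, 0, 0]
free_running_edges = clock_edges_seen(enable, gated=False)
gated_edges = clock_edges_seen(enable, gated=True)
```

Under this simplified model, the edges that a free-running clock delivers in cycles where the enable is deasserted are the ones that consume power without capturing new data.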
Referring to FIG. 1, a conventional clock network containing a clock tree 10 is shown. The clock network comprises the clock tree 10, a clock gating unit 12, a flip-flop 14, and a number of flip-flops 16a–16n. A number of clock gating units 12 can be implemented on various branches of the clock tree 10. The clock gating unit 12 is shown having a latch 20, an OR gate 22 and an AND gate 24. Some branches may drive single flip-flops. Other branches may go to clock gating units 12, each of which drives a bank of flip-flops (e.g., the flip-flops 16a–16n). The clock inputs of the flip-flops 16a–16n are shown disconnected from the free-running clock signal CLK. An output of the clock gating unit 12 supplies a clock edge when the enable signal EN1 is asserted. Since the clock gating unit 12 contains a few gates and consumes area, it is typically used only for a bank of registers (or flip-flops) which have a common enable signal.
Referring to FIG. 2, a timing diagram of the clock gating unit 12 is shown. When the enable signal EN1 is asserted, the clock gating unit 12 is latched. The output signal LATCH_Q of the latch 20 is gated with the clock, and a single pulse is generated. The single pulse is presented to the clock input of the flip-flops 16a–16n in order to capture the data on the inputs.
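The single-pulse behavior described above can be sketched with a small behavioral model. The wiring used here is an assumption, since the text names the latch 20, the OR gate 22 and the AND gate 24 without spelling out their connections: the latch is taken to be transparent while the clock is low, and the gated clock is taken to be CLK AND (LATCH_Q OR BYP). The function name `simulate_gating_unit` and the sample waveforms are hypothetical.

```python
def simulate_gating_unit(clk, en, byp=None):
    """Behavioral sketch of a latch-based clock gating unit.

    clk, en, byp: lists of 0/1 samples taken every half clock cycle.
    Assumed wiring (not confirmed by the text):
      - the latch is transparent while clk is low and holds while clk is high
      - gated_clk = clk AND (latch_q OR byp)
    Returns (latch_q_trace, gated_clk_trace).
    """
    if byp is None:
        byp = [0] * len(clk)
    latch_q = 0
    q_trace, gated_trace = [], []
    for c, e, b in zip(clk, en, byp):
        if c == 0:
            latch_q = e  # latch is transparent: follow the enable input
        # while c == 1 the latch holds, so enable changes cannot glitch the output
        q_trace.append(latch_q)
        gated_trace.append(c & (latch_q | b))
    return q_trace, gated_trace


# Hypothetical waveforms: four clock cycles, enable asserted for one cycle.
clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [0, 0, 1, 1, 0, 0, 0, 0]
latch_q, gated_clk = simulate_gating_unit(clk, en)
# gated_clk contains exactly one high sample, i.e. a single pulse.

# With the bypass signal asserted, the gated clock follows CLK directly.
_, bypassed_clk = simulate_gating_unit(clk, [0] * 8, byp=[1] * 8)
```

In this model, holding the latch while the clock is high is what keeps the output pulse free of glitches when the enable changes mid-cycle.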
Using the clock gating unit 12 for localized clock gating to a number of banks of registers lowers the overall power consumption. However, such an implementation has a number of disadvantages. The clock tree 10 contains one or more levels of buffers to divide the load and reduce the clock skew. The nets (wires) from the clock root to the clock gating units toggle at the frequency of the free-running clock signal CLK and therefore consume power.
Implementing one or more clock gating units 12 also consumes more area than an implementation without the gating units. During scan test mode, a few nets in the clock gating unit 12 cannot be tested to detect manufacturing defects. As shown in FIG. 1, the signal BYP is asserted during scan mode to bypass clock gating. The net that is driven by the latch signal LATCH_Q cannot be tested, because manufacturing defects on that net cannot be observed. The inability to test those nets lowers the test coverage. Additional production tests may be added to cover those nets, but doing so increases the test time on the tester and the manufacturing cost.
Since additional logic is needed to gate the clock signal CLK, such a method is used for banks of registers where the registers in each bank share the same enable signal. An area/power tradeoff decision must be made. If fewer flip-flops than a defined threshold share the same enable signal, such flip-flops are clocked by the free-running clock signal CLK and consume power even if the data is not changed. The clock skew balancing for clock nets that contain clock gating units is more complicated, and generally needs more levels of buffering.
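The area/power tradeoff described above can be sketched as a simple partitioning step. This is an illustration under stated assumptions, not a description of any particular design flow: the function name `partition_by_enable`, the flip-flop and enable names, and the threshold value of 4 are all hypothetical.

```python
from collections import defaultdict


def partition_by_enable(flop_enables, min_bank_size):
    """Split flip-flops into gated banks and free-running flip-flops.

    flop_enables: mapping of flip-flop name -> name of its enable signal.
    min_bank_size: hypothetical threshold; a group sharing one enable gets a
                   clock gating unit only if it has at least this many members.
    Returns (gated_banks, free_running_flops).
    """
    banks = defaultdict(list)
    for flop, enable in flop_enables.items():
        banks[enable].append(flop)

    # Banks at or above the threshold justify the area of a gating unit.
    gated_banks = {en: flops for en, flops in banks.items()
                   if len(flops) >= min_bank_size}
    # Smaller groups stay on the free-running clock and keep consuming power.
    free_running_flops = sorted(
        flop
        for flops in banks.values() if len(flops) < min_bank_size
        for flop in flops
    )
    return gated_banks, free_running_flops


# Hypothetical design: four flip-flops share EN1, one uses EN2.
design = {"r0": "EN1", "r1": "EN1", "r2": "EN1", "r3": "EN1", "s0": "EN2"}
gated_banks, free_running_flops = partition_by_enable(design, min_bank_size=4)
```

With the assumed threshold, the four flip-flops sharing EN1 receive a clock gating unit, while the lone flip-flop on EN2 remains on the free-running clock signal CLK.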