FIG. 1 is an illustration of a conventional clock gating cell (CGC) 100. CGCs are used in many applications to stop the propagation of a clock signal to an unused circuit to reduce the dynamic power by halting computation in the circuit. For instance, in a handheld device that includes MP3 functionality and phone functionality, when a user is playing an MP3 file but not using the phone, one or more CGCs can be used to prevent the clock from propagating to parts of the processor (as well as to other chips) that are not used when the phone functionality is idle. Parts that do not receive the clock use much less power, so that battery life is extended. Furthermore, the un-gated clock signal itself has a high activity factor, making it a major source of dynamic power usage.
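The link between activity factor and dynamic power noted above can be made concrete with the usual first-order CMOS switching-power relation. The symbols below are not used in the original text and are introduced here only for illustration: \alpha is the activity factor of a net, C_L the capacitance it switches, V_{dd} the supply voltage, and f the clock frequency.

```latex
P_{\mathrm{dyn}} = \alpha \, C_{L} \, V_{dd}^{2} \, f
```

Under this approximation, a free-running clock net toggles every cycle and so has the highest activity factor of any net in its domain. Gating the clock drives \alpha toward zero for the gated clock branch and for the logic it feeds.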
The CGC 100 has a clock input and enable inputs. The global clock source is the clock input, and it is labeled Clk_in. A CGC, such as CGC 100, can include any number of enable inputs, examples of which can include a clock enable (Clk_en), and a scan enable (test_en) that activates the clock during scan testing of the circuit. FIG. 1 shows the generic block diagram of a typical CGC standard cell circuit that includes an active low latch 101, a two-input AND gate 102, and enable logic 103. The output of the CGC 100 is Clk, which is the gated clock pulse.
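The block diagram described above can be sketched as a zero-delay behavioral model. This is only an illustration, not the circuit of FIG. 1: the class and function names are hypothetical, the enable logic 103 is assumed to be a simple OR of Clk_en and test_en, and no analog effects such as edge rate are modeled.

```python
class ClockGatingCell:
    """Zero-delay behavioral sketch of a latch-plus-AND clock gating cell."""

    def __init__(self):
        # State held by the active-low latch (hypothetical name).
        self.latched_enable = 0

    def step(self, clk_in, clk_en, test_en=0):
        # Assumption: the enable logic is a plain OR of the enable inputs.
        enable = clk_en | test_en
        # Active-low latch: transparent while Clk_in is 0, holds while it is 1.
        if clk_in == 0:
            self.latched_enable = enable
        # Two-input AND gate produces the gated clock Clk.
        return clk_in & self.latched_enable


def simulate(clk_wave, en_wave):
    """Apply one (Clk_in, Clk_en) sample per step and collect the Clk output."""
    cgc = ClockGatingCell()
    return [cgc.step(c, e) for c, e in zip(clk_wave, en_wave)]
```

For example, `simulate([0, 1, 0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 0, 1, 1, 1])` passes the first clock pulse, blocks the next two, and passes the last one. The enable that rises while Clk_in is high (the sixth sample) is ignored until the next low phase, which is why a latch is placed ahead of the AND gate: it keeps enable changes from producing a truncated clock pulse.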
At lower voltages, the edge rate of Clk_in can become significantly degraded and eventually lead to functional failure in the CGC 100. FIG. 2 is an illustration of a more detailed view of a CGC 150 configured according to the design of the CGC 100, and FIG. 3 illustrates a timing diagram of key nodes of the CGC 150 during some operations. FIG. 2 shows that the active low latch 101 includes, among other things, an inverter chain (i.e., the inverters 107 and 108) and a pull-down stack (i.e., the NMOS transistors 104 and 105). When the active low latch 101 is enabled (by either the test_en or the Clk_en signal), the pn1 node is initially set to logic 1 during the transparent phase of the latch 101. Under this condition the CGC 150 passes the input Clk_in signal to the output Clk. Clk_in is initially at logic 0, so the pn2 node is at logic 1. For a slowly rising input Clk_in signal, the voltage at the internally buffered Clk_net node can rise quickly, even before Clk_in reaches Vdd/2 (where Vdd is the system supply voltage), thereby turning on the pull-down NFET 104 (FIG. 2) of the pn1 node. This is undesirable because it creates a race condition and provides a discharging path for the pn1 node until the input Clk_in signal propagates to the output and shuts off the feedback pull-down NFET 105. As shown in FIG. 3, the pn1 node voltage can drop momentarily before being restored to logic 1. This drop in voltage at the pn1 node can lead to functional failures during low voltage operation of the chip.
CGCs are not limited to using active low latches. For instance, FIG. 4 is an illustration of a conventional CGC 400, which employs an active high latch and an OR gate at the output. CGC 400 is, essentially, a dual of CGC 100. Potential functional failures can occur during slower Clk_in transitions in the active high latch based CGC 400 of FIG. 4 when premature charging of the pn1 node occurs (as opposed to the premature discharging issue of CGC 100 of FIG. 1).
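The dual arrangement can be sketched in the same zero-delay behavioral style. This is an assumption-laden illustration rather than the circuit of FIG. 4: the names are hypothetical, the enable logic is assumed to be an OR of Clk_en and test_en, and the latch is assumed to store an inverted enable so that the OR gate parks the output high while the clock is gated.

```python
class OrTypeClockGatingCell:
    """Zero-delay behavioral sketch of a latch-plus-OR clock gating cell."""

    def __init__(self):
        # Assumption: the cell starts gated, with the output parked high.
        self.latched_disable = 1

    def step(self, clk_in, clk_en, test_en=0):
        # Assumption: the latch stores the inverse of the ORed enables.
        disable = 1 - (clk_en | test_en)
        # Active-high latch: transparent while Clk_in is 1, holds while it is 0.
        if clk_in == 1:
            self.latched_disable = disable
        # OR gate at the output: Clk follows Clk_in only when not disabled.
        return clk_in | self.latched_disable


def simulate_or_type(clk_wave, en_wave):
    """Apply one (Clk_in, Clk_en) sample per step and collect the Clk output."""
    cgc = OrTypeClockGatingCell()
    return [cgc.step(c, e) for c, e in zip(clk_wave, en_wave)]
```

In this sketch the gated output idles at logic 1 instead of logic 0, and enable changes are sampled only while Clk_in is high, mirroring the low-phase sampling of the AND-based cell.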
Prior art solutions to guard against the functional failure described above include over-designing the clock tree to maintain a good edge rate during low voltage operation or for slower lots of the manufactured parts. However, over-designing the clock tree comes at the cost of higher dynamic power consumption and shorter battery life. Another solution is to upsize a CGC's output logic so that the input clock signal propagates quickly to the output node. This approach is conventionally followed in the industry for general purpose clock gating, but it increases the area needed for the output logic, and the larger area in turn consumes more dynamic power. Moreover, such upsizing of the output logic also increases the setup time of the enable logic, which is typically an important constraint for any high performance system, e.g., processors and DSP cores.