In a synchronous circuit within a microprocessor, a clock signal is used to define a time reference for the movement of data within that circuit. Such a synchronous circuit includes multiple register stages in what is commonly referred to as a pipeline.
Modern microprocessors suffer from high power consumption. Power consists of a static and a dynamic contribution, the static contribution being approximately proportional to the silicon area of a macro. One approach to reducing dynamic power is to switch off the clock signal of register stages within the synchronous circuit that are not required. This procedure is known as clock gating. Clock gating reduces both the power dissipated in the clock mesh and the data switching power. The latter is reduced because the outputs of the registers remain constant while there is no clock signal.
To maximize the benefit of clock gating, clock activation and deactivation are performed at a fine granularity, in the extreme case on a single-cycle basis.
A simple clock gating scheme is known from Li et al., “Deterministic Clock Gating for Microprocessor Power Reduction”, Proceedings of the 9th International Symposium on High-Performance Computer Architecture, 2003, pp. 113-122, wherein a register to be clock gated is connected to the output of an AND-gate whose first input is fed with the clock signal and whose second input is fed with a clock activation signal. A problem in applying this approach to a pipeline is that each stage requires both a clock activation signal and a clock signal, which have to be fed synchronously to the AND-gate dedicated to that stage.
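The AND-gate scheme described above can be illustrated by a minimal behavioral sketch. It is not taken from the cited paper. The names and_gate, Register and simulate are hypothetical, and the model assumes that the clock activation signal is stable for the whole clock cycle, so that glitches on the gated clock are not considered.

```python
from dataclasses import dataclass


def and_gate(clk: int, act: int) -> int:
    """Gated clock: follows clk only while the activation signal act is 1."""
    return clk & act


@dataclass
class Register:
    q: int = 0         # stored value (register output)
    prev_clk: int = 0  # last clock level seen, used for edge detection
    edges: int = 0     # number of rising clock edges received

    def drive(self, clk: int, d: int) -> None:
        # Capture d on a rising edge of the (gated) clock only.
        if self.prev_clk == 0 and clk == 1:
            self.q = d
            self.edges += 1
        self.prev_clk = clk


def simulate(data, act):
    """Apply one data value and one activation value per clock cycle."""
    reg = Register()
    outputs = []
    for d, a in zip(data, act):
        for clk in (0, 1):  # one clock cycle: low phase, then high phase
            reg.drive(and_gate(clk, a), d)
        outputs.append(reg.q)
    return outputs, reg.edges


# Assumed example stimulus: the register is needed in cycles 0 and 3 only.
outputs, edges = simulate(data=[7, 8, 9, 10], act=[1, 0, 0, 1])
print(outputs, edges)
```

In this sketch the register output stays constant during the gated cycles and the register receives a clock edge only in the cycles in which the activation signal is high, which corresponds to the two power savings named above.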
A common solution to this problem is to generate local clock signals by local clock buffers (LCBs) within the stages, to provide a clock activation signal for the first stage of the pipeline, and to propagate this clock activation signal through the pipeline in parallel with the data signal. As the clock activation signal propagates from stage to stage, it activates the LCB of each stage it reaches; that LCB in turn activates the stage by clocking a data register so that the data signal stored in that data register is forwarded to the next stage. To propagate the clock activation signal from stage to stage synchronously with the data signal, control registers arranged in parallel to the data registers of the pipeline are used. Since the LCB that activates the data register of a stage would continue to activate the clock for as long as the corresponding clock activation signal stored in the control register of the previous stage is high, the control registers of the stages have to be clocked at least twice. The first clocking latches a clock activation signal into the control register in order to activate the LCB of the data register, and the second clocking resets the clock activation signal in that control register in order to stop activation of the LCB. Consequently, the clock activation signal cannot be propagated through the pipeline within the same clock domain as the data signal as long as the wider, and therefore more power-consuming, data registers are to be clocked only once. According to the state of the art, a second LCB is therefore required per stage, which forms the clock domain for the propagation of the clock activation signal within the pipeline.
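The need to clock a control register at least twice can be illustrated by the following simplified sketch of a single control register. The function run_control_register and its parameter reset_clocking are hypothetical names introduced for illustration only. The model assumes one activation value per cycle and ignores all timing within a cycle.

```python
def run_control_register(act_in, reset_clocking):
    """Return the control register output after each cycle and the number of clockings.

    act_in: incoming clock activation signal, one value per cycle.
    reset_clocking: if False, the register is clocked only in cycles in which
    act_in is 1 (i.e. in the same clock domain as the data register); if True,
    it is additionally clocked while its own output is 1, so that a
    following 0 can be latched and the stored activation is reset.
    """
    q = 0
    trace = []
    clockings = 0
    for a in act_in:
        clocked = a == 1 or (reset_clocking and q == 1)
        if clocked:
            q = a  # latch the incoming activation value
            clockings += 1
        trace.append(q)
    return trace, clockings


# Assumed stimulus: a single activation in cycle 0.
stuck, n_once = run_control_register([1, 0, 0, 0], reset_clocking=False)
reset, n_twice = run_control_register([1, 0, 0, 0], reset_clocking=True)
print(stuck, n_once)   # stored activation is never cleared
print(reset, n_twice)  # stored activation is cleared by the second clocking
```

With a single clocking the stored activation remains high and would keep the LCB of the following data register active; the second clocking is what clears it.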
State-of-the-art circuitries, such as the circuitry 1 shown in FIG. 1, use a clock activation signal act to activate an LCB 2 in order to activate a set of data clock signals clck_d for the corresponding cycle. Within the circuitry 1, the data clock signals clck_d are provided by a first LCB 2 per stage. The LCB 2 derives the data clock signal clck_d from a primary clock signal clck_p generated by a main clock (not shown). The LCB 2 can be activated or deactivated by the clock activation signal act. This clock activation signal act is propagated through the pipeline 3 synchronously with the data, as schematically depicted by act_0 and act_1, the clock activation signals at the first stage and at the second stage, respectively. The clock activation signals act are stored in control registers 5, one control register 5 being associated with each stage of the pipeline 3; each stored signal indicates whether valid data is present in the stage and hence whether clocking has to be performed. To switch the clock activation signal act on and off, any control register 5 storing a clock activation signal act has to be clocked at least twice. Hence, the clock activation signal act cannot be latched with the same registers 4 as the data signals, since, in contrast to the registers 4, the control registers 5 have to be clocked at least twice. For this reason, the clock activation signal act is propagated within the control registers 5 associated with the stages of the pipeline 3, but is clocked in a clock domain different from that of the registers 4. In order to propagate the clock activation signal act from stage to stage, a second LCB 6 is required per stage. The LCB 6 forms the second clock domain for the clock activation signal act. The LCB 6 derives the control clock signal clck_c from the primary clock signal clck_p generated by the main clock (not shown). The switching activity of the control clock signal clck_c is thus higher than that of the data clock signal clck_d.
In the example shown, the clock activity of the LCB 6 is, in the worst case, twice as high as the clock activity of the LCB 2. The second LCB 6 is activated if either the incoming activation signal act_0 or the outgoing activation signal act_1 of the particular stage is high, i.e. has a value of ‘1’.
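The worst-case factor of two can be reproduced with a simplified cycle-based model of the two clock domains. The model is an assumption made for illustration and is not derived from FIG. 1 in detail: the function simulate_pipeline is hypothetical, each stage's data clock fires when its incoming activation signal is high, and each stage's control clock fires when its incoming or its outgoing activation signal is high.

```python
def simulate_pipeline(act_stream, n_stages):
    """Count data clock and control clock pulses per stage.

    act_stream: clock activation signal applied to the first stage, one value per cycle.
    Returns (data_clocks, ctrl_clocks), each a list with one counter per stage.
    """
    act_regs = [0] * n_stages     # control register contents, one per stage
    data_clocks = [0] * n_stages  # pulses of the data clock (first LCB)
    ctrl_clocks = [0] * n_stages  # pulses of the control clock (second LCB)
    for a in act_stream:
        # Incoming activation: external for the first stage, otherwise the
        # control register output of the previous stage.
        act_in = [a] + act_regs[:-1]
        act_out = act_regs[:]
        next_regs = act_regs[:]
        for i in range(n_stages):
            if act_in[i]:
                data_clocks[i] += 1  # data register is clocked once
            if act_in[i] or act_out[i]:
                ctrl_clocks[i] += 1  # control register is set or reset
                next_regs[i] = act_in[i]
        act_regs = next_regs         # all registers update simultaneously
    return data_clocks, ctrl_clocks


# Assumed worst-case stimulus: isolated activations separated by idle cycles.
data_clocks, ctrl_clocks = simulate_pipeline([1, 0, 1, 0, 1, 0, 0, 0], n_stages=2)
print(data_clocks, ctrl_clocks)

# Back-to-back activations: the control clock fires only once more than the data clock.
burst_data, burst_ctrl = simulate_pipeline([1, 1, 1, 1, 0, 0], n_stages=1)
print(burst_data, burst_ctrl)
```

Under these assumptions, isolated activations cause each control register to be clocked twice per data clocking, whereas for back-to-back activations the overhead shrinks to a single additional control clocking per burst.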
A main drawback of the synchronous circuit according to the state of the art is that each stage of the pipeline needs two LCBs, one for the clock domain of the data registers holding the data signal and a second one for the clock domain of the control registers. This is disadvantageous because LCBs are large and highly complex circuits with high silicon area requirements and high active as well as leakage power consumption.