The present invention relates to domino logic circuits and, more particularly, to the use of specially designed skewed static buffers in domino logic circuitry to avoid the loss of data in a clocking scheme with four overlapping phases, wherein phase 3 is the inverse of phase 1, phase 4 is the inverse of phase 2, and phase 2 is a delayed version of phase 1 (usually by a quarter of a period).
As is known in the art, domino logic is a precharged, non-inverting family of CMOS logic that can be pipelined using multiple clock phases to achieve high-speed operation. Domino logic is faster than standard static logic, but it is more difficult to design because of its increased complexity, primarily in the clocking network. In addition, domino logic uses more power and more integrated circuit area than equivalent standard static logic.
In domino logic, a “precharge” clock phase is used, followed by an “evaluate” clock phase. During the precharge phase, when the clock is low, the output of the cell is preset to a low logic state (logic zero). During the evaluate phase, when the clock is high, the output of the cell either stays low or transitions to a high value if, based on the data inputs, the function of the cell evaluates to a logic one value. This is in contrast to the standard static logic typically used with CMOS technology, in which the output of a cell can rise or fall any time an input changes.
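The precharge and evaluate behavior described above can be illustrated with a minimal behavioral sketch. It is an idealized, zero-delay model written only for illustration; the names `domino_step` and `simulate` are hypothetical and do not come from the circuits described here.

```python
def domino_step(clock_high, function_true, previous_output):
    """Return the output of an idealized domino cell for one time step."""
    if not clock_high:
        # Precharge phase: the output is preset to logic zero.
        return False
    # Evaluate phase: the output may rise once, but it cannot fall again
    # until the next precharge.
    return previous_output or function_true


def simulate(clock, function_values):
    """Step the idealized cell through paired clock/function samples."""
    output = False
    trace = []
    for clock_high, function_true in zip(clock, function_values):
        output = domino_step(bool(clock_high), bool(function_true), output)
        trace.append(output)
    return trace
```

For example, `simulate([0, 1, 1, 1, 0], [1, 0, 1, 0, 1])` gives `[False, False, True, True, False]`: the output rises only during evaluate, holds its value even after the function result drops, and is cleared when the clock returns low.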
Because of this precharge and evaluate behavior, domino logic is a pulsed logic. Within a given clock period, domino gates evaluate and then go to precharge. Therefore, it is important to make sure that the result from a gate is consumed by the next gate before the first gate goes to precharge. Also, if a domino signal is logically ANDed with other domino signals, their pulsed values must overlap long enough to allow the gate to compute the correct value.
As is known in the prior art, effectively propagating timing-critical data along a datapath in a four-phase clocking scheme requires that, for each domino cell, the clock rises some time before the latest data arrives; otherwise the data has to wait and the output is consequently delayed. It is also important, for each domino cell, that the data arrives some setup time before the clock falls in order to be correctly captured. One way to initially assign the phases, as defined in the prior art, is to choose, for each domino gate, the latest phase rising immediately before the arrival of the latest data.
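The initial phase assignment rule can be sketched as follows. The sketch assumes that phase 1 rises at time zero of each cycle, that each subsequent phase rises exactly a quarter period later, and that a rising edge coinciding with the data arrival counts as "before" it; the function names and the optional `margin` parameter are hypothetical illustrations, not part of the prior art rule itself.

```python
import math


def phase_rise(phase, period):
    """Rising-edge time of phase 1..4 within a cycle that starts at t = 0.

    Assumes each phase is delayed by a quarter period from the previous one,
    so phase 3 is the inverse of phase 1 and phase 4 the inverse of phase 2.
    """
    return (phase - 1) * period / 4


def assign_phase(latest_arrival, period, margin=0.0):
    """Pick the latest phase whose rising edge is at or before the latest
    data arrival, optionally leaving `margin` between edge and arrival."""
    quarter = period / 4
    index = math.floor((latest_arrival - margin) / quarter)
    # Rising edges repeat every period, so wrap the index into 1..4.
    return index % 4 + 1
```

With a 1000 ps period, data arriving at 300 ps is assigned to phase 2 (which rises at 250 ps), and data arriving at 600 ps is assigned to phase 3 (which rises at 500 ps).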
FIG. 1(a) shows an example of two interconnected datapaths, which have been phase-assigned according to the rules mentioned above. The first datapath is from “REG1” to “REG2”, and the second datapath is from “REG3” to “REG4”. In the configuration shown in FIG. 1(a), the AND gate U0 on the first datapath is coupled to a domino gate U1 from the same path, clocked on phase 3 (at the slow “A” input), and to a domino gate U2 from the second datapath, clocked on phase 1 (at the fast “B” input). This situation is known in the prior art as “phase skipping”. The timing diagram of FIG. 1(b) shows that the fast input goes to precharge some time (“tp”) after phase 1 goes low, whereas the slow input goes high some evaluate time (“te”) after phase 3 goes high. In the situation shown in FIG. 1(b), the data on the slow input arrives after the data on the fast input has already been lost, and therefore the result can never be captured by the AND gate.
To prevent this situation, the fast input B has to be delayed such that its logic one value overlaps long enough (“ov”) with the logic one value on the slow input A, before going to precharge. The overlap requirement is a characteristic of the cell, under certain conditions (process, voltage, temperature, transition time on the inputs) and has been previously characterized.
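The overlap requirement can be expressed as a simple check on the intervals during which each input is at logic one. The sketch below is only an illustration: the function names are hypothetical, each input is reduced to a single (rise, fall) window, and the numeric values in the usage note are invented rather than taken from any characterized cell.

```python
def overlap(a_window, b_window):
    """Time during which both inputs are at logic one.

    Each window is a (rise_time, fall_time) pair. A negative result means
    one input has already returned to precharge before the other goes high.
    """
    a_rise, a_fall = a_window
    b_rise, b_fall = b_window
    return min(a_fall, b_fall) - max(a_rise, b_rise)


def meets_overlap(a_window, b_window, ov):
    """True if the shared logic-one time is at least the required overlap."""
    return overlap(a_window, b_window) >= ov
```

For instance, with a slow input A high from 550 to 1050 and a fast input B high from 100 to 520 (arbitrary time units), `overlap` returns -30, so the requirement fails for any positive "ov". Delaying B by 250 units, to the window (350, 770), gives an overlap of 220.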
In the prior art, one way of achieving this delay, as shown in FIG. 2(a), was to insert a domino buffer D0 before the fast input, and to assign this buffer to the intermediate phase (i.e. phase 2 in this example). The new arrangement shown in FIG. 2(a) modifies the timing conditions, which are shown in FIG. 2(b), since the precharge on the fast input is now relative to the falling edge of phase 2. The drawback of this prior art solution is that it adds to the clock network loading, increases power consumption and integrated circuit area, and makes clock tree synthesis more complex. The number of dynamic buffers added to a design to fix the phase skipping problems can be significant (typically adding 5% to the total number of dynamic cells).
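The effect of re-clocking the fast input on the intermediate phase can be illustrated with a small timing sketch. It assumes phases that are high for half a period and offset by a quarter period each, a fast input that is already high when the slow input rises, and a slow input that stays high until after the fast input precharges; all names and the numeric values for "tp", "te" and "ov" are hypothetical.

```python
def phase_rise(phase, period):
    """Rising edge of phase 1..4, assuming quarter-period spacing."""
    return (phase - 1) * period // 4


def phase_fall(phase, period):
    """Falling edge of a phase, assuming it stays high for half a period."""
    return phase_rise(phase, period) + period // 2


def skip_overlap(fast_phase, slow_phase, period, tp, te):
    """Overlap between the fast input (lost tp after its phase falls) and
    the slow input (valid te after its phase rises)."""
    fast_precharge = phase_fall(fast_phase, period) + tp
    slow_high = phase_rise(slow_phase, period) + te
    return fast_precharge - slow_high


# Hypothetical numbers, in picoseconds.
PERIOD, TP, TE, OV = 1000, 20, 50, 100
direct = skip_overlap(1, 3, PERIOD, TP, TE)    # fast input clocked on phase 1
buffered = skip_overlap(2, 3, PERIOD, TP, TE)  # fast input re-clocked on phase 2
```

Under these assumptions, `direct` is -30 ps (the fast input is lost before the slow input arrives), while `buffered` is 220 ps, which exceeds the assumed 100 ps overlap requirement. Moving the fast input's precharge reference from the falling edge of phase 1 to that of phase 2 adds a quarter period of overlap.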
What is desired, therefore, is a circuit and method for providing the necessary delay to satisfactorily address the phase skipping issue in a domino logic circuit, but overcoming the problems of the prior art domino buffer solution that leads to increased complexity, power consumption, and integrated circuit area.