The present invention relates to CMOS logic circuits, and more particularly to a high speed, self-reset CMOS for wave propagation logic using a half-cycle clocking scheme.
For data processing environments requiring reduced power consumption, high noise immunity, competitive speeds and costs, or any of these requirements, CMOS has become the technology of choice. Unfortunately, CMOS is not without certain problems, one of which stems from the complementary nature of CMOS circuits. A typical CMOS input gate requires both a p-channel device and an n-channel device, which can significantly raise the number of CMOS devices in any given design.
One solution to keeping the CMOS device count down is to use a form of clocked CMOS logic, conventionally referred to as "domino" logic. Domino logic provides low switching thresholds and reduced device count, leading to fast and area-efficient circuit implementations. Thus, for high performance computing architectures, domino logic presents an attractive design style. Domino logic is capable of operating 1.5-2 times faster than static CMOS logic because the dynamic gates present a much lower input capacitance for the same output current, and have a lower switching threshold.
One conventional approach to the use of domino logic is to divide the logic into two pairs of logic groups, with each group being clocked by one or another of two phases of a clock signal. This approach permits one logic group to precharge, readying the logic of that phase circuit for the evaluation period, while the other logic group is in the evaluation period, developing an output signal based upon the evaluated inputs. Thus, rather than having all the gates of the two groups precharge at the same time, this approach tends to mask the precharge time from the critical path. With this approach, the minimum number of delays between latches is zero, and the maximum number of delays is constrained by the cycle time.
Domino logic does, however, have some disadvantages. The actual time available for the operational phase of the logic is t = T - 2(tskew + tsetup), which wastes approximately 25% of the entire cycle. Also, a half cycle that is longer than expected due to process variations or modeling inaccuracies cannot borrow time from adjacent, less critical half cycles. Further, the latch hold time (thold) can create a problem, since a fast clock edge and small clock skew are required. Another shortcoming is that, since all domino circuits are clocked to switch them between the precharge and evaluation periods, they present a substantial load on the clock circuit. This load, and the requirement for rapid clock switching (i.e., fast clock edges), can lead to a peak power supply current during the simultaneous precharge of all domino gates and, therefore, undesirably large power consumption.
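As a rough illustration of the overhead expression above, the following Python sketch computes the fraction of the cycle lost to skew and setup in two-phase domino logic. The function names and the numeric values (a 16-unit cycle with one unit each of skew and setup) are assumptions chosen only to reproduce the approximately 25% figure; they are not taken from any measured design.

```python
def domino_logic_time(T, t_skew, t_setup):
    """Usable logic time per cycle: t = T - 2(tskew + tsetup)."""
    return T - 2 * (t_skew + t_setup)


def domino_wasted_fraction(T, t_skew, t_setup):
    """Fraction of the cycle that is not available for logic."""
    return 1 - domino_logic_time(T, t_skew, t_setup) / T


# Hypothetical values, in arbitrary but consistent delay units.
T, t_skew, t_setup = 16.0, 1.0, 1.0
print(domino_logic_time(T, t_skew, t_setup))       # 12.0
print(domino_wasted_fraction(T, t_skew, t_setup))  # 0.25
```

Because skew and setup are paid once per half cycle, the overhead term is doubled, which is why it grows quickly as the cycle time shrinks.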
One possible solution to these problems is the opportunistic time-borrowing skew-tolerant technique of domino circuit design, an example of which can be found in U.S. Pat. No. 5,517,136. This technique employs a complementary domino logic design that achieves a performance increase of at least 25% by eliminating latches in the critical path, and allowing for opportunistic time borrowing across clock boundaries.
According to this technique, a pipeline of domino logic includes a plurality of logic gates controlled by multiple (i.e., four) clock signals. A first clock signal is a standard clock signal having an approximately 50% duty cycle. A second clock signal is the inverse of the first clock signal, while third and fourth clock signals are delayed versions of the first and second clock signals, so that their rising edges are substantially synchronous with the rising edges of the first and second clock signals, respectively, but their falling edges are delayed. The clock phases are arranged so that the edge of the clock signal that initiates the precharge phase is delayed in a way that allows the evaluation phase to continue into the subsequent half cycle, thereby accomplishing forward time-borrowing. An example of this approach can be found in U.S. Pat. No. 5,517,136.
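The relationship among the four clock signals can be sketched with a small Python model. This is only an idealized illustration: the function name, the parameter `fall_delay`, and the assumption of instantaneous edges with no skew are hypothetical and are not taken from the cited patent. A high level is taken to mean evaluation and a low level precharge, consistent with the delayed falling edge postponing precharge.

```python
def four_phase_clocks(t, period, fall_delay):
    """Return the levels (clk1, clk2, clk3, clk4) at time t.

    Assumes 0 <= fall_delay < period / 2 and ideal, skew-free edges.
    """
    half = period / 2
    phase = t % period
    clk1 = phase < half                       # standard 50% duty cycle clock
    clk2 = not clk1                           # inverse of clk1
    clk3 = phase < half + fall_delay          # rises with clk1, falls late
    clk4 = phase >= half or phase < fall_delay  # rises with clk2, falls late
    return clk1, clk2, clk3, clk4


# Hypothetical example: period of 10 units, falling edges delayed by 2 units.
for t in (0, 3, 6, 8):
    print(t, four_phase_clocks(t, period=10, fall_delay=2))
```

In this model the third clock is still high for `fall_delay` units after the first clock has fallen, which is the window in which evaluation can continue into the following half cycle.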
While the time-borrowing approach has its advantages, it is not without its disadvantages. First, there is the requirement of four separate clock signals, and the clock skew between these four clock signals is increased in comparison to that of a single clock signal. Also, the domino pipeline must comprise a first domino gate controlled by the first clock signal, a second domino gate controlled by the second clock signal, and so on. The minimum number of gates in a half-clock phase is two. Further, the requirement of four clock drivers increases power consumption. In addition, the time of the borrow (tbor) is limited in that it must be less than T/2 - tp - 2tskew - tsetup, where T is the cycle time, tsetup is the setup time of the last gate in a half cycle, tskew is the clock skew, and tp is the minimum time for the precharge of the domino gates. Today's fastest microprocessors operate at cycle times below 18 fanout-of-four inverter (FO4) delays; the clock skew is 2 FO4 delays, the setup time is 1.5 FO4 delays, and the minimum time for precharge is 2 FO4 delays. Thus, tbor is only 1.5 FO4 delays (9 - 2 - 4 - 1.5). This time is not sufficient to compensate for all clock skews, so this solution does not provide the substantial increase in clock frequency needed by high performance processors.
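The borrow-time bound quoted above can be checked with a few lines of Python. The function name is hypothetical; the formula and the numeric values (all in FO4 delays) are the ones stated in the preceding paragraph, with the cycle time taken at its 18 FO4 upper limit.

```python
def max_borrow_time(T, t_p, t_skew, t_setup):
    """Upper bound on borrowable time: tbor < T/2 - tp - 2*tskew - tsetup."""
    return T / 2 - t_p - 2 * t_skew - t_setup


# Values from the text, in FO4 delays.
T = 18.0        # cycle time
t_skew = 2.0    # clock skew
t_setup = 1.5   # setup time of the last gate in a half cycle
t_p = 2.0       # minimum precharge time

bound = max_borrow_time(T, t_p, t_skew, t_setup)
print(bound)           # 1.5
print(bound < t_skew)  # True
```

With these numbers the bound is smaller than a single clock skew of 2 FO4 delays, and it shrinks further for any cycle time below 18 FO4 delays.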
An alternate solution is based upon delayed or cascaded reset dynamic circuits. An example of this approach is shown in "A 1.0-GHz Single-Issue 64-bit PowerPC Integer Processor," by J. Silberman, et al., IEEE J. of Solid-State Circuits, Vol. 33, No. 11, pages 1600-1608, November 1998, and is illustrated in FIGS. 1A and 1B. This solution employs a short chain 10 of CMOS logic gates, each including evaluation stage pulldown paths 12 (12a, 12b, . . . , 12d) that form the inputs for evaluation. During the evaluation period an output will be developed that depends upon the particular logic function implemented by the particular evaluation stage and its associated pulldown path 12. A global clock, NCLK, is used to enable the output of the cycle-bounding latch and launch a computation down the chain 10. If the logic equation represented by the pulldown path 12a is true, the precharged dynamic node will fall, and the output A of that logic stage will rise. This in turn may trigger the next gate in the chain 10, and computation propagates through the logic of the chain 10 to the macro output D. A one-shot circuit 14, triggered by the same clock edge of NCLK that launched the data from the latch, produces a low-going pulse at some delay after NCLK falls. The delay is chosen such that if the output A rises in the next cycle, the output will be high long enough to robustly switch the next gate in the chain 10. This low-going pulse resets the first gate in the chain so that the output A falls. The reset pulse is propagated down the logic path through a chain of inverters, timed so that the reset signal is applied to each subsequent gate only after the inputs to the gate have returned to ground. The reset pulse must be long enough to ensure that the dynamic node returns fully to the power rail, and short enough that the reset signal is off before the next data input to the gate arrives.
The advantage of cascading the resets in this way is that most gates in the path can be built without a "foot" or ground interrupt device in the pulldown path of the evaluation stage. This allows the device to be used as an extra logic element (e.g., input gate) in the pulldown path 12, or removed to obtain higher speed (at least 10%). Additionally, precharge current is drawn from the supply throughout the cycle rather than only at the clock edges, as is the case with conventional domino dynamic circuits, reducing peak current demand.
However, this solution has its drawbacks. One is that the actual time for the logic (tlog) is T - (tskew + tsetup). This means that at least 10-15% of the entire cycle is wasted. Another problem is that all data paths must have equal delay so that the reset signal is off before the next data input to the gate arrives. In addition, a clock cycle that is longer than expected due to process variations or modeling inaccuracies cannot borrow time from adjacent, less critical cycles. Finally, if the cycle time increases by more than a factor of two, operation of this circuit becomes erratic.
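For comparison, the logic-time expression for the cascaded-reset scheme can be evaluated the same way. The Python sketch below uses a hypothetical function name and illustrative values (a 16-unit cycle with one unit each of skew and setup) that are assumptions chosen only to land inside the 10-15% range quoted above.

```python
def self_reset_logic_time(T, t_skew, t_setup):
    """Usable logic time per cycle: tlog = T - (tskew + tsetup)."""
    return T - (t_skew + t_setup)


def self_reset_wasted_fraction(T, t_skew, t_setup):
    """Fraction of the cycle that is not available for logic."""
    return 1 - self_reset_logic_time(T, t_skew, t_setup) / T


# Hypothetical values, in arbitrary but consistent delay units.
T, t_skew, t_setup = 16.0, 1.0, 1.0
print(self_reset_logic_time(T, t_skew, t_setup))       # 14.0
print(self_reset_wasted_fraction(T, t_skew, t_setup))  # 0.125
```

Here skew and setup are paid once per full cycle rather than once per half cycle, so the overhead term is not doubled.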
Thus, there is a need for a high speed, low power, low device count, domino logic circuit.