During the past three decades, the power consumption of integrated circuits, including microprocessors, has been increasing at an exponential rate. This steady increase in power dissipation is the result of several factors. First, the number of transistors and the transistor density have doubled about every 24 months. However, the power efficiency of micro-architectures, measured by MIPS/Watt, degrades considerably as more superscalar features are built into a design. Second, the use of more complex circuit techniques has allowed clock frequencies to increase faster than pure process scaling would suggest, often at the expense of increased power. Third, aggressive transistor technologies with higher current carrying capabilities and lower threshold voltages have increased switching speeds at the expense of significant sub-threshold leakage current. Last, improvements in compilers and software applications have also increased the switching activity within a microprocessor.
If this trend continues, it is expected that the power consumption of typical microprocessors may be several thousand Watts by 2008. This presents an enormous challenge in the design of the power distribution networks needed to carry the large currents and also in the verification of digital noise immunity. Furthermore, these predicted power levels are prohibitively large from a reliability and system cost perspective. Also, from a system performance standpoint, high power dissipation limits the scalability in the number of processors that can be incorporated into a system and the number of cores on a single die. From this discussion, it becomes clear that total power consumption will eventually become a limiting factor to increased chip integration.
Despite the power dilemma, designers are still most concerned about speed performance because, in most cases, that is what determines whether a system is successful. For most microprocessors incorporating advanced superscalar micro-architectures, this has resulted in the use of dynamic domino logic.
Dynamic logic is a digital circuit design technique used in some high-performance integrated circuits. In contrast to the more popular logic family known as static CMOS, dynamic logic circuits are faster, because they present much lower input capacitance for the same output current and have a lower switching threshold. Unfortunately, dynamic circuits are more susceptible to noise than static CMOS. They also dissipate more power than their static counterparts because of their higher activity factors and significant clock loading. However, in many circumstances they have proven to be the only circuit family able to meet the demands of reduced cycle times.
Domino logic gates are a popular dynamic logic family, in which an inverting static gate is inserted between successive dynamic gates. Standard domino logic inserts an inverter between the dynamic gates while compound domino logic inserts multiple input complementary gates. The dynamic/static gate pair is known as a domino gate, although it is in fact constructed from two gates. A series of connected domino gates precharge simultaneously as if setting up a set of dominos. During evaluation, the first dynamic gate falls causing the static gate to rise which then causes the next dynamic gate to fall and its static gate to rise, much like a chain of toppling dominos. It is common practice in domino logic design to divide a pipeline 2 of series-connected domino gates 4 into “cells” 6 each of which is controlled by a respective clock phase Φ, as may be seen in FIG. 1. Each cell 6 may contain one or more dynamic logic gates 8. As may also be seen in FIG. 1, cells 6a controlled by clock phase Φ1 may be referred to as “phase 1 logic”; cells 6b controlled by clock phase Φ2 may be referred to as “phase 2 logic”, and cells 6c controlled by clock phase Φ3 may be referred to as “phase 3 logic”.
Domino logic circuits are often used in microprocessor critical paths because of their 1.5 to 2 times speed improvement over static CMOS gates. Despite their wide application to microprocessor design, conventional single-rail domino is not functionally complete because of its inability to perform inversions. There are many situations where inverting or non-monotonic logic needs to be used in conjunction with non-inverting/monotonic logic. These include multiplexers, parity circuits, and arithmetic units which depend heavily on XOR and XNOR functions. However, if inverting functions (that is, functions in which some inputs to the first dynamic gate of a logic cell 6 are complemented) or non-monotonic functions are used inside a domino pipeline 2 with multi-phase clocks, the inverting or non-monotonic functions will be corrupted when the previous cell precharges. For example, in FIG. 1, the inverting gate 8 in the phase 2 logic 6b will be corrupted when the phase 1 logic 6a precharges; and likewise the inverting gate 8 in the phase 3 logic 6c will be corrupted when the phase 2 logic 6b precharges. This is because an inverting function of the previous logic cell 6 might, for example, cause a 0→1 transition on the input of the current cell in the middle of the evaluate cycle, where the input to the current cell 6 should have remained at 0 (as it was at the start of the evaluate cycle). This is illustrated in FIG. 2 for the case of two AND gates 10a, 10b in adjacent phase logic cells 6, where one of the inputs to the second AND gate 10b is complemented. In the case of a non-monotonic function, the inputs to the dynamic gate 10b will change before the end of the current evaluate cycle and the output might no longer maintain the correct result. Such a logic function, where an inversion exists at the input of a dynamic gate or the gate implements non-monotonic logic, will hereafter be referred to as an input complemented or non-monotonic dynamic logic function.
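The corruption mechanism described above can be illustrated with a small behavioural sketch. It is not a circuit-level model: the class name `DominoGate`, the function `run_scenario` and the input values are hypothetical and chosen only for illustration. The sketch assumes an idealised dynamic node that, once discharged during evaluation, cannot recover until the next precharge, and it mirrors the FIG. 2 case of two AND gates in adjacent phases where one input of the second gate is complemented.

```python
class DominoGate:
    """Idealised domino gate: a dynamic node followed by a static inverter."""

    def __init__(self):
        self.dynamic_node = 1  # the dynamic node starts precharged high

    def precharge(self):
        # Precharge restores the dynamic node, so the gate output returns to 0.
        self.dynamic_node = 1

    def evaluate(self, pulldown_conducts):
        # Once the node has discharged it stays low until the next precharge.
        if pulldown_conducts:
            self.dynamic_node = 0

    @property
    def output(self):
        # The static inverter makes the domino gate output non-inverting.
        return 1 - self.dynamic_node


def run_scenario():
    """Return the second gate's output before and after the first gate precharges."""
    gate1, gate2 = DominoGate(), DominoGate()
    a, b, c = 1, 1, 1  # assumed input values for illustration

    # Both cells evaluate: gate1 computes a AND b, gate2 computes (NOT gate1) AND c.
    gate1.evaluate(bool(a and b))
    gate2.evaluate(bool((1 - gate1.output) and c))
    before = gate2.output  # correct result: gate1 is 1, so gate2 stays at 0

    # The phase 1 cell precharges while the phase 2 cell is still evaluating.
    gate1.precharge()
    gate2.evaluate(bool((1 - gate1.output) and c))
    after = gate2.output  # the complemented input rose 0 -> 1 and discharged gate2

    return before, after


if __name__ == "__main__":
    print(run_scenario())
```

In this sketch the second gate holds the correct value 0 while the first gate is evaluating, and flips to 1 as soon as the first gate precharges, because the complemented input makes a 0→1 transition in the middle of the second gate's evaluate cycle.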
Consequently, circuit designers must use slower logic circuits such as static CMOS or transmission gates to implement inverting and non-monotonic functions, with the additional cost of increased overhead to interface from dynamic to static logic and back. Clock-blocking techniques, which require the clock to be the last input signal to arrive at a dynamic gate after the data inputs so that inverting and non-monotonic functions are possible, have also been used. An example of such clock-blocking techniques is known as clock-delayed (CD) domino. However, these clock-blocking techniques require precise matching of data and clock delays, which have to be accounted for under all possible process and environmental conditions. Furthermore, clock skew must be budgeted at each clock-blocking gate, making this logic family skew-intolerant. Last, scaling of such designs would normally require complete re-verification of the data and clock delay paths.
For designs where speed is the most critical design parameter, Domino Dynamic Cascade Voltage Switch Logic (DCVSL)/dual-rail domino circuits can be used to meet the requirements for inverting and non-monotonic functions. Such circuits require approximately double the number of transistors compared to single-rail domino logic, resulting in greatly increased routing complexity and circuit area and, in many cases, decreased circuit speed due to longer differential routing lines.
Since domino circuits are synchronized by clocks, clock skew can have a significant impact on domino circuit performance. Skew-tolerant domino circuits have been shown to alleviate the effects of skew on the performance of traditional domino circuits.
Skew-tolerant domino circuits remove the three sources of sequencing overhead found in traditional latch-based domino pipelines: clock skew, latch overhead and pipeline imbalances. This is accomplished by supplying overlapping clock phases to different stages of domino logic. The use of overlapping clock phases eliminates the need to budget clock skew in the cycle time, since data can now arrive and depart from different pipeline stages irrespective of modest variations in the arrival time of the clock signals. Furthermore, since the overlapping clock phases allow time for the first domino gate 4 of a logic cell 6 to evaluate before the last gate 4 of the previous cell 6 precharges, latches are eliminated from the pipeline 2 as domino gates 4 inherently function as latches. Finally, if the overlap between clock phases is larger than the worst-case clock skew, then domino gates can “time borrow” across stages. Gates 4 in two adjacent cells 6 can evaluate when their respective clocks are high and overlap, allowing gates that nominally evaluate during a first clock phase to run late into a second clock phase. Thus, removing all the sources of overhead allows the entire cycle time to be available for useful computation.
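The condition for time borrowing stated above (overlap between clock phases larger than the worst-case clock skew) can be sketched as simple arithmetic. The function name `time_borrow_window`, its parameters and the example values are hypothetical. The sketch assumes evenly spaced clock phases, each shifted by one cycle time divided by the number of phases, and it ignores hold time and other margins that a real design would have to budget.

```python
def time_borrow_window(period, n_phases, evaluate_time, worst_case_skew):
    """Estimate the time a gate may run late into the next clock phase.

    All arguments share one time unit (for example, picoseconds).
    Assumes evenly spaced phases and ignores hold-time margins.
    """
    # Each phase is assumed to be delayed from the previous one by period / n_phases.
    phase_shift = period / n_phases
    # Two consecutive phases are both high for this long.
    overlap = evaluate_time - phase_shift
    # Only the overlap left after absorbing the worst-case skew can be borrowed.
    return max(0.0, overlap - worst_case_skew)


if __name__ == "__main__":
    # Assumed example: 1000 ps cycle, 4 phases, 500 ps evaluate time, 100 ps skew.
    print(time_borrow_window(1000, 4, 500, 100))
```

Under these assumptions, an overlap smaller than the worst-case skew yields a window of zero, which corresponds to the case in which no time borrowing is available.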
While many of the design difficulties concerning noise and delay performance of dynamic logic have been addressed in the prior art, practical power considerations have often been ignored. In practice, dynamic logic dissipates more power than static logic, mainly due to its increased switching activity resulting from periodic precharge and discharge operations. Additionally, the use of keeper devices to solve problems due to charge leakage also tends to increase the transistor count and thus the switched capacitance, with an attendant increase in power consumption. Furthermore, dual-rail domino circuits dissipate more power (approximately double) than single-rail domino because of their increased routing capacitance and unity activity factor.
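The effect of the activity factor on power can be illustrated with the standard first-order expression for dynamic switching power, P = α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below is illustrative only: the function name `dynamic_power` and every numeric value are assumptions, the single-rail activity factor of 0.5 is an arbitrary example, and the extra routing capacitance of dual-rail circuits is deliberately ignored so that only the activity factor differs.

```python
import math


def dynamic_power(activity_factor, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic switching power in watts: alpha * C * V^2 * f."""
    return activity_factor * capacitance_f * vdd_v ** 2 * freq_hz


# Assumed example values (not taken from any particular design).
CAPACITANCE_F = 10e-15  # 10 fF of switched capacitance
VDD_V = 1.0             # 1.0 V supply
FREQ_HZ = 1e9           # 1 GHz clock

# Single-rail domino: the node is assumed to discharge in half of the cycles.
single_rail = dynamic_power(0.5, CAPACITANCE_F, VDD_V, FREQ_HZ)

# Dual-rail domino: one of the two rails discharges every cycle (unity activity factor).
dual_rail = dynamic_power(1.0, CAPACITANCE_F, VDD_V, FREQ_HZ)

if __name__ == "__main__":
    print(single_rail, dual_rail, dual_rail / single_rail)
```

With these assumed values the dual-rail estimate is twice the single-rail estimate, which is consistent with the approximate doubling noted above; any additional routing capacitance would increase the difference further.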
Accordingly, techniques for reducing the power consumption of domino logic circuits remain highly desirable.