VLSI technology enables powerful hardware for sophisticated computer applications and multimedia capabilities, such as real-time speech recognition and full-motion video. These changes in the computing environment have created a variety of high-speed electronics applications. At the same time, users increasingly want portable computational equipment.
The requirement of portability places severe restrictions on size, weight, and power. Of these, power consumption is the dominant consideration in mobile applications, since current battery technology cannot provide sufficient energy to run such systems for an acceptably long time. Hence, portable digital applications have traditionally been low-power, low-throughput uses, such as wristwatches and pocket calculators.
A number of portable applications, however, require low power and high throughput simultaneously. For example, notebook and laptop computers require almost the same computation speed and capabilities as desktop machines. Equally demanding are developments in personal communications services (PCS's), such as digital cellular telephony networks, which employ complex speech compression algorithms and sophisticated radio modems.
Portable multimedia systems supporting full-motion digital video require even more power. Video compression and decompression, as well as speech recognition, must be powered on top of an already lean power budget. These portable systems must provide capabilities comparable to, or greater than, those of fixed workstations while operating in a low-power portable environment.
Even in non-portable systems, low power consumption is becoming critical. Until recently, power consumption was not a great concern, since the heat generated on-chip could be adequately dissipated with a proper package. However, reducing the minimum feature size increases the number of transistors that can be integrated, allowing more functional units to be implemented in a single chip.
These functional units are usually computation-intensive and operate concurrently, so power consumption increases dramatically in complex VLSI systems such as high-performance microprocessors and general-purpose digital signal processors (DSP's). Since the dynamic (switching) power dissipated in a CMOS digital circuit is proportional to the clock frequency, higher operating speeds further increase power consumption.
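As a rough illustration of the frequency dependence, the standard first-order expression for CMOS dynamic switching power, P = αCV²f, can be evaluated directly. The activity factor α, switched capacitance, supply voltage, and clock rates below are hypothetical values chosen only for illustration, and leakage and short-circuit power are ignored.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, f_clk: float) -> float:
    """First-order CMOS dynamic switching power P = alpha * C * Vdd^2 * f, in watts.

    alpha:  activity factor (fraction of the capacitance switched per cycle)
    c_load: total switched capacitance (F)
    vdd:    supply voltage (V)
    f_clk:  clock frequency (Hz)
    """
    return alpha * c_load * vdd ** 2 * f_clk


# Hypothetical operating point: 10% activity, 1 nF switched capacitance, 3.3 V supply.
base = dynamic_power(alpha=0.1, c_load=1e-9, vdd=3.3, f_clk=50e6)
fast = dynamic_power(alpha=0.1, c_load=1e-9, vdd=3.3, f_clk=100e6)
print(f"{base:.4f} W at 50 MHz, {fast:.4f} W at 100 MHz")
```

Doubling the clock doubles the power in this model; the quadratic dependence on the supply voltage is why voltage reduction is usually the first lever in low-power design.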
Further, adequate cooling techniques, such as fins and fans, are required to handle the increased internal heat. Such techniques increase cost and/or limit the amount of functionality that can be integrated in a single chip. Hence, reducing power consumption has become a critical concern in designing complex VLSI systems.
Many considerations must be taken into account in low-power design, including the logic style used, the technology incorporated, and the architecture employed. Among these, choosing a proper logic style is one of the most important factors, since the power consumed in the arithmetic and logic units depends greatly on how these blocks are implemented. The choice of logic circuit also affects the architectural selection. Hence, it is necessary both to fully exploit existing logic circuits and to develop new logic circuits for low-power operation.
There are a number of options available in choosing the basic circuit approach and topology for implementing various logic and arithmetic functions. In general, logic families can be divided into two broad categories, depending on the type of operation. The first category is static logic, including standard CMOS logic and pass-transistor logic, in which all internal nodes are static and the noise margin is therefore high. The second category is dynamic logic, which uses a precharge technique to improve speed. However, dynamic logic increases cost because of the higher design complexity needed to eliminate problems inherent in dynamic operation, such as charge sharing.
The simplest form of static logic is standard CMOS logic, in which pMOS and nMOS transistors are arranged in dual networks. For example, FIG. 1A shows the structure of a 2-input NAND gate. Standard CMOS logic is disadvantageous because a large number of transistors is required to implement a given Boolean function. Further, since the pMOS transistor has relatively low current-driving capability, the width of each pull-up pMOS transistor must be two to three times that of the nMOS transistor to make the rise and fall times similar. This sizing increases the area of standard CMOS logic compared with conventional nMOS logic implementing the same Boolean function. Moreover, the resulting increase in parasitic capacitance may make the circuit too slow.
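The dual pull-up/pull-down structure of the NAND gate in FIG. 1A can be sketched at the switch level. This is a minimal Boolean model assuming ideal switches, and the function names are hypothetical; it shows that the parallel pMOS network and the series nMOS network never conduct at the same time, and that the gate needs one pMOS and one nMOS transistor per input.

```python
from itertools import product


def pull_up_on(a: int, b: int) -> bool:
    # Two pMOS in parallel from VDD to the output: a pMOS conducts when its gate is 0.
    return a == 0 or b == 0


def pull_down_on(a: int, b: int) -> bool:
    # Two nMOS in series from the output to ground: both must conduct (gates at 1).
    return a == 1 and b == 1


def cmos_nand2(a: int, b: int) -> int:
    up, down = pull_up_on(a, b), pull_down_on(a, b)
    # Dual networks: for every input exactly one network conducts (no static VDD-GND path).
    assert up != down
    return 1 if up else 0


for a, b in product((0, 1), repeat=2):
    print(a, b, "->", cmos_nand2(a, b))

# One pMOS and one nMOS per input: 2 * N transistors for an N-input standard CMOS gate.
print("transistor count, 2-input NAND:", 2 * 2)
```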
A Differential Cascode Voltage Switch (DCVS) logic circuit, shown in FIG. 1B, is intended to solve these problems of the standard CMOS circuit. However, the DCVS logic circuit is actually slower and dissipates more power than the standard CMOS logic circuit. During switching, the p-channel pull-up transistors must fight against the pull-down logic tree of the nMOS cascode logic network. This contention at the output prolongs logic evaluation and causes a substantial short-circuit current, increasing power dissipation.
Another known CMOS logic style is pass-transistor logic. A simple example is an XOR gate implemented as a 2-input multiplexer, as illustrated in FIG. 1C. However, pass-transistor logic has low current-driving capability, which degrades speed, so drivers must be inserted periodically between stages. Further, an n-channel pass device cannot pass a full logic `high`: its output rises only to about one threshold voltage below its gate voltage, so the voltage swing is reduced.
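Two parts of this behavior, the XOR function realized by a multiplexer and the loss of one threshold voltage on a passed `high`, can be sketched with a simple static voltage model. The supply and threshold values, the function names, and the neglect of body effect are assumptions for illustration only; speed is not modeled.

```python
VDD = 3.3  # hypothetical supply voltage (V)
VTN = 0.7  # hypothetical nMOS threshold voltage (V); body effect ignored


def nmos_pass(gate_v: float, in_v: float) -> float:
    # A conducting nMOS passes a low level fully, but a high level only up to gate_v - VTN.
    return min(in_v, gate_v - VTN)


def pass_xor(a: int, b: int) -> float:
    # 2-input multiplexer: input A selects whether B or not-B reaches the output.
    a_v, a_bar_v = a * VDD, (1 - a) * VDD
    b_v, b_bar_v = b * VDD, (1 - b) * VDD
    if a:
        return nmos_pass(a_v, b_bar_v)  # device gated by A passes not-B
    return nmos_pass(a_bar_v, b_v)      # device gated by not-A passes B


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", round(pass_xor(a, b), 2), "V")
```

With these assumed values a logic `high` arrives as VDD - VTN = 2.6 V rather than 3.3 V, which is the reduced swing described above.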
The Complementary Pass-Transistor Logic (CPL) circuit solves the problems of the nMOS version of pass-transistor logic. CPL uses an nMOS pass-transistor network with a low threshold voltage to reduce the voltage drop of the logic-high level at the output. As shown in FIG. 1D, CPL consists of a complementary nMOS pass-transistor logic network and two CMOS output inverters. The pass transistors function as both pull-up and pull-down devices. The output inverters shift the logic threshold voltage and act as buffers to drive the capacitive load.
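A similarly idealized sketch shows the two ingredients of CPL: a lower pass-transistor threshold raises the degraded `high` level, and the output inverters restore rail-to-rail swing on both the true and complementary outputs. The pass networks are abstracted to the XNOR/XOR function they compute, and all voltages, thresholds, and names are hypothetical.

```python
VDD = 3.3        # hypothetical supply voltage (V)
VT_NORMAL = 0.7  # hypothetical standard nMOS threshold (V)
VT_LOW = 0.3     # hypothetical low threshold for the CPL pass network (V)


def pass_high(vt: float) -> float:
    # Highest level an nMOS pass device with its gate at VDD delivers for a logic high.
    return VDD - vt


def inverter(v_in: float, v_switch: float = VDD / 2) -> float:
    # Idealized CMOS output inverter: rail-to-rail output, switching threshold v_switch.
    return 0.0 if v_in > v_switch else VDD


def cpl_xor_xnor(a: int, b: int, vt: float = VT_LOW):
    # The complementary pass networks drive two internal nodes, abstracted here to the
    # XNOR and XOR of the inputs, each with a degraded high level of VDD - vt.
    x = a ^ b
    node_xnor = 0.0 if x else pass_high(vt)
    node_xor = pass_high(vt) if x else 0.0
    # The output inverters restore full swing and buffer the load: (XOR, XNOR) outputs.
    return inverter(node_xnor), inverter(node_xor)


print("degraded high, normal Vt:", round(pass_high(VT_NORMAL), 2), "V;",
      "low Vt:", round(pass_high(VT_LOW), 2), "V")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "-> (XOR, XNOR) =", cpl_xor_xnor(a, b))
```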
Dynamic logic circuits share some basic features. All of them precharge the output nodes to a particular level (usually the supply voltage) while the current path to ground is turned off. When the precharge is complete, the path to the high level is cut off and the path to ground is turned on. Depending on the state of the inputs, the output either floats at the precharged level or is pulled down to ground. Since the pMOS pull-up network is replaced by a single precharge transistor, the load capacitance is reduced by a factor of two or three, and the gate responds roughly twice as fast as a static logic circuit.
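The precharge/evaluate sequence can be modeled for a 2-input dynamic NAND gate. In this sketch the class and method names are hypothetical, and charge sharing and leakage are ignored; it shows that the output node can only be discharged during evaluation, which is why it must be precharged again before every evaluation.

```python
class DynamicNand2:
    """Switch-level sketch of a 2-input dynamic NAND: a clocked pMOS precharge device,
    two series nMOS input devices, and a clocked nMOS evaluate (ground) device."""

    def __init__(self) -> None:
        self.out = None  # output level is undefined before the first precharge

    def precharge(self) -> None:
        # Clock low: precharge pMOS on, path to ground off -> output charged to VDD.
        self.out = 1

    def evaluate(self, a: int, b: int) -> int:
        # Clock high: path to VDD cut off, path to ground enabled. The series nMOS tree
        # discharges the output only if both inputs are 1; otherwise the node floats.
        if a and b:
            self.out = 0
        return self.out


gate = DynamicNand2()
for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    gate.precharge()
    print(a, b, "->", gate.evaluate(a, b))

# Skipping the precharge leaves the node discharged even though NAND(0, 0) = 1.
print("no precharge:", gate.evaluate(0, 0))
```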
FIGS. 2A-2D illustrate different types of dynamic logic circuits. A CMOS domino circuit shares the basic characteristics of dynamic logic; a single domino logic circuit is shown in FIG. 2A. Another type of dynamic logic circuit is a clocked version of the DCVS circuit, shown in FIG. 2B, which is similar to the static DCVS except that the pull-up pMOS transistors are driven by a clock signal instead of being cross-coupled. FIG. 2C illustrates sample-set differential logic (SSDL), which is a modification of the clocked DCVS. The latched CMOS differential logic (LCDL) circuit of FIG. 2D uses a similar type of sense amplifier to improve speed.
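A domino stage follows its dynamic node with a static inverter, and the reason can be sketched with a step-by-step, concurrent evaluation of a chain of dynamic 2-input NAND stages. This is a hypothetical idealization (one stage delay per step, nodes that can only fall during evaluation, illustrative names), not a timing model of FIG. 2A. Without the inverter, a downstream stage sees its upstream node still at the precharged `high` and discharges irreversibly before the upstream result arrives.

```python
def evaluate_chain(x, use_inverter):
    """Evaluate a chain of dynamic 2-input NAND stages after precharge.

    Stage k has inputs x[k] and the signal passed on by stage k-1 (a constant 1
    for stage 0). All stages update concurrently, one stage delay per step, and a
    dynamic node can only fall during evaluation. Returns the signal each stage
    passes on: the inverted node if use_inverter (domino), else the raw node.
    """
    n = len(x)
    node = [1] * n  # precharge phase: every dynamic node starts high

    def passed_on(k, nodes):
        return 1 - nodes[k] if use_inverter else nodes[k]

    for _ in range(n):  # n steps let a result ripple through the whole chain
        prev = node[:]  # values seen by all stages during this step
        for k in range(n):
            upstream = 1 if k == 0 else passed_on(k - 1, prev)
            if x[k] and upstream:
                node[k] = 0  # series nMOS path conducts: the discharge cannot be undone
    return [passed_on(k, node) for k in range(n)]


print(evaluate_chain([1, 1, 1], use_inverter=True))   # domino stages: AND chain
print(evaluate_chain([1, 1], use_inverter=False))     # bare dynamic NAND stages
```

With the inverters, `evaluate_chain([1, 1, 1], use_inverter=True)` yields the AND-chain result `[1, 1, 1]`. Without them, `evaluate_chain([1, 1], use_inverter=False)` yields `[0, 0]`, whereas the intended static NAND-of-NAND result is `[0, 1]`: the second node was discharged by the first stage's precharged level.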
Although the above logic circuits attempt to reduce the amount of charge consumed in each cycle, power consumption remains large, since charge is repeatedly moved from the supply voltage to ground within each cycle. Younis and Knight at MIT proposed a method of charge recovery through a new logic family called Charge Recovering Logic (CRL), described in the article entitled "Practical implementation of charge recycling Asymptotically zero power CMOS," Research on integrated systems; Proc. 1993 Symp., Cambridge, Mass. 1993.
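The cost of moving charge from the supply to ground can be made explicit with the standard first-order analysis of a CMOS node with load capacitance C_L. Charging the node from 0 to V_DD draws charge Q = C_L V_DD from the supply, but only half of the energy delivered is stored on the capacitor:

```latex
E_{\text{supply}} = \int_{0}^{\infty} V_{DD}\, i(t)\, dt = V_{DD}\, Q = C_L V_{DD}^{2},
\qquad
E_{\text{stored}} = \tfrac{1}{2} C_L V_{DD}^{2},
\qquad
E_{\text{diss}}^{\text{charge}} = E_{\text{supply}} - E_{\text{stored}} = \tfrac{1}{2} C_L V_{DD}^{2}
```

The other half is dissipated in the pull-up path while charging, and the stored half is dissipated in the pull-down path on discharge, so each full charge/discharge cycle converts C_L V_DD² into heat regardless of the transistor resistances.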
The charge recovery technique can achieve energy savings of over 99% when switching is sufficiently slow. The concept is to create a mirror image of a circuit that computes the inverse of the original, as shown in FIG. 3A. As each stage in the circuit finds an answer, it passes the result on to its mirror image, which computes the inverse. In the main circuit charge moves toward the end, while in the mirror circuit charge is recycled back to the beginning. However, logic design using CRL is quite impractical, and the anticipated power savings are nearly impossible to realize in ordinary applications.
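The role of switching speed can be illustrated with the usual adiabatic-charging model on which charge recovery relies: a capacitance C is charged through a resistance R by a linear voltage ramp of duration T instead of a step. This is a generic first-order RC sketch, not a model of the CRL circuit of FIG. 3A, and the function names and component values are hypothetical. For T much longer than RC the loss falls roughly as (RC/T)·CV², so a 99% saving relative to the conventional step-charging loss CV²/2 needs a ramp of about 200 RC.

```python
import math


def ramp_charge_loss(c: float, v: float, r: float, t_ramp: float) -> float:
    """Energy (J) dissipated in r while a linear ramp of duration t_ramp charges c
    from 0 to v (including final settling), from the exact first-order RC solution:
    E = C V^2 (RC/T) [1 - (RC/T)(1 - exp(-T/RC))].
    """
    x = t_ramp / (r * c)             # ramp time in units of the RC time constant
    one_minus_exp = -math.expm1(-x)  # 1 - exp(-x), accurate for small x
    return c * v ** 2 * (1.0 / x) * (1.0 - one_minus_exp / x)


def saving_vs_step(c: float, v: float, r: float, t_ramp: float) -> float:
    """Fraction of the conventional step-charging loss C V^2 / 2 avoided by the ramp."""
    return 1.0 - ramp_charge_loss(c, v, r, t_ramp) / (0.5 * c * v ** 2)


# Hypothetical values: 1 pF load, 3.3 V swing, 1 kOhm switch resistance (RC = 1 ns).
C, V, R = 1e-12, 3.3, 1e3
for t in (1e-12, 10e-9, 200e-9):
    print(f"T = {t:.0e} s: saving {saving_vs_step(C, V, R, t):.4f}")
```

A very fast ramp reproduces the ordinary CV²/2 loss, while the loss shrinks in proportion to RC/T as the ramp is slowed, which is the trade of speed for energy behind the "sufficiently slow" condition.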
Subsequent refinements that save and reuse only a fraction of the charge appear to be compatible with conventional CMOS technology. An example is the Reduced-Power Buffer (RPB), illustrated in FIG. 3B, which uses a storage capacitor to save some of the charge that would otherwise be dissipated. This circuit includes a driver with an additional storage capacitor attached to the output node through a switch T1. During a high-to-low transition, the circuit saves some of the charge in the storage capacitor Cs instead of dissipating it to ground. Just before the next low-to-high transition, the saved charge is recycled to the output node.
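The charge-sharing sequence just described can be simulated with ideal switches, tracking only charge conservation between the load capacitance CL and the storage capacitor Cs. This is a hypothetical idealization (no switch resistance, leakage, or timing; illustrative names), not the actual RPB circuit of FIG. 3B. In steady state it reduces to a closed form: with Cs = k·CL, the fraction of supply charge saved per cycle is k/(1 + 2k).

```python
def rpb_steady_state(cl: float, cs: float, vdd: float = 1.0, cycles: int = 2000):
    """Repeat the idealized RPB cycle until the storage-capacitor voltage settles.

    Returns (v_recycled, supply_charge): the output voltage reached by recycling
    just before the low-to-high transition, and the charge the driver must still
    draw from the supply to finish that transition.
    """
    v_cs = 0.0  # storage capacitor Cs starts discharged
    v_out = 0.0
    for _ in range(cycles):
        # High-to-low: the output (at vdd) is connected to Cs through switch T1 and
        # shares charge with it; the driver then discharges the output to ground.
        v_cs = (cl * vdd + cs * v_cs) / (cl + cs)
        # Just before low-to-high: the output (now at 0 V) is reconnected to Cs.
        v_out = cs * v_cs / (cl + cs)
        v_cs = v_out
    # The driver supplies only the charge needed to raise the output from v_out to vdd.
    return v_out, cl * (vdd - v_out)


def rpb_saving_closed_form(k: float) -> float:
    """Steady-state fraction of supply charge saved when Cs = k * CL (same idealization)."""
    return k / (1.0 + 2.0 * k)


for k in (0.5, 1.0, 10.0):
    v, q = rpb_steady_state(cl=1.0, cs=k)
    # With CL = VDD = 1, the fraction of supply charge saved is 1 - q.
    print(f"Cs = {k:>4} * CL: simulated saving {1.0 - q:.4f}, "
          f"closed form {rpb_saving_closed_form(k):.4f}")
```

In this model a storage capacitor equal to the load saves one third of the supply charge, and the saving never exceeds one half however large Cs becomes, consistent with the need for a storage capacitor that is large relative to the load.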
This scheme is useful only for applications dominated by switching of large capacitive loads, and the storage capacitor must be considerably larger than the load capacitor to obtain sufficient power savings. Another example is a DRAM refresh scheme that recycles the charge used to refresh cells in one array for use in another array, described in the article entitled "A Charge Recycle Refresh for Gb-Scale DRAM's in File Applications," IEEE Journal of Solid-State Circuits, Vol. 29, No. 6, June 1994, by Kawahara et al. However, there is no practical charge recycling scheme for general use in logic circuit design.