1. Field of the Invention
The present invention generally relates to clock generation in an integrated circuit. More particularly, the present invention relates to the minimization of clock skew and jitter in a clock distribution network. Still more particularly, the present invention relates to a clock distribution network design in which clock drivers are designed using silicon-on-insulator techniques to render the drivers relatively insensitive to power supply variations and fluctuations to thereby reduce the effects of skew and jitter in the distributed clock signals.
2. Background of the Invention
One of the critical design elements in modern processor chips and other very large scale integrated circuits is the manner in which the clock signals are distributed within the integrated circuit. Most digital circuits require a clock signal to operate, and data in a digital circuit typically is latched, processed, and output on one or more edges (i.e., the rising edge, the falling edge, or both) of the clock signal. Thus, without a good quality clock signal, most digital circuits will not operate properly, or will operate erratically.
In modern processor designs, and other very large scale integrated circuits, the clock signal may need to be distributed to relatively large areas of the die because of the layout of the digital circuitry. To enable the clock signal to be transmitted effectively over long distances, it is common to use clock drivers that are distributed throughout the die. Without the clock drivers, the clock signal may attenuate or degrade to such an extent that the receiving digital circuitry cannot operate properly. This problem is compounded as designers reduce the power supply voltage. Thus, more than ever, clock drivers are required to ensure that a high quality clock signal is delivered to the digital circuitry in the integrated circuit.
As even the most casual observer is aware, the clock speed of modern digital circuitry has increased at an astonishing rate. It has become commonplace for processors to meet or exceed clock speeds of 1 gigahertz (GHz). Clock speeds have become sufficiently high that delivering a high quality clock signal to all digital circuits in a large integrated circuit, such as a processor, is increasingly challenging. A processor with a 1 GHz clock must deliver 1 billion clock pulses each second to each digital circuit device on the die, so each clock period lasts only 1 nanosecond. Moreover, to avoid problems with clock skew and jitter, a relatively stable clock signal must arrive at the digital circuits at substantially the same time. If the clock signals do not arrive at each digital circuit at virtually the same time, the processor may operate improperly or fail. As an example, most processors include a processor core and an on-chip cache memory. By convention, the processor core stores data to and retrieves data from the cache memory during normal processor operation. The protocol by which data is read from and written to the cache is precisely set to maximize system efficiency. During a read cycle, for example, the processor core expects the data from the cache to be available a predetermined number of clock cycles after the read request. If the cache memory receives the clock signal later than the processor core does, the cache memory may not have the data available when expected. The processor core may nonetheless interpret the state of the signal lines as the read data, and thus may accept invalid data. Such a result could be catastrophic.
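The cache read hazard described above can be expressed as a simple timing-slack calculation. The sketch below is illustrative only: the function name and every number in it (clock-to-data delay, setup time, skew) are hypothetical, and it assumes the cache launches data on one clock edge and the core captures it on the next.

```python
def read_slack_ns(clock_hz: float, clk_to_data_ns: float,
                  setup_ns: float, skew_ns: float) -> float:
    """Return the timing slack (ns) at the core's capturing clock edge.

    Hypothetical single-cycle path: the cache launches data on one edge and
    the core samples it on the next. skew_ns > 0 means the cache receives
    its clock edge later than the core does. A negative result means the
    core samples the signal lines before the read data is valid.
    """
    period_ns = 1e9 / clock_hz                  # 1 GHz -> 1 ns period
    # Time after the core's launch edge at which the cache's data is valid.
    data_valid_ns = skew_ns + clk_to_data_ns
    return period_ns - setup_ns - data_valid_ns


if __name__ == "__main__":
    # Illustrative values only; not taken from any real design.
    for skew in (0.0, 0.05, 0.4):
        slack = read_slack_ns(1e9, clk_to_data_ns=0.6, setup_ns=0.1, skew_ns=skew)
        status = "OK" if slack >= 0 else "core may capture invalid data"
        print(f"skew {skew:.2f} ns -> slack {slack:+.2f} ns ({status})")
```

At a 1 ns period, a few hundred picoseconds of late clock arrival at the cache is enough to consume the whole margin in this example.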
To avoid these and other errors that result from clock skew and clock jitter, clock distribution networks are implemented to ensure an acceptable level of synchronism between the digital circuits. Typically, a clock distribution tree is provided in the integrated circuit to distribute the clock signal throughout the die. As shown in FIG. 1A, the clock distribution tree distributes one or more clock signals from a common clock generator 10, which is specially placed on the die 5. FIG. 1A shows, for purposes of illustration, an example of a portion of one branch of a clock distribution tree. As shown in FIG. 1A, a plurality of clock repeaters (or clock drivers) 15 are provided in each branch to regenerate and re-transmit the clock signal so that each digital circuit on the die receives a high quality clock signal. Five clock drivers are shown in each of the two branches depicted in FIG. 1A, which provide clock signals to the upper right and lower right regions of the die. Each clock driver defines another "stage" of the clock distribution tree, and each stage provides a limited amount of gain to the clock signal. The number of stages is dictated by the area covered on the die and by the load (i.e., the number of devices that receive the clock signal on each branch). Each stage of the clock distribution network introduces a risk that a variation will cause the clock signals in different distribution branches to fall out of synchronization. To minimize this risk, equidistant signal paths or traces generally are used to connect each of the digital circuits to the clock generator 10. Using signal paths of equal length minimizes differences in propagation delay between branches.
To further minimize the risk that different distribution branches may have a different propagation delay, each clock driver 15 is identically constructed, and drivers are located uniformly in the branches.
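Because the stages in a branch are cascaded, each driver's delay adds to the total, so matching two branches means matching these sums. The following is a minimal sketch under a purely additive delay model; the function names and the per-stage delays are hypothetical, and only the five-stage branch length is taken from FIG. 1A.

```python
from typing import Sequence


def branch_arrival_ps(driver_delays_ps: Sequence[float],
                      wire_delays_ps: Sequence[float]) -> float:
    """Clock arrival time at the end of one branch.

    Simplified model: each stage contributes one driver delay plus the delay
    of the wire segment that follows it, and the stage delays simply add.
    """
    if len(driver_delays_ps) != len(wire_delays_ps):
        raise ValueError("each stage needs one driver delay and one wire delay")
    return sum(driver_delays_ps) + sum(wire_delays_ps)


def skew_ps(arrivals_ps: Sequence[float]) -> float:
    """Skew is the spread between the earliest and the latest arrival."""
    return max(arrivals_ps) - min(arrivals_ps)


if __name__ == "__main__":
    wires = [10.0] * 5       # equal-length traces in both branches
    matched = [20.0] * 5     # identical drivers
    slowed = [23.0] * 5      # same drivers, running slower (hypothetical)

    upper = branch_arrival_ps(matched, wires)
    lower_matched = branch_arrival_ps(matched, wires)
    lower_slowed = branch_arrival_ps(slowed, wires)
    print("matched branches, skew =", skew_ps([upper, lower_matched]), "ps")
    print("one slower branch, skew =", skew_ps([upper, lower_slowed]), "ps")
```

Equal traces and identical drivers cancel only if every stage actually behaves identically; a small per-stage slowdown in one branch accumulates over all five stages.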
According to conventional techniques, the clock drivers 15 are implemented using inverters, which are relatively simple circuits. An example of a standard clock inverter used in digital circuit design is shown in FIG. 2A. As shown in FIG. 2A, the conventional inverter comprises a pFET (p-channel field effect transistor) and an nFET (n-channel field effect transistor) with their gates tied to a common clock input terminal and their drains tied to a common clock output terminal. The source terminal of the pFET connects to the power supply voltage VDD, while the source terminal of the nFET connects to VSS. When a low voltage (a binary "0") appears at the clock input terminal, the nFET is non-conducting, while the pFET conducts and couples VDD from its source terminal to its drain terminal, producing a high voltage (a binary "1") at the clock output terminal. Conversely, when the clock input terminal is at a high voltage (a binary "1"), the pFET is non-conducting and the nFET conducts, causing the low voltage VSS (a binary "0") to appear at the clock output terminal.
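In steady state, the inverter of FIG. 2A reduces to two complementary switches. The sketch below is an idealized switch-level model: it ignores transition time (the propagation delay discussed next), and the 1.0 V default for VDD is an arbitrary assumption.

```python
def cmos_inverter(clk_in: int, vdd: float = 1.0, vss: float = 0.0) -> float:
    """Ideal switch-level model of a static CMOS inverter.

    The pFET (source at VDD) conducts when its gate is low; the nFET
    (source at VSS) conducts when its gate is high. Both gates share the
    clock input and both drains share the clock output.
    """
    if clk_in not in (0, 1):
        raise ValueError("clk_in must be a binary 0 or 1")
    pfet_conducts = clk_in == 0
    nfet_conducts = clk_in == 1
    # Exactly one device conducts in steady state, so the output never
    # floats and VDD is never shorted to VSS.
    assert pfet_conducts != nfet_conducts
    return vdd if pfet_conducts else vss


if __name__ == "__main__":
    clock = [0, 1, 0, 1]
    print([cmos_inverter(bit) for bit in clock])  # [1.0, 0.0, 1.0, 0.0]
```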
Despite the precautions taken in designing clock distribution networks, propagation delays still differ among clock paths. These differences result from several factors. One of the primary factors is local variation in the power supply voltage, which arises from the load experienced within a particular region. For example, the region of the CPU core may draw more power than another area, such as the cache memory. This may cause the supply voltage in the CPU core region to drop significantly; the supply voltage could differ by as much as 15-25% across different die regions. Generally, the higher the power supply voltage, the faster the signal propagates through the clock driver (or inverter). Thus, heavier circuit operation in a particular region of the die produces voltage fluctuations that, in turn, produce non-uniform propagation delays. Other factors that can cause propagation delay variations include temperature gradients, process variations, and the like.
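The dependence of driver delay on supply voltage can be illustrated with the Sakurai-Newton alpha-power law, in which gate delay scales roughly as VDD / (VDD - Vth)^alpha. This model is not part of the original text, and the parameter values below (VDD = 1.0 V, Vth = 0.35 V, alpha = 1.3) are assumptions chosen only to show the trend over the 15-25% droop range mentioned above.

```python
def alpha_power_delay(vdd: float, vth: float, alpha: float = 1.3, k: float = 1.0) -> float:
    """Relative gate delay under the alpha-power law: k * VDD / (VDD - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("model only valid when VDD exceeds the threshold voltage")
    return k * vdd / (vdd - vth) ** alpha


def droop_slowdown(vdd_nominal: float, droop_fraction: float, vth: float,
                   alpha: float = 1.3) -> float:
    """Delay at the drooped supply divided by delay at the nominal supply."""
    vdd_drooped = vdd_nominal * (1.0 - droop_fraction)
    return (alpha_power_delay(vdd_drooped, vth, alpha)
            / alpha_power_delay(vdd_nominal, vth, alpha))


if __name__ == "__main__":
    for droop in (0.15, 0.25):
        ratio = droop_slowdown(1.0, droop, vth=0.35)
        print(f"{droop:.0%} supply droop -> {ratio:.2f}x nominal driver delay")
```

With these assumed values, a 15% droop makes each driver roughly 1.2 times slower and a 25% droop roughly 1.4 times slower; repeated over every stage of a branch, that difference appears as skew.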
Some attempts have been made to mitigate the propagation delay variations caused by these environmental and process factors. One technique is to tie together the inputs and outputs of some of the clock drivers, so that one or more stages of clock drivers are driven by two different clock paths, as shown for example in FIG. 1B. As FIG. 1B illustrates, clock drivers 22, 24, 26, 28 are driven by both clock driver 20 and clock driver 25 (each of which may be designed to produce 50% of the gain of a conventional single clock driver). Branches A and B therefore both provide a clock input signal to branches C, D, E and F. In this design, the clock drivers 22, 24, 26, 28 that are driven by multiple drivers experience a propagation delay that is a proportional average of the propagation delays on branch A and branch B, thus minimizing skew. For example, if branch A has a shorter propagation delay than branch B, the clock signal from branch A will arrive first at each of the inverters 22, 24, 26, 28 and begin charging the inputs of these clock drivers. When the clock signal arrives from branch B, this charging process is accelerated, and eventually each of the inverters 22, 24, 26, 28 generates an output signal. The result is that each of branches C, D, E, and F has a propagation delay that is a proportional average of the propagation delays of branches A and B.
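The "proportional average" can be approximated to first order as a drive-strength-weighted mean of the arrival times at the shared node. This is a simplification of the charge-sharing behavior described for FIG. 1B, not a circuit simulation; the function name, arrival times, and drive strengths are hypothetical.

```python
from typing import Sequence


def shorted_arrival_ps(arrivals_ps: Sequence[float],
                       drive_strengths: Sequence[float]) -> float:
    """First-order arrival time at a node driven by several shorted drivers.

    The shared node is modeled as seeing a drive-strength-weighted average
    of the individual arrival times.
    """
    if len(arrivals_ps) != len(drive_strengths) or len(arrivals_ps) == 0:
        raise ValueError("need one drive strength per arrival time")
    total = sum(drive_strengths)
    return sum(t * w for t, w in zip(arrivals_ps, drive_strengths)) / total


if __name__ == "__main__":
    branch_a, branch_b = 100.0, 130.0   # hypothetical arrival times (ps)
    # If each downstream branch were fed by only A or only B, downstream
    # skew would be the full 30 ps. With the outputs of drivers 20 and 25
    # shorted and equally strong, C, D, E and F all share one arrival time.
    common = shorted_arrival_ps([branch_a, branch_b], [1.0, 1.0])
    print(f"branches C, D, E, F all see the clock at about {common:.0f} ps")
```

Unequal drive strengths shift the shared arrival toward the stronger driver, which is one reason the drivers in FIG. 1B would be sized alike.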
While the approach depicted in FIG. 1B has some effectiveness in compensating for voltage gradients, it also has serious limitations. One limitation is that the averaging technique is effective only in minimizing the propagation delay variations between branches that can be tied together. Branches in other regions of the die that are not tied to common clock drivers will still experience clock skew. Thus, the technique of FIG. 1B works best if the clock drivers are relatively close together, so that their inputs and outputs can be connected. In addition, while this approach reduces skew between different locations on the die, it does not reduce clock jitter, which relates to non-uniformity or instability of the phase and/or amplitude of the clock signal. Clock jitter may be caused by several factors, including temperature fluctuations within a particular region or zone of the die. These fluctuations can cause the phase and/or amplitude of the clock signal to vary even within the same branch, which can affect the ability of the digital circuitry to perform necessary processes within a prescribed number of clock cycles. The approach illustrated in FIG. 1B therefore does not address clocking errors introduced by clock jitter, but instead focuses on averaging the time delays (or skew) on different clock branches.
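The distinction between skew and jitter can be shown with a small Monte Carlo sketch. It assumes, purely for illustration, that branches A and B each carry a fixed delay offset plus a random per-cycle delay perturbation that is common to both (for example, from a local supply or temperature fluctuation); all names and numbers are hypothetical. Averaging the two branches removes the fixed offset between them but leaves the cycle-to-cycle variation untouched.

```python
import random
import statistics


def simulate_shorted_pair(cycles: int = 10_000, offset_a_ps: float = 100.0,
                          offset_b_ps: float = 130.0, noise_sigma_ps: float = 5.0,
                          seed: int = 0) -> dict:
    """Compare static skew and cycle-to-cycle jitter around a shorted node.

    Assumed model: each cycle, one Gaussian delay perturbation is added to
    both branch A and branch B. The shorted node is modeled as the
    equal-weight average of A and B, and every downstream branch sees that
    same node, so the skew between downstream branches is zero.
    """
    rng = random.Random(seed)
    a_arrivals, b_arrivals, shorted = [], [], []
    for _ in range(cycles):
        noise = rng.gauss(0.0, noise_sigma_ps)   # shared by both branches
        a = offset_a_ps + noise
        b = offset_b_ps + noise
        a_arrivals.append(a)
        b_arrivals.append(b)
        shorted.append((a + b) / 2.0)
    return {
        # Skew: mean timing difference between the two unshorted branches.
        "skew_unshorted_ps": statistics.mean(b_arrivals) - statistics.mean(a_arrivals),
        # Jitter: standard deviation of the arrival time from cycle to cycle.
        "jitter_unshorted_ps": statistics.stdev(a_arrivals),
        "jitter_shorted_ps": statistics.stdev(shorted),
    }


if __name__ == "__main__":
    for name, value in simulate_shorted_pair().items():
        print(f"{name}: {value:.2f} ps")
```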
As processors and other circuit designs become increasingly fast, it becomes desirable to design a clock driver that is relatively immune to voltage variations and fluctuations across the die, in order to minimize both clock skew and jitter. Thus, it would be advantageous to develop a clock driver that operates more uniformly in the face of power supply voltage fluctuations. Despite these apparent advantages, to date no one has developed a clock distribution network that solves the problem of both clock skew and clock jitter.
The present invention solves the deficiencies of the prior art by implementing a clock driver using silicon-on-insulator technology, and tying together the bodies of the nFET and pFET gates. By tying together the nFET and pFET bodies of the clock driver, the voltage of the nFET body is raised and the voltage of the pFET body is lowered. The net result is that the threshold voltage for both the nFET and pFET is decreased, thereby minimizing the propagation delay of each clock driver attributable to power supply voltage fluctuations and variations.
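Why a lower threshold voltage reduces a driver's sensitivity to supply fluctuations can be illustrated with the Sakurai-Newton alpha-power delay model. This model is an assumption here, not the patent's own analysis, and the two threshold values compared below are hypothetical. The intuition is that with a smaller Vth, a given supply droop removes a smaller fraction of the gate overdrive VDD - Vth.

```python
def alpha_power_delay(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Relative gate delay under the alpha-power law (arbitrary units)."""
    if vdd <= vth:
        raise ValueError("model only valid when VDD exceeds Vth")
    return vdd / (vdd - vth) ** alpha


def droop_slowdown(vdd_nominal: float, droop_fraction: float, vth: float) -> float:
    """Delay at the drooped supply relative to delay at the nominal supply."""
    drooped = vdd_nominal * (1.0 - droop_fraction)
    return alpha_power_delay(drooped, vth) / alpha_power_delay(vdd_nominal, vth)


if __name__ == "__main__":
    for vth in (0.35, 0.28):   # hypothetical thresholds before and after body tying
        extra = droop_slowdown(1.0, 0.15, vth) - 1.0
        print(f"Vth = {vth:.2f} V: a 15% droop slows the driver by {extra:.1%}")
```

Under these assumptions the slowdown from a 15% droop falls from about 20% to about 15%, so identical drivers in differently loaded regions stay closer in delay.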
Silicon-on-insulator technology provides a vehicle to modify the threshold voltages of the nFET and pFET gates in the clock driver. Because the pFET body voltage is relatively high, while the nFET body voltage is relatively low, electrically coupling the nFET and pFET bodies produces an intermediate body voltage for both the nFET and pFET gates. The body voltage of a FET affects its threshold voltage (the body effect). Increasing the body voltage of the nFET decreases the threshold voltage of the nFET, making it a faster device that experiences less propagation delay. Conversely, decreasing the body voltage of the pFET raises the pFET threshold voltage level toward VDD, thereby reducing the voltage differential between the power supply voltage VDD and the pFET threshold voltage. By reducing this voltage differential, the pFET also becomes a faster device and experiences less propagation delay.
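The direction of these threshold shifts can be sketched with the textbook long-channel body-effect equation, Vth = Vth0 + gamma * (sqrt(2*phiF - Vfb) - sqrt(2*phiF)), where Vfb is the body-to-source forward bias. Applying this bulk-MOSFET equation to an SOI body is itself an approximation, and gamma, 2*phiF, the zero-bias thresholds, and the assumption that the tied bodies settle at VDD/2 are all illustrative values, not figures from the patent.

```python
import math


def body_effect_vth(vth0: float, forward_bias_v: float,
                    gamma: float = 0.2, two_phi_f: float = 0.8) -> float:
    """Threshold-voltage magnitude under body bias (long-channel body effect).

    forward_bias_v is the body-to-source forward bias magnitude: Vbody - Vsource
    for an nFET, Vsource - Vbody for a pFET. Positive values lower |Vth|;
    negative (reverse) values raise it. gamma and two_phi_f are assumed.
    """
    if forward_bias_v >= two_phi_f:
        raise ValueError("forward bias too large for this approximation")
    return vth0 + gamma * (math.sqrt(two_phi_f - forward_bias_v) - math.sqrt(two_phi_f))


def tied_body_thresholds(vdd: float, v_body: float, vthn0: float = 0.35,
                         vtp0_mag: float = 0.35, vss: float = 0.0):
    """nFET Vth and pFET |Vth| when both bodies sit at the same voltage v_body."""
    vthn = body_effect_vth(vthn0, v_body - vss)          # nFET source at VSS
    vtp_mag = body_effect_vth(vtp0_mag, vdd - v_body)    # pFET source at VDD
    return vthn, vtp_mag


if __name__ == "__main__":
    vdd = 1.0
    v_body = vdd / 2   # assumed intermediate tied-body voltage
    vthn, vtp_mag = tied_body_thresholds(vdd, v_body)
    print(f"nFET Vth:   0.350 V -> {vthn:.3f} V")
    print(f"pFET |Vth|: 0.350 V -> {vtp_mag:.3f} V "
          f"(pFET turns on once its gate falls below {vdd - vtp_mag:.3f} V)")
```

The last line corresponds to the "threshold voltage level" of the pFET rising toward VDD: the gate voltage at which the pFET turns on moves closer to the supply.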
According to another embodiment of the present invention, the bodies of an nFET and a pFET in an inverter are electrically coupled, thereby lowering the body voltage of the pFET and raising the body voltage of the nFET to a common voltage level. Raising the body voltage of the nFET reduces the threshold voltage of the nFET, thus minimizing the propagation delay in the nFET. Similarly, lowering the body voltage of the pFET causes the threshold voltage level of the pFET to rise, thereby minimizing the voltage differential between VDD and the threshold voltage. Minimizing this threshold voltage differential minimizes the propagation delay in the pFET. The net effect is a reduction in the propagation delay of the inverter.
In another embodiment, the body of an nFET may be coupled to the body of a pFET through one or more voltage drop transistors. The voltage drop transistor(s) may be either nFETs or pFETs. The use of the voltage drop transistor(s) serves to displace the body voltage of the primary nFET and pFET, thereby reducing leakage through these transistors, while at the same time lowering the body voltage of the pFET, and raising the body voltage of the nFET.
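The effect of the voltage drop transistors can be sketched by treating them as holding the pFET body a fixed amount above the nFET body. The topology in the drawings is not reproduced in this text, so the model below is hypothetical: it assumes each drop device holds a constant v_drop, that the two bodies settle symmetrically about VDD/2, and it reuses the illustrative body-effect parameters from the previous sketch. A smaller forward bias on each body-source junction means less junction conduction, at the cost of a smaller threshold reduction.

```python
import math


def body_effect_vth(vth0: float, forward_bias_v: float,
                    gamma: float = 0.2, two_phi_f: float = 0.8) -> float:
    """|Vth| under body bias (long-channel body-effect equation, assumed parameters)."""
    if forward_bias_v >= two_phi_f:
        raise ValueError("forward bias too large for this approximation")
    return vth0 + gamma * (math.sqrt(two_phi_f - forward_bias_v) - math.sqrt(two_phi_f))


def body_voltages(vdd: float, n_drops: int, v_drop: float):
    """(nFET body, pFET body) voltages with n_drops series drop devices between them.

    Hypothetical model with VSS = 0: the drop devices separate the bodies by
    n_drops * v_drop, and the bodies sit symmetrically about VDD/2.
    n_drops = 0 corresponds to directly tied bodies.
    """
    separation = n_drops * v_drop
    if not 0.0 <= separation <= vdd:
        raise ValueError("total drop must lie between 0 and VDD")
    v_nbody = (vdd - separation) / 2   # raised above VSS
    v_pbody = (vdd + separation) / 2   # lowered below VDD
    return v_nbody, v_pbody


if __name__ == "__main__":
    vdd, v_drop = 1.0, 0.3   # assumed values
    for n in (0, 1, 2):
        v_nbody, v_pbody = body_voltages(vdd, n, v_drop)
        fb_n = v_nbody - 0.0   # nFET source at VSS = 0
        fb_p = vdd - v_pbody   # pFET source at VDD
        print(f"{n} drop device(s): forward bias n={fb_n:.2f} V, p={fb_p:.2f} V, "
              f"nFET Vth={body_effect_vth(0.35, fb_n):.3f} V")
```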
The present invention may also be used to make a programmable clock driver, which can be selected to operate at high speed with reduced propagation delay. Alternatively, the programmable clock driver may be selected to operate at a lower speed, with less leakage through the inverter gates, thereby reducing power consumption. To operate at high speed, with an attendant reduction in propagation delay, the circuit is configured to connect the bodies of the inverter's pFET and nFET together, tying the body voltages of these two gates to each other.
To operate with reduced power, the pFET body may be connected to the power supply voltage, thus increasing the pFET body voltage. This in turn increases the threshold voltage differential of the pFET gate, which reduces leakage through the pFET gate. Similarly, the nFET body may be connected to VSS, thereby lowering the body voltage of the nFET. This reduction in body voltage increases the threshold voltage of the nFET, which reduces leakage through the nFET. The benefit of this configuration is that the gate can operate with less power than a conventional gate. Thus, the present invention may also be used to increase the threshold voltage differential and so minimize leakage current in low-power devices, such as notebook computers and personal digital assistants (PDAs).
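The two programmable modes can be compared with a small sketch that combines the illustrative body-effect model with the common rule of thumb that subthreshold leakage scales roughly as 10^(-Vth/S), where S is the subthreshold swing. The mode names, the tied-body voltage of VDD/2, and the 85 mV/decade swing are assumptions for illustration only.

```python
import math
from enum import Enum


class BodyMode(Enum):
    HIGH_SPEED = "bodies tied together"
    LOW_POWER = "pFET body to VDD, nFET body to VSS"


def body_effect_vth(vth0: float, forward_bias_v: float,
                    gamma: float = 0.2, two_phi_f: float = 0.8) -> float:
    """|Vth| under body bias (long-channel body-effect equation, assumed parameters)."""
    if forward_bias_v >= two_phi_f:
        raise ValueError("forward bias too large for this approximation")
    return vth0 + gamma * (math.sqrt(two_phi_f - forward_bias_v) - math.sqrt(two_phi_f))


def driver_thresholds(mode: BodyMode, vdd: float = 1.0, vss: float = 0.0,
                      vth0: float = 0.35):
    """Return (nFET Vth, pFET |Vth|) for the selected body connection."""
    if mode is BodyMode.HIGH_SPEED:
        v_body = (vdd + vss) / 2   # assumed intermediate tied-body voltage
        return (body_effect_vth(vth0, v_body - vss),
                body_effect_vth(vth0, vdd - v_body))
    # LOW_POWER: each body is tied to its own source, so there is no forward bias.
    return body_effect_vth(vth0, 0.0), body_effect_vth(vth0, 0.0)


def relative_subthreshold_leakage(vth: float, swing_v_per_decade: float = 0.085) -> float:
    """Subthreshold leakage scales roughly as 10**(-Vth / S); S is an assumed swing."""
    return 10 ** (-vth / swing_v_per_decade)


if __name__ == "__main__":
    baseline = relative_subthreshold_leakage(driver_thresholds(BodyMode.LOW_POWER)[0])
    for mode in BodyMode:
        vthn, vtp_mag = driver_thresholds(mode)
        leak = relative_subthreshold_leakage(vthn) / baseline
        print(f"{mode.name:<10} ({mode.value}): Vthn={vthn:.3f} V, "
              f"|Vtp|={vtp_mag:.3f} V, nFET leakage x{leak:.1f}")
```

The trade-off is the one the text describes: the tied-body mode buys a lower threshold (and so less delay sensitivity) at the price of several times more leakage under these assumed parameters.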
These and other aspects of the present invention will become apparent upon analyzing the drawings, detailed description and claims, which follow.