1. Field of the Invention
The present invention relates generally to clock distribution in integrated circuits and more particularly relates to methods of distributing a high-frequency clock with improved power efficiency and skew and jitter performance.
2. Background Description
Clocking large digital chips with a single high-frequency global clock is becoming an increasingly difficult task. As circuit size and clock frequency continue to increase, skew and jitter as well as power consumption are becoming increasingly important design considerations.
While jitter and skew have traditionally been the dominant concerns in clock circuit design, power consumption may soon gain primacy. With each new generation of integrated circuit, clock capacitance and frequency are increasing, resulting in significant increases in dynamic power dissipation. Considering that a 72-W 600-MHz Alpha processor dissipates more than half of its power in the clock circuit, this is clearly an area ripe for design optimization.
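The dependence of dynamic power on clock capacitance and frequency can be illustrated with the standard switching-power relation P = a·C·V²·f. The following is a minimal sketch only; the function name, the activity factor, and all numeric values are hypothetical and are not taken from any particular processor.

```python
def dynamic_clock_power(capacitance_f, supply_v, frequency_hz, activity=1.0):
    """Estimate dynamic switching power in watts as activity * C * V^2 * f.

    capacitance_f : switched clock capacitance in farads (assumed value)
    supply_v      : supply voltage in volts (assumed value)
    frequency_hz  : clock frequency in hertz
    activity      : fraction of cycles in which the capacitance switches;
                    1.0 is a reasonable assumption for an ungated clock net
    """
    return activity * capacitance_f * supply_v ** 2 * frequency_hz


# Hypothetical baseline: 3 nF of clock capacitance, 2.0 V supply, 600 MHz.
baseline = dynamic_clock_power(3e-9, 2.0, 600e6)

# A hypothetical next generation with 1.5x the capacitance and 2x the
# frequency at the same supply voltage dissipates about 3x the clock power.
next_gen = dynamic_clock_power(1.5 * 3e-9, 2.0, 2 * 600e6)

print(f"baseline clock power: {baseline:.2f} W")
print(f"next-generation clock power: {next_gen:.2f} W")
```

Because the relation is linear in both capacitance and frequency, simultaneous growth in the two compounds multiplicatively unless the supply voltage is reduced.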
To date, most of the work in clock distribution has been focused on addressing the issues of skew and jitter. There are two general approaches to clock wiring: trees and grids. Tunable trees consume less wiring and, therefore, represent less capacitance, lower wiring track usage, lower power, and lower latency. Trees must, however, be carefully tuned, and this tuning is a very strong function of load. Thus, there is substantial interplay between the clock distribution circuit and the underlying circuit being driven by the clock circuit. Grids, in contrast, can present large capacitance and require significant use of wiring resources, but provide relative load independence by connecting nearby points directly to the grid. This latter property has proven irresistible, and most recent global clock distributions in high-end microprocessors utilize some sort of global clock grid. Early grid distributions were driven by a single effective global clock driver positioned at the center of the chip.
Most modern clock distribution circuits use a balanced H-tree to build up and distribute the gain required to drive the grid. The grid drive points are distributed across the entire chip, rather than being concentrated at a single point; this means that the grid can be less dense than a grid that is driven in a less distributed fashion, resulting in less capacitance and less consumption of wiring resources. The shunting properties of the grid help to cancel out skew and jitter from imperfections in the tree distribution, as well as balance out uneven clock loads.
To prevent skew and jitter from accumulating with increased distance from the clock source, there have been several approaches for using multiple on-chip clock sources. One approach is to create a distributed phase-locked loop (PLL) in which there is a single phase-frequency detector, charge pump, and low-pass filter, but multiple voltage-controlled oscillators (VCOs). These oscillators are distributed across the chip to drive a single clock grid. The grid acts to help cancel out across-chip mismatches between the VCOs and limit skew and cycle-to-cycle jitter. The main problem with this approach is the need to distribute a “global” analog voltage across the chip (the VCO control voltage), which can be very susceptible to noise.
An alternative to this approach is to have multiple PLLs across the chip, each driving the clock to only a small section or tile of the integrated circuit. Clock latency from the oscillator is reduced because the clock distribution is local and the clock load for each PLL is smaller. In such a design, each PLL must average the phases of its neighbors to determine lock, and nonlinearities must be introduced into the phase detectors to avoid mode-locked conditions. Any mismatch between the phase detectors adds uncompensated skew to the distribution.
To control clock power, the most common technique employed is that of clock gating, in which logic is introduced into the local clock distribution to “shut off” the clocking of sections of the design when they are not in use. These techniques generally favor relegating more of the clock load to “local” clocking where it can be gated and have been widely employed in low-performance designs in which power is of prominent concern (e.g., digital signal processors for mobile, battery-powered applications). Until recently, clock gating has not been favored as a technique for high-performance design because of the skew and jitter potentially introduced by the clock gating logic and because of delta-I noise concerns (i.e., transients introduced in the power supply distribution when large amounts of switching clock capacitance are turned on and off). As clock power exceeds 80 W, clock gating is beginning to be employed even in these high-performance chips.
The natural limit of clock gating is to approach more asynchronous design techniques, in which blocks are activated only in the presence of data. Globally asynchronous, locally synchronous (GALS) design preserves the paradigm of synchronous design locally. Asynchronous designs, however, are more difficult to design, costlier to implement, more challenging to test, and more difficult to verify and debug. There is clearly a significant desire to continue to use and improve upon globally synchronous designs.
The virtues of LC-type oscillators for achieving lower power and better phase stability (than oscillators based on delay elements) have long been recognized. The adiabatic logic community has already considered the importance of resonant clock generation since the clocks are used to power the circuits and such resonance is fundamental to the energy recovery. These generators generally produce sinusoidal or near-sinusoidal clock waveforms. To combine the clock generation and distribution, distributed LC oscillators in the form of transmission line systems have been considered. These also bear resemblance to distributed oscillators. In salphasic clock distribution, a standing (sinusoidal) wave is established in an unterminated transmission line. As a result, each receiver along the line receives a sine wave of identical phase (but different amplitude). Unfortunately, on-chip transmission lines tend to be very lossy and exhibit low bandwidths for long wire lengths. This produces significant phase error due to the mismatch in amplitude between forward and reverse propagating waves.
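The effect of line loss on a salphasic distribution can be illustrated numerically. The sketch below assumes an idealized uniform line driven at one end and open-circuited at the other (reflection coefficient of +1), with a unit-amplitude forward wave; the function names and the attenuation and phase constants are hypothetical and chosen only for illustration.

```python
import cmath


def standing_wave_phasor(x, length, alpha, beta):
    """Voltage phasor at position x on a line of the given length.

    Assumes a unit forward wave launched at x = 0 and a full reflection
    (coefficient +1) at the open end x = length. alpha is the attenuation
    constant in nepers per unit length; beta is the phase constant in
    radians per unit length.
    """
    gamma = complex(alpha, beta)
    forward = cmath.exp(-gamma * x)
    # The reflected wave has travelled to the open end and back to x.
    reverse = cmath.exp(-gamma * (2 * length - x))
    return forward + reverse


def phase_spread(length, alpha, beta, points=50):
    """Difference between the largest and smallest phase (radians) seen
    by receivers placed at evenly spaced positions along the line."""
    phases = [
        cmath.phase(standing_wave_phasor(length * i / (points - 1),
                                         length, alpha, beta))
        for i in range(points)
    ]
    return max(phases) - min(phases)


# Electrical length of 1 radian (below pi/2, so no standing-wave node
# falls on the line) with and without attenuation.
lossless = phase_spread(length=1.0, alpha=0.0, beta=1.0)
lossy = phase_spread(length=1.0, alpha=0.5, beta=1.0)

print(f"lossless phase spread: {lossless:.3e} rad")
print(f"lossy phase spread:    {lossy:.3f} rad")
```

With no attenuation the forward and reverse waves have equal amplitude everywhere, so every tap sees the same phase; once the reverse wave is attenuated relative to the forward wave, the phase varies with position along the line.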
Another approach that has been proposed uses a set of coupled transmission line rings as LC tank circuits, pumped by a set of cross-coupled inverters to distribute clock signals. The propagation time around the rings determines the oscillation frequency, and different points around the ring have different phases. This approach, however, also has many significant difficulties. Rings must be precisely “tuned” even with potentially varying (lumped) load capacitance producing discontinuities in the transmission line. Moreover, the distribution and the resonance determining the clock frequency are fundamentally linked, even though the former may depend on geometry or other constraints inconsistent with the desired resonance frequency.
Another approach to synchronized clock distribution in an integrated circuit is disclosed in U.S. Pat. No. 6,057,724 to Warm. The Warm patent discloses a clock distribution circuit which includes a parallel plate microstrip resonator formed in the integrated circuit which operates as a resonant cavity to generate a clock signal.
Despite the various efforts to provide clock distribution circuits for very large scale integrated circuits, there remains a need for a clock distribution circuit which offers lower power consumption without sacrificing, and preferably improving, skew and jitter performance.