1. Field of the Invention
The present invention relates generally to systems and methods for distributing a clock signal in an integrated circuit.
2. Description of Background Art
There is increasing interest in high-speed microprocessors, such as microprocessors with a clock frequency greater than one gigahertz. It is desirable to distribute the clock signal across such a microprocessor with low variance in the latency at the different clock distribution points of the chip.
FIG. 1 shows an exemplary clock signal distribution network 100. A common clock distribution network is a clock tree 100 that fans out at each clock buffer 104 to distribute the signal to higher-level branches 106 of the clock tree. Each buffer 104 reproduces the clock signal (or its logical complement); the clock buffers are commonly inverters. The reproduced clock signal and/or its logical complement is, in turn, coupled to other clock buffers at higher levels of the clock tree. Referring to FIG. 1, a master clock signal 102 is typically generated with a duty cycle of approximately 50%, i.e., the clock signal is high for about half of the clock period and low for the other half. The clock signal distribution network may include a variety of clock buffers.
A clock tree for driving a large number of latches must typically have a substantial number of levels because of the limited fan-out possible from a single clock buffer 104. The number of levels of the clock tree depends, in part, upon the fan-out of each clock buffer and upon how many latches the clock must drive. As an illustrative example, if 100,000 latches must be driven and each clock buffer 104 has a gain of a little over three, approximately ten to eleven levels are required in the clock tree for the cumulative fan-out to be sufficient to drive the latches (i.e., 3^11 = 177,147 is greater than 100,000).
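The level count in the example above is the smallest integer L for which the per-buffer gain raised to the power L reaches the number of latches. A minimal Python sketch, assuming every buffer has the same gain so that the cumulative fan-out after L levels is simply gain^L (an idealization; the function name is hypothetical):

```python
def clock_tree_levels(num_latches: int, gain: float) -> int:
    """Smallest number of buffer levels whose cumulative fan-out reaches num_latches.

    Idealized model (assumption): every buffer has the same gain, so the
    cumulative fan-out after L levels is gain ** L.
    """
    if gain <= 1.0:
        raise ValueError("each buffer must fan out to more than one load")
    levels, fanout = 0, 1.0
    while fanout < num_latches:
        fanout *= gain  # add one more level of buffers
        levels += 1
    return levels


print(clock_tree_levels(100_000, 3.0))  # 11, since 3**11 = 177,147
print(clock_tree_levels(100_000, 3.2))  # 10, since 3.2**10 is about 112,590
```

With a gain of exactly three the sketch needs eleven levels; a gain slightly above three (3.2 here, a hypothetical value) brings it down to ten, consistent with the ten-to-eleven range stated above.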
The latency for the clock signal to travel from the original clock source 102 to a distribution point of the clock tree depends upon the time delay of each clock buffer 104 and upon the number of clock levels the signal must traverse, i.e., the total number of clock buffers along the path. The performance of clock tree 100 is therefore affected by the gain-per-delay characteristics of each clock buffer 104. Gain per delay is a frequently used figure of merit; generally speaking, a high gain per delay is desirable in a clock buffer stage.
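Because the path latency is the per-stage delay d multiplied by the number of levels, and the number of levels grows roughly as ln(N)/ln(g) for N latches and per-stage gain g, the latency scales roughly as ln(N) multiplied by d/ln(g). A quantity such as ln(g)/d is therefore one reasonable way to express gain per delay; this formulation is an assumption, not a definition taken from this disclosure. The Python sketch below compares two hypothetical buffers whose gains and delays are illustrative values only.

```python
import math


def tree_levels(num_latches: int, gain: float) -> int:
    """Levels needed for the cumulative fan-out gain**L to reach num_latches."""
    if gain <= 1.0:
        raise ValueError("gain must exceed 1")
    levels, fanout = 0, 1.0
    while fanout < num_latches:
        fanout *= gain
        levels += 1
    return levels


def path_latency_ps(num_latches: int, gain: float, stage_delay_ps: float) -> float:
    """Source-to-leaf latency: one buffer delay per level traversed."""
    return tree_levels(num_latches, gain) * stage_delay_ps


def log_gain_per_delay(gain: float, stage_delay_ps: float) -> float:
    """Assumed figure of merit ln(gain) / delay, in nepers per ps."""
    return math.log(gain) / stage_delay_ps


# Hypothetical buffers: A has gain 3 and 40 ps delay, B has gain 4 and 45 ps delay.
for name, g, d in [("A", 3.0, 40.0), ("B", 4.0, 45.0)]:
    print(name, path_latency_ps(100_000, g, d), round(log_gain_per_delay(g, d), 4))
```

Buffer B is slower per stage but has the higher ln(g)/d, so in this model its tree needs nine levels instead of eleven and reaches the leaves sooner (405 ps versus 440 ps).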
The design of the clock buffer 104 is limited by numerous factors. FIG. 2 shows an exemplary clock buffer stage similar to that described in U.S. Pat. No. 6,024,738 by Masleid, the contents of which are hereby incorporated by reference. The design of clock buffer 201 is limited by the requirement that the clock signal be reproduced at the output of the clock buffer 201. An input 1 receives a clock signal having an approximately 50% duty cycle via a single wire input. A pulse generator stage comprising logic gates 203, 205 and 207 creates two sets of pulses, corresponding to rising edge pulses and falling edge pulses at outputs 2 and 6, respectively. Inverters 209 and 211 amplify the rising edge pulses whereas inverters 215 and 217 amplify the falling edge pulses. The output 4 of inverter 211 and the output 8 of inverter 217 are input to a tristate buffer comprising transistors 213 and 219 to reconstruct the clock signal from the amplified rising edge pulses and amplified falling edge pulses.
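The signal flow of clock buffer 201 can be modeled in discrete time: one pulse generator fires a fixed-width pulse at each rising edge, another fires at each falling edge, and the tristate output is pulled high by a rising edge pulse, pulled low by a falling edge pulse, and otherwise left undriven. The Python sketch below is an idealized behavioral model, not a circuit description. It ignores the inversions and delays of the individual gates and transistors, assumes the undriven output node holds its last value (for example, on its capacitance), and uses hypothetical function names.

```python
from typing import List


def square_wave(period: int, cycles: int) -> List[int]:
    """Clock samples with a 50% duty cycle: low for half a period, then high."""
    half = period // 2
    return ([0] * half + [1] * (period - half)) * cycles


def edge_pulses(clk: List[int], width: int, rising: bool) -> List[int]:
    """Emit a pulse `width` samples wide starting at each rising (or falling) edge."""
    out = [0] * len(clk)
    target = (0, 1) if rising else (1, 0)
    for i in range(1, len(clk)):
        if (clk[i - 1], clk[i]) == target:
            for j in range(i, min(i + width, len(clk))):
                out[j] = 1
    return out


def tristate_reconstruct(rise: List[int], fall: List[int], initial: int = 0) -> List[int]:
    """Pull up while `rise` is high, pull down while `fall` is high, else hold."""
    out, level = [], initial
    for r, f in zip(rise, fall):
        if r and f:
            raise ValueError("overlapping pulses: pull-up and pull-down both on")
        if r:
            level = 1
        elif f:
            level = 0
        out.append(level)
    return out


clk = square_wave(period=8, cycles=3)
rise = edge_pulses(clk, width=2, rising=True)
fall = edge_pulses(clk, width=2, rising=False)
print(tristate_reconstruct(rise, fall) == clk)  # True
```

In this model, widening the pulses to 5 samples in an 8-sample period makes a rising edge pulse overlap the following falling edge pulse, and the reconstruction raises an error.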
The delay associated with each clock buffer 201 is determined by several factors. The delay associated with the amplifying inverters 209, 211, 215, and 217 can be reduced somewhat by using skewed amplifiers having a logical threshold selected to favor the propagation of either a falling or a rising edge through the inverter. However, in a conventional clock buffer 201 the number of skewed amplifying inverters that can be used in an amplifying chain is limited, because the skew of each of the inverters 209, 211, 215, and 217 increases the pulse width.
FIG. 3 is a diagram of illustrative signal intensities versus time along selected portions of buffer 201. For the purposes of illustration, the signals are shown relative to a common time axis. Signal plot 301 corresponds to the signal of the clock at point 1, signal plot 302 corresponds to the output 2 of the rising pulse generator, signal plot 303 corresponds to the output of inverter 209, and signal plot 304 corresponds to the output of inverter 211. Signal plot 306 corresponds to the output of pulse generator 207, signal plot 307 corresponds to the output of inverter 215, and signal plot 308 corresponds to the output of inverter 217. Signal plot 305 corresponds to the reconstructed clock signal at clock output 5, which is delayed in time compared to the input clock signal 301 due to capacitive and other effects.
FIG. 3 illustrates how the pulse width changes as signals traverse the skewed inverter buffers. Signal plots 303 and 304 illustrate how inverters 209 and 211 broaden the rising edge pulses. Similarly, signal plots 307 and 308 illustrate how inverters 215 and 217 broaden the falling edge pulses. The skewed amplifiers 209, 211, 215, and 217 favor the propagation of a leading edge but increase the pulse width. The increase in pulse width in each skewed inverter limits the number of inverter stages and/or the gain per delay, because the pulses of signal plots 304 and 308 must be non-overlapping (i.e., have a pulse duty cycle of less than 50%) for the tristate buffer formed by transistors 213 and 219 to recover the clock signal. Consequently, the design of the amplifying inverters of clock buffer 201 is constrained by the requirement of the tristate buffer of transistors 213 and 219 that the signals of plots 304 and 308 be non-overlapping, i.e., that each have a duty cycle of less than about 50%.
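The stage limit can be estimated with simple arithmetic. If a pulse starts with width w0 and each skewed inverter adds a broadening delta, then after n stages its width is w0 + n*delta, and the tristate buffer of FIG. 2 requires this width to stay below half the clock period T. The Python sketch below assumes a constant, additive broadening per stage, which is a simplification; the function name and the numbers in the example are hypothetical.

```python
def max_skewed_stages(initial_width_ps: float, broadening_ps: float,
                      period_ps: float, max_duty: float = 0.5) -> int:
    """Largest n with initial_width + n * broadening < max_duty * period.

    Assumes each skewed stage widens the pulse by a fixed broadening_ps.
    max_duty = 0.5 models the non-overlap requirement of the tristate buffer.
    """
    if broadening_ps <= 0:
        raise ValueError("model assumes each skewed stage broadens the pulse")
    limit = max_duty * period_ps
    if initial_width_ps >= limit:
        raise ValueError("initial pulse is already too wide")
    n = 0
    while initial_width_ps + (n + 1) * broadening_ps < limit:
        n += 1
    return n


# Hypothetical 1 GHz clock (1000 ps period), 150 ps pulses, 60 ps broadening per stage.
print(max_skewed_stages(150, 60, 1000))  # 5
```

Under these assumed numbers, at most five skewed stages fit before the rising and falling edge pulses would overlap at the tristate buffer.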
A consequence of these limitations is that conventional clock buffer 201 may have a smaller gain per delay than desired which, in turn, may result in a conventional clock tree 100 having a larger latency than desired. This is of particular concern in high-speed microprocessors operating at high clock rates. Moreover, the effects of the limitations of clock buffer 201 may be expected to become more severe as clock rates increase and as the number of clock tree levels increases.
What is desired is a clock buffer and clock distribution network having reduced latency.
A clock signal distribution network is disclosed in which, in one or more levels of a clock tree, the clock signal information is distributed as at least two signals indicative of each instance of a rising edge and a falling edge of the clock. These signals are transmitted on separate wires of a bus and used to recover the clock signal at another location in the clock tree.
In one embodiment, the first signal is a pulse signal consisting of a first sequence of pulses, with one pulse generated for each rising edge of the clock signal, and the second signal is a pulse signal consisting of a second sequence of pulses, with one pulse generated for each falling edge of the clock signal. The first and second pulse signals are amplified in first and second skewed amplifiers that favor the propagation of the leading edge of each pulse. Since the clock information is contained in the timing of the leading edges of the first and second signals, the skew of the amplifiers can be selected to reduce the delay associated with the amplifiers. The timing information is retained as long as each pulse of the first and second pulse signals has a pulse width less than the clock period. Consequently, in one embodiment the skew characteristics of the first and second skewed amplifiers may be selected such that each pulse of the first and second pulse signals has a pulse width in the range of 5% to 95% of the clock period.
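Because each pulse train travels on its own wire, a train loses timing information only when consecutive pulses of that same train merge, that is, when the pulse width reaches the clock period. The Python sketch below represents a pulse train as (leading-edge time, width) pairs, models each skewed amplifier as adding a fixed leading-edge delay and a fixed broadening (both hypothetical values), and merges pulses that touch or overlap as they would on a single wire. The function names are illustrative only.

```python
from typing import List, Tuple

Pulse = Tuple[int, int]  # (leading-edge time in ps, width in ps)


def amplify(pulses: List[Pulse], stages: int, lead_delay: int, broadening: int) -> List[Pulse]:
    """Pass a pulse train through `stages` identical skewed amplifiers (assumed model)."""
    for _ in range(stages):
        pulses = [(t + lead_delay, w + broadening) for t, w in pulses]
    return pulses


def merge(pulses: List[Pulse]) -> List[Pulse]:
    """Combine pulses that touch or overlap, as they would on one physical wire."""
    merged: List[Pulse] = []
    for t, w in sorted(pulses):
        if merged and t <= merged[-1][0] + merged[-1][1]:
            start, width = merged[-1]
            merged[-1] = (start, max(width, t + w - start))
        else:
            merged.append((t, w))
    return merged


def leading_edges(pulses: List[Pulse]) -> List[int]:
    """Edge times still recoverable from the wire after merging."""
    return [t for t, _ in merge(pulses)]


period = 1000  # ps, hypothetical 1 GHz clock
rising = [(k * period, 100) for k in range(4)]  # one 100 ps pulse per rising edge

print(leading_edges(amplify(rising, 10, 20, 60)))  # 700 ps wide: [200, 1200, 2200, 3200]
print(leading_edges(amplify(rising, 16, 20, 60)))  # 1060 ps wide: [320], edges lost
```

After ten stages each pulse is 700 ps wide, 70% of the period and inside the 5% to 95% range, yet every leading edge survives; pulses that wide would overlap the falling edge pulses at the tristate buffer of FIG. 2. Only once the width exceeds the period do the pulses merge and the edges become unrecoverable.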
Each pulse signal may be transmitted to another clock distribution point on a separate wire of a two-wire bus. The clock signal may be recovered by regenerating the pulses of the first and second pulse signals using pulse generators configured to generate new pulses in response to the leading edges of the input pulses. The regenerated first and second pulse signals may then be amplified in third and fourth skewed amplifiers and preferably have pulse widths selected so that they may be input to a tristate buffer to recover the clock signal.
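The recovery path can be sketched in the same style: a pulse generator fires a fresh, narrow pulse at each leading edge it receives, the regenerated trains pass through the third and fourth skewed amplifiers, and a tristate model drives the output high on each rising edge pulse and low on each falling edge pulse, rejecting overlapping pulses. This is a minimal behavioral sketch in Python; the function names, pulse widths, and delays are hypothetical, and each skewed amplifier is modeled as a fixed leading-edge delay plus a fixed broadening.

```python
from typing import List, Tuple

Pulse = Tuple[int, int]  # (leading-edge time in ps, width in ps)


def regenerate(pulses: List[Pulse], new_width: int, gen_delay: int) -> List[Pulse]:
    """Fire one new pulse of new_width at each incoming leading edge."""
    return [(t + gen_delay, new_width) for t, _ in sorted(pulses)]


def skewed_amp(pulses: List[Pulse], lead_delay: int, broadening: int) -> List[Pulse]:
    """One skewed amplifier: delays the leading edge and widens the pulse."""
    return [(t + lead_delay, w + broadening) for t, w in pulses]


def recover_clock(rise: List[Pulse], fall: List[Pulse]) -> List[Tuple[int, int]]:
    """Tristate model: returns (time, level) transitions of the recovered clock.

    A rising edge pulse drives the output to 1 and a falling edge pulse to 0.
    Overlapping pulses are rejected, since overlapping rising and falling edge
    pulses would turn on the pull-up and pull-down devices together.
    """
    events = sorted([(t, w, 1) for t, w in rise] + [(t, w, 0) for t, w in fall])
    for (t1, w1, _), (t2, _, _) in zip(events, events[1:]):
        if t2 < t1 + w1:
            raise ValueError(f"pulses overlap at t={t2} ps")
    return [(t, level) for t, _, level in events]


# Broad pulses arriving over the two-wire bus (1000 ps period, 700 ps wide).
rise_in = [(200, 700), (1200, 700)]
fall_in = [(700, 700), (1700, 700)]

rise = skewed_amp(regenerate(rise_in, 200, 30), 20, 150)
fall = skewed_amp(regenerate(fall_in, 200, 30), 20, 150)
print(recover_clock(rise, fall))  # [(250, 1), (750, 0), (1250, 1), (1750, 0)]
```

Fed directly into the tristate model, the 700 ps input pulses overlap and are rejected; after regeneration to 200 ps and broadening to 350 ps they stay below half the period, and the clock edges are recovered, delayed by the generator and amplifier.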