As minimum transistor geometries of today's integrated circuits continue to shrink below 100 nanometers, vastly more transistors are available to be placed within a given area on an integrated circuit, or “chip.” This results in increasing power density within the chip. For example, the Intel 80386 microprocessor, circa 1988, contained hundreds of thousands of transistors and consumed tens of Watts of power, mainly due to normal switching operation of the transistors. Today, by contrast, the Intel Pentium 4 microprocessor contains about 55 million transistors and consumes more than 80 Watts of power. Emerging technologies incorporate over 200 million transistors in a processor. Such large power dissipations are becoming increasingly problematic for many applications, such as battery-operated systems (e.g., PDAs, laptop computers, cell phones) and systems requiring Energy Star compliance (e.g., office laser printers). The nature of many of these low-power applications is such that the system should perform computations very fast when called upon to do so but often sits idle with no work to do for extended periods of time. Historically, large complex chips such as microprocessors have only marginally addressed this issue, resulting in little difference in operating power between when the microprocessor is idle and when it is doing useful work. This is because the clock signals, which drive the storage elements (e.g., registers, latches) and dynamic logic of the chip, remain active such that a comparable amount of charge is switched whether the chip is doing useful work or not. Such clock signals are generally created as outputs from a Phase-Locked Loop (PLL) circuit. 
Such PLLs are advantageous for maintaining internal synchronization with an external clock reference circuit, and can perform clock multiplication such that the reference clock frequency is multiplied by the PLL to produce an internal clock signal having a switching frequency equal to that of the reference clock multiplied by some programmable constant (e.g., 7.0, 7.5, 8.0, 8.5).
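As an illustration of the clock multiplication described above, the following minimal sketch computes the internal clock frequency for each of the example multipliers. The 100 MHz reference frequency and the function name internal_clock_mhz are assumptions chosen for illustration only; they do not describe any particular PLL.

```python
def internal_clock_mhz(reference_mhz: float, multiplier: float) -> float:
    """Return the internal clock frequency produced by multiplying the reference.

    Hypothetical helper for illustration: f_internal = f_reference * multiplier.
    """
    if reference_mhz <= 0 or multiplier <= 0:
        raise ValueError("reference frequency and multiplier must be positive")
    return reference_mhz * multiplier


if __name__ == "__main__":
    REFERENCE_MHZ = 100.0  # assumed reference clock, not taken from any datasheet
    for multiplier in (7.0, 7.5, 8.0, 8.5):  # example constants from the text
        freq = internal_clock_mhz(REFERENCE_MHZ, multiplier)
        print(f"multiplier {multiplier}: {freq} MHz")
```

Under these assumptions, a half-step change in the programmable constant shifts the internal clock by half the reference frequency.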
FIG. 1 depicts a phase-locked-loop (PLL) and clock distribution system according to the prior art. A phase frequency detector 102 provides up and down signals (collectively a comparison signal) to a charge pump 104, loop filter 106 and voltage controlled oscillator (VCO) 108. The output signal is delivered through buffer element 110 to a pipe clock driver network 112 that delivers the clock to clocked elements 114 (e.g., registers). A feedback signal 152 is provided through a divider 116 and delay match 118 back to the phase frequency detector 102, forming the main loop. The design provides a conventional PLL and creates clock signals for driving the pipeline at a relatively large multiple of the reference clock frequency and for driving the chip I/O at the same clock frequency as the reference clock, the latter being necessary to synchronize signals occurring at the input and output pins of the integrated circuit to the reference clock input signal. The PLL allows the separately generated clock domain signals to maintain alignment with the input clock reference according to a reference clock edge, rather than suffering delay through the clock input buffer and clock generation circuits, an advantage well appreciated by those skilled in the art. In one particular example of such an arrangement, a global clock network covers much of the chip, driven by a network of clock buffers receiving as input the output clock signal from the PLL. Delay match circuits such as Delay Match A 118 and Delay Match B 120, as shown in the figure, are designed as a best effort to phase align (that is, to minimize clock skew among) the System Clock signal, the Feedback Clock signal, and the clocks at the clocked elements 114, 126. This arrangement is recognized by those skilled in the art as generally delivering a network of reasonably low-skew destination clock signals (that is, final stage clock signals that drive registers, latches, dynamic logic, etc.). 
By reducing clock skew, the maximum frequency can be made larger, since differences in clock signals across the chip do not degrade the departure time from transmitting circuits or the allowable arrival time at receiving circuits. A limitation of the arrangement shown in FIG. 1 is that the Pipe Clock Driver Network 112 is quite large and complex, essentially covering the entire area of the chip. Mismatches in circuit and parasitic loading conditions among the branches in this network are inevitable, leading to substantial regional skew at the final clock signals that drive the clocked elements. This clock skew could limit the maximum operating frequency of the chip and even cause frequency-independent hold time failures if the skew becomes sufficiently large.
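The effect of skew on both limits can be seen in the standard textbook timing constraints for a register-to-register path, given here as a sketch. The symbols are assumptions introduced for illustration and are not taken from the figures: T_clk is the clock period, t_cq the clock-to-output delay of the transmitting register, t_logic the combinational delay, t_setup and t_hold the requirements of the receiving register, and t_skew the worst-case skew between the two clocks.

```latex
\begin{align}
T_{\mathrm{clk}} &\ge t_{\mathrm{cq}} + t_{\mathrm{logic,max}} + t_{\mathrm{setup}} + t_{\mathrm{skew}} \\
t_{\mathrm{cq,min}} + t_{\mathrm{logic,min}} &\ge t_{\mathrm{hold}} + t_{\mathrm{skew}}
\end{align}
```

In the first inequality, skew adds directly to the minimum clock period and therefore lowers the maximum frequency. The second inequality contains no clock period term, which is why a hold time failure caused by excessive skew cannot be cured by slowing the clock.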
FIG. 2 depicts a phase-locked-loop and clock distribution system according to the prior art. In this figure, the charge pump, loop filter and VCO are combined in block 204 for brevity and may be referred to as a clock generator. A synchronizer 218 is placed into the PLL feedback path to match the delay through a synchronizer in the bus interface timing generator 212. Thus, clock signals delivered to the circuit elements 214, 226 are phase aligned, with minimal clock skew, to the input reference system clock. Both clock signals s3clk and p3clk (the latter derived from the global clock mesh) are used to generate the final feedback input signal to the synchronizer 218. This arrangement reduces overall clock skew throughout the network of clocked elements as described in U.S. Pat. No. 6,292,061, Restle, Philip, et al., A Clock Distribution Network for Microprocessors, IEEE Journal of Solid State Circuits, Vol. 36, #5, May 2001, p. 792, and Rusu, Stefan, The First IA-64 Microprocessor, IEEE Journal of Solid State Circuits, Vol. 35, #11, November 2000, p. 1539. With the arrangement in FIG. 2, it is no longer possible to defeat the pipeline clock network (e.g., for the purpose of achieving a low power mode of operation) without also defeating the feedback loop to the Phase Frequency Detector (PFD) of the PLL. Attempts to defeat the clock network up to and including the global clock mesh, so as to approach an ideal condition of minimal power consumption and minimal clock skew, will result in unlocking the PLL. Consequently, a synchronized bus frequency clock signal cannot be maintained for the purpose of responding to an interrupting event that signals the end of the low power mode. Furthermore, when the low power mode is terminated, the PLL must be relocked before pipeline operation can resume. This relock sequence typically takes thousands to tens of thousands of bus clock cycles to accomplish, substantially adding to the exit time latency of the low power mode. 
For example, a system having a bus clock period of 10 nanoseconds (100 MHz frequency) could take approximately 10 to 100 microseconds to recover from standby mode.
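The recovery figure above follows from multiplying the bus clock period by the number of relock cycles. The following minimal sketch reproduces that arithmetic; the function name relock_latency_us is hypothetical, and the cycle counts of 1,000 and 10,000 are assumed as representative of "thousands to tens of thousands" of cycles.

```python
def relock_latency_us(bus_period_ns: float, relock_cycles: int) -> float:
    """Return the PLL relock latency in microseconds.

    Hypothetical helper for illustration:
    latency = bus clock period * number of bus clock cycles needed to relock.
    """
    if bus_period_ns <= 0 or relock_cycles < 0:
        raise ValueError("period must be positive and cycle count non-negative")
    return bus_period_ns * relock_cycles / 1000.0  # convert ns to microseconds


if __name__ == "__main__":
    BUS_PERIOD_NS = 10.0  # 100 MHz bus clock, as in the example in the text
    for cycles in (1_000, 10_000):  # assumed representative relock cycle counts
        latency = relock_latency_us(BUS_PERIOD_NS, cycles)
        print(f"{cycles} cycles: {latency} microseconds")
```

Under these assumptions, the latency scales linearly with both the bus period and the relock cycle count, so a slower bus clock lengthens the exit from standby proportionally.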
What is needed is a PLL that permits the circuit elements to be put to sleep while also providing fast clock synchronization recovery.