1. Field of the Invention
The present invention relates to gated clock drivers and, more particularly, to a distributed gated clock driver having improved clock skew characteristics.
2. Description of the Relevant Art
Gated clock drivers are used in VLSI designs to selectively latch data into flip-flop or latch circuits. Such latches may, for example, be used to implement the registers coupled between stages in pipelined microprocessors. Pipelining involves partitioning a process with "n" steps into "n" hardware stages separated by memory elements called registers which hold intermediate results. There is one pipeline stage for each step in the process and the stages are connected in the same order that the steps are performed. By allowing each of the n stages to operate concurrently, the pipelined process can potentially operate at n times the rate of the non-pipelined process.
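The "n times the rate" figure is an upper bound, which a short calculation makes concrete. The following is a minimal sketch, assuming every step takes one equal time unit and ignoring register overhead and stalls; the function name `pipeline_speedup` and its parameters are hypothetical and do not come from the disclosure.

```python
def pipeline_speedup(n_stages: int, n_items: int) -> float:
    """Ideal speedup of an n-stage pipeline over a non-pipelined unit.

    Assumption: each step takes exactly one time unit, with no register
    overhead and no stalls.
    """
    # Non-pipelined: each item passes through all n steps before the next starts.
    unpipelined_time = n_stages * n_items
    # Pipelined: n units to fill the pipe, then one finished item per unit.
    pipelined_time = n_stages + (n_items - 1)
    return unpipelined_time / pipelined_time
```

Under these assumptions a single item sees no speedup at all, and the speedup approaches n only as the number of items processed grows large.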
The gated clock drivers used to latch the data into the flip-flops may be implemented either as shared control logic across a bank of flip-flops or individually with each flip-flop forming the bank or data path. Since the data path is typically 32 bits wide or more, 32 or more flip-flops are required. Accordingly, implementing the gated clock drivers in the data path flip-flops requires more chip area, which is frequently at a premium in high performance microprocessors.
FIGS. 1a and 1b--Flip-Flops with Shared and Distributed Clock Drivers
The difference in chip area is readily seen with respect to FIGS. 1a and 1b. FIG. 1a illustrates an exemplary bank of flip-flops 2a through 2n (collectively referred to as flip-flops or latches 2) sharing a gated clock driver 4. A clock input is provided along line 3 to clock driver 4. An enable input is provided via line 6. The gated clock is then distributed to each of the flip-flops or latches 2.
FIG. 1b, on the other hand, illustrates a bank of n flip-flops 6a through 6n (referred to collectively as flip-flops or latches 6), each having its own gated clock driver 8a through 8n (referred to collectively as clock drivers 8). Providing a clock driver 8 to each flip-flop or latch 6 obviously requires duplication of circuitry and hence occupies considerably more chip space than the implementation of FIG. 1a, though it is noted that the implementation in FIG. 1b is relatively faster than that illustrated in FIG. 1a. Nevertheless, because of the area advantages gained by sharing clock drivers, the implementation of FIG. 1a is often preferred.
FIG. 2 and FIG. 3--Shared Gated Clock Driver and NOR Gate
Turning now to FIG. 2, there is illustrated a gated clock driver as implemented in a shared system 90. Shared system 90 includes a shared clock driver 100 coupled to a plurality of flip-flops or latches (of which only one is illustrated, and referred to as flip-flop or latch 200). Shared clock driver 100 receives a clock signal ICLK and an enable signal EN into a latch circuit 103. The output of the latch circuit 103 is provided to an inverter 104. The inverted latched enable signal, XLEN, and the clock signal ICLK, are input into NOR gate 102.
NOR gate 102 is typically a standard NOR gate of the type well-known in the art and illustrated in FIG. 3. NOR gate 102 includes a pair of p-channel transistors 202, 204 in series and a pair of n-channel transistors 206, 208 in parallel. When both inputs XLEN and ICLK are low, both p-channel devices 202, 204 are conducting, while both n-channel devices 206, 208 are cut off, and the output XGICLK is high. When either or both inputs are high, one or both of the p-channel transistors 202, 204 will turn off and one or both of the n-channel transistors 206, 208 will turn on, leading to a low output at XGICLK.
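The switching behavior just described can be checked with a small switch-level model. This is an illustrative sketch only: the function name `nor_gate` and the table `NOR_TRUTH_TABLE` are hypothetical, and the model does not depend on which input drives which individual transistor, since the description does not state that mapping.

```python
def nor_gate(xlen: int, iclk: int) -> int:
    """Switch-level model of a two-input CMOS NOR gate (inputs are 0 or 1)."""
    # p-channel devices 202, 204 conduct on a low gate input; being in series,
    # the path to the supply exists only when both inputs are low.
    pull_up = (xlen == 0) and (iclk == 0)
    # n-channel devices 206, 208 conduct on a high gate input; being in
    # parallel, the path to ground exists when either input is high.
    pull_down = (xlen == 1) or (iclk == 1)
    # Exactly one network conducts for any input pair.
    assert pull_up != pull_down
    return 1 if pull_up else 0


# Output XGICLK for every (XLEN, ICLK) combination.
NOR_TRUTH_TABLE = {(a, b): nor_gate(a, b) for a in (0, 1) for b in (0, 1)}
```

The output is high for the (0, 0) input pair only, which matches the description of FIG. 3.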
Turning back to FIG. 2, the output of NOR gate 102 will be high whenever the inverted latched enable signal XLEN is low and the clock signal ICLK is low. The resulting output XGICLK of NOR gate 102 is provided to clock flip-flop 200. As is well-known in the art, flip-flop 200 includes transmission gates 106a and 106b, control inverters 108a and 108b coupled to receive as inputs the ICLK clock signal and the XGICLK clock signal, respectively, and a pair of hold circuits 110a, 110b. When enabled, the input D, for example, will be latched through to output Q of flip-flop 200. It is noted that while a D flip-flop is illustrated, various other kinds of latches and flip-flops are contemplated.
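The signal path through shared clock driver 100 (latch circuit 103, inverter 104, NOR gate 102) can be sketched behaviorally as follows. The description does not state on which clock phase latch circuit 103 is transparent; this sketch assumes it passes EN while ICLK is high and holds while ICLK is low. The function name `simulate_gated_clock` and the sample-based timing are likewise hypothetical.

```python
def simulate_gated_clock(iclk_samples, en_samples):
    """Return XGICLK for each (ICLK, EN) sample pair.

    Assumption (not stated in the description): latch circuit 103 is
    transparent while ICLK is high and holds its value while ICLK is low.
    """
    latched_en = 0
    xgiclk = []
    for iclk, en in zip(iclk_samples, en_samples):
        if iclk == 1:
            latched_en = en          # latch 103 passes EN (assumed phase)
        xlen = 1 - latched_en        # inverter 104 produces XLEN
        # NOR gate 102: output is high only when XLEN and ICLK are both low.
        xgiclk.append(1 if (xlen == 0 and iclk == 0) else 0)
    return xgiclk


iclk = [1, 0, 1, 0, 1, 0]
en = [1, 1, 0, 0, 1, 0]
print(simulate_gated_clock(iclk, en))  # [0, 1, 0, 0, 0, 1]
```

In the last sample EN falls while ICLK is low, yet XGICLK still pulses, because under the assumed latch phase the enable value is held for the duration of the low phase.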
In addition, as noted above, the data path in a typical system is 32 bits wide. Thus, there may be 32 flip-flops, such as flip-flop 200, to which the common clock signal XGICLK is provided. Providing a common clock driver signal, however, introduces clock skew because the gated clock must be routed across each flip-flop in the data path. Clock skew occurs when clock signals travel along different paths having different delay times and therefore arrive at different latches at different times. While clock skew may be minimized by routing the gated clock signal across a low resistance (wide) interconnect, this method is undesirable because it occupies too much chip area.
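The dependence of skew on interconnect resistance can be illustrated with a first-order Elmore delay estimate. This is a sketch under stated assumptions, not a model taken from the description: the gated clock line is treated as a chain of identical RC segments with one flip-flop tap per segment, and the names `elmore_tap_delays`, `skew`, `r_seg` and `c_node` are hypothetical.

```python
def elmore_tap_delays(n_taps: int, r_seg: float, c_node: float) -> list:
    """Elmore delay from the driver to each tap of a uniform RC ladder.

    Assumptions: one tap per flip-flop, every segment has resistance r_seg,
    and every tap node has capacitance c_node (wire plus flip-flop load).
    """
    delays = []
    total = 0.0
    for i in range(1, n_taps + 1):
        # Segment i charges all capacitance at or beyond tap i.
        downstream_c = (n_taps - i + 1) * c_node
        total += r_seg * downstream_c
        delays.append(total)
    return delays


def skew(delays: list) -> float:
    """Skew taken as the spread between the earliest and latest tap."""
    return max(delays) - min(delays)
```

With the node capacitance held fixed, halving `r_seg` halves the estimated skew across a 32-tap line. That is a simplification, since a wider wire also adds capacitance, but it is consistent with the observation that a low resistance (wide) interconnect reduces skew at the cost of chip area.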