1. Field of the Invention
The present invention relates to semiconductor devices. More specifically, the present invention relates to the synchronization of logic within a semiconductor device.
2. Description of the Related Art
Clocking in Digital Logic
The Purpose of Clocks
Clocks are periodic signals used for timing and synchronization purposes in synchronous digital logic. Clocks define periods of time in which logic operations are performed by circuits. Logic operations involve the propagation of state through a series of logic gates.
In synchronous circuits, logic state propagation is launched or initiated by a source clock edge. After propagating through paths of logic gates, the resulting logic state is sampled by a destination clock edge. The destination clock edge is generated from a clock event that follows the clock event that generated the source clock edge.
Since propagation of state through paths of gates takes time, for some period of time after the source clock edge, logic paths will contain state that is new (or valid for this cycle) and state that is old (invalid for this cycle). Generally, at the end of a period of time (often defined as a clock cycle), valid state has propagated through the entire path or collection of paths and there is no longer any invalid state in the circuit. The following clock edge starts the process anew.
Logical operations as implemented in electronic circuits propagate through paths of logic gates that diverge and converge. When logic paths converge or are combined with other paths they must do so at a similar point in time--this is the time at which the various convergent paths all have valid data. This point of time is determined by the arrival time of the latest arriving data. There are generally other, faster paths converging on this point that must hold their valid data until it has been successfully combined with the late-arriving data. In most digital circuits, the clock or clocks provide this synchronization function. Thus, clocks can be thought of as performing a regulating or governing function--they slow down or hold faster paths until the slower paths have become valid.
Generally speaking, logic circuits are required to work as quickly as possible. It is therefore highly desirable that the clocks perform their regulating function while imposing as little penalty as possible on the operating speed of the circuit.
Clock Skew
Clock skew is a component of timing error that can both interfere with the regulating function of the clocks and reduce the maximum operating speed of the circuit. Clock skew is defined as the difference in arrival times among clock edges that are derived from the same clock event but are associated with physically distinct clock nodes.
For example, a master clock is commonly distributed by some means to a large number of destinations. The distribution means may be as simple as a network of wires or may include many levels of active buffers. FIG. 1 illustrates a clock system with a single clock source (typically a phase locked loop, a PLL, or a digital delay loop, a DLL) followed by some number of generators. The generators reshape the single clock source into multiple clocks. The reshaping that occurs in a generator can be a straightforward delay of the source clock, an inversion of the source clock, a change in the shape of the clock waveform (e.g., a change in duty cycle or slew rate), or any combination of these transformations. In any case, the propagation of a clock edge through this distribution path requires some non-zero time. The propagation time to each destination can be tuned by design to be smaller or larger according to the needs of the design. In practice it is expensive (in terms of design effort) to analyze or model the clock distribution circuit so as to predict actual clock skew with total accuracy.
Actually, even with perfect design knowledge it is impossible to control skew with total accuracy because of normal manufacturing variations across a circuit. For example, a certain clock distribution wire may be somewhat more resistive in part of the circuit due to localized variations in interconnect thickness or width. This could result in a consistently longer delay to the clock destination at the end of this wire relative to other clock destinations on a particular die.
It can be seen then that clock skew has both predictable and unpredictable components. With some degree of difficulty, the designer can adjust or control clock skew within certain limits. In practice, this control is limited by the available design time and also by normal manufacturing or environmental variations. As a result of the difficulty in perfectly controlling clock skew and because of the detrimental effects of clock skew, it is important that a design be tolerant of some uncertainty in clock skew among the various clock destinations.
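To make the definition above concrete, the following sketch (in Python, with purely hypothetical node names and arrival times) computes skew as the spread in arrival times of the same clock event across physically distinct nodes:

```python
# Hypothetical arrival times (ns) of the SAME clock event at four
# physically distinct clock destinations; names and values are
# illustrative only, not taken from any particular design.
arrival_times = {"ff1": 0.00, "ff2": 0.12, "ff3": 0.05, "ff4": 0.20}

def clock_skew(arrivals):
    """Clock skew: the difference in arrival times among clock edges
    derived from the same clock event at distinct clock nodes."""
    times = list(arrivals.values())
    return max(times) - min(times)

skew = clock_skew(arrival_times)
```

In a real design the per-node arrival times come from timing analysis of the distribution network, and only part of this spread is predictable at design time.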
Clock Jitter
Like clock skew, clock jitter is a component of timing error that can adversely affect the regulating function of the clock and also the operating speed of the circuit. Clock jitter is defined to be the error or variation in arrival time of a clock event on a single clock node. This error or variation is relative to an ideal or intended arrival time, usually specified with respect to an immediately prior clock event. Thus, while clock skew describes arrival times of the same event at physically separate locations, clock jitter describes arrival times of different events at the same physical location. Clock jitter may be somewhat different at each clock node.
Clock jitter is rarely if ever intentionally introduced into a clock network (one exception is intentional frequency modulation of the clock). Jitter can be caused by several factors. Jitter may be present on the input clock of the circuit; this generally is passed along through the distribution network. Additionally, it may be introduced by part of the clock generation logic such as a PLL. For example, FIG. 4 illustrates the introduction of jitter by the clock generation logic, showing a feedback-based control system (a typical PLL) coupled to a transfer function Z(s) (the clock generator). The PLL contains a steady state phase error that affects edge placement. Noise injected into the system at various points will cause transient responses in the system. Noise can occur in the reference signal θ(t), the phase comparator, the loop filter, the voltage controlled oscillator (VCO), the clock generator (Z(s)), or on any of the wires connecting the components. For these components, the primary source of noise is the voltage sources (power and ground), and for the wires, it is coupling noise.
Clock jitter can also be caused by power supply noise and by inductive or capacitive signal coupling. The effect of jitter is to shorten or lengthen clock periods as perceived by certain parts of the circuit. For example, if a certain clock edge is delayed from the arrival time predicted by the prior edge, the ending clock period is lengthened while the following clock period is likely shortened.
Clock jitter that varies among various clock destinations can also increase clock skew. For example, local supply noise may cause a clock edge to arrive early in one location while the same clock edge may arrive on time at another location.
Clock jitter may be short term, causing a cycle to cycle variation in the clock period, or may be longer term, affecting a series of sequential cycles in a similar way. Jitter may also cause the duty cycle of a clock to vary from its intended value. There are usually both short and long term components of jitter present in a clock.
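The short-term (cycle-to-cycle) component of jitter can be sketched as follows; the edge timestamps and nominal period are hypothetical values chosen only for illustration:

```python
# Hypothetical edge timestamps (ns) observed at a SINGLE clock node,
# with a nominal 1.0 ns period; values are illustrative only.
edges = [0.00, 1.02, 1.98, 3.01, 4.00]
nominal_period = 1.0

def cycle_to_cycle_jitter(edges, nominal):
    """Deviation of each measured period from the nominal period,
    i.e., arrival-time error relative to the immediately prior edge."""
    periods = [b - a for a, b in zip(edges, edges[1:])]
    return [p - nominal for p in periods]

jitter = cycle_to_cycle_jitter(edges, nominal_period)
worst = max(abs(j) for j in jitter)
```

A long-term component would show up as a consistent bias in these deviations over many consecutive cycles rather than as cycle-to-cycle variation.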
Setup and Hold Hazards
Setup and hold time hazards could exist even with no clock skew or jitter, but skew (especially unpredictable skew) and jitter generally increase the likelihood and severity of these hazards. A setup hazard occurs when a clock edge is sampling data that is arriving very late relative to the clock. If the data is too late or the sampling clock is early relative to its intended arrival time then invalid data is sampled and the circuit operates incorrectly. These hazards are also called slow path or critical path hazards because they are associated with the paths in the design with the longest propagation delay.
Since logic paths are initiated with a source clock and terminate by being sampled with a destination clock, setup hazards are affected by delays between source clocks and destination clocks as well as by logic path delays. Setup hazards are reduced in severity or even eliminated by slowing down the clock frequency. Thus, setup hazards limit high frequency performance of a digital circuit but do not prevent correct operation at a lower frequency.
A hold hazard, in contrast, occurs when a clock edge incorrectly samples data that has been valid but becomes invalid again before the sampling period has completed. Hold time problems are also called fast path problems.
As stated earlier, a destination clock edge is normally generated from a clock event that follows the clock event that generated the source clock edge. Hold time violations occur when data is sampled by a destination clock edge that is actually derived from the same clock event that generated the source clock edge. This can occur when data propagates too quickly from source to destination or the destination clock edge occurs too late relative to the source edge that was generated from the same root clock event. It is important to note that because hold time hazards are related to timing between clock edges that are derived from the same event, hold time hazards are not alleviated by changing the clock frequency. Hold time violations prevent the circuit from operating at any frequency. For this reason they are a more severe failure than setup time problems. Hold time problems can be fixed by inserting additional delay into the fast data path, by delaying the source clock, by speeding up the arrival time of the destination clock or by some combination of these methods.
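The two hazards can be expressed as simple slack checks. The function names and delay values below are illustrative, not taken from any particular design:

```python
def setup_slack(t_clk_dest, t_data_valid, t_setup):
    """Positive slack: data becomes valid at least t_setup before the
    sampling (destination) clock edge. A violation here can be fixed
    by lowering the clock frequency."""
    return t_clk_dest - t_setup - t_data_valid

def hold_slack(t_clk_dest, t_data_invalid, t_hold):
    """Positive slack: data stays valid at least t_hold after the
    sampling edge. This check is independent of clock frequency, so a
    violation here cannot be fixed by slowing the clock."""
    return t_data_invalid - (t_clk_dest + t_hold)

# Hypothetical times (ns): sampling edge at 1.0, data valid at 0.8.
setup = setup_slack(t_clk_dest=1.0, t_data_valid=0.8, t_setup=0.1)
# Hypothetical fast path: same-event edge at 0.0, data invalid at 0.15.
hold = hold_slack(t_clk_dest=0.0, t_data_invalid=0.15, t_hold=0.1)
```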
The Cost of Synchronization in Static Logic
The portion of the cycle associated with logic synchronization is called synchronization overhead and represents time spent doing no computational work. Obviously for high performance design, one would like to maximize the amount of work one can accomplish in a given time. Any time spent on synchronization overhead detracts from this goal by reducing the computational efficiency of a design.
FIG. 2 illustrates a simple circuit that comprises two logic paths. One path starts at flip-flop 1 and ends at flip-flop 2. The other starts at flip-flop 2 and returns to flip-flop 1. If there is no unpredictable skew or jitter in the clocks, then the cycle time determined by the round trip delay through these two paths is given by

T ≡ (O_1 + D + S_2 + O_2 + D' + S_1)/2  (1)
where
O_1 and O_2 = the output delays of flip-flops 1 and 2, respectively,
S_1 and S_2 = the setup times for flip-flops 1 and 2, respectively, and
D and D' = the delays through the logic paths.

The causes of unpredictable clock skew are independent of the frequency of the clock. This is also true of the flip-flop delays (O_1, S_1), and generally true of the clock jitter. This presents a real problem for high-frequency design because, as one designs for higher and higher frequency in a given technology, the percentage of the clock cycle dedicated to synchronization overhead increases. At some point this overhead becomes dominant, and the benefits of higher frequency design are overwhelmed by the decreased computational efficiency of the logic circuits. Clearly, for very high frequency design, a designer needs to explore methods that minimize synchronization overhead.
FIG. 3 illustrates a typical clock signal with a 50% duty cycle. The clock period, T, is measured from the mid-point of the rise of the clock signal to the mid-point of the next rise of the clock signal. Skew is represented as a shaded area around the rising and falling edges of the clock. Note that the first rising edge is taken as a reference point, so it has no skew. It is also important to note that this picture is representative of the situation, but that in reality we are talking about skew between unique points in the clock network. Relating FIG. 2 and FIG. 3, the first rising edge of the clock (of FIG. 3) is measured at the clock input of flip-flop 1 (of FIG. 2), while the next rising edge of the clock is measured at the clock input of flip-flop 2.
The predictable clock skew is fairly straightforward to deal with and can even be used to advantage in some cases. For example, if one knows that flip-flop 2 gets a slightly later version of the clock than flip-flop 1, then the designer can actually allow the data to arrive slightly later. Note that in this case, flip-flop 1 gets a rising edge early relative to flip-flop 2, so in the next cycle, there is less time to get through the logic gates. Flip-flop 1 has no skew relative to itself, so the data must get through the entire loop path in two cycles. If all skew were predictable, then Equation (1) would describe the minimum cycle time; the two path delays are averaged in this case.
Unpredictable clock skew produces a different problem. If a designer does not know what the skew between flip-flops 1 and 2 is, then the designer must assume the worst: on the first cycle, the assumption is that flip-flop 2 is early compared to flip-flop 1, and on the second cycle the assumption is the opposite. This way, no matter which case is true, the data arrives in time. Unfortunately, this means that for this portion of the skew, a designer cannot take advantage of a late clock as above. Equation (2) describes this two-cycle path as follows:

T ≡ (O_1 + D + S_2 + O_2 + D' + S_1 + 2K_u)/2  (2)
where K_u = the uncertain portion of the clock skew.
If the effect of clock jitter is also added to the delay equation, we have:

T ≡ (O_1 + D + S_2 + O_2 + D' + S_1 + 2K_u + 2J)/2  (3)
where J = the clock jitter.
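Equations (1) through (3) can be evaluated numerically as below; all delay values are hypothetical and are chosen only to show how each unit of unpredictable skew (K_u) and jitter (J) adds a full unit to the minimum cycle time:

```python
def cycle_time(O1, D, S2, O2, Dp, S1, Ku=0.0, J=0.0):
    """Minimum cycle time of the two-path loop of FIG. 2, averaged over
    the two cycles. With Ku = J = 0 this is Equation (1); adding skew
    gives Equation (2); adding jitter as well gives Equation (3)."""
    return (O1 + D + S2 + O2 + Dp + S1 + 2 * Ku + 2 * J) / 2

# Hypothetical budget (ns): flip-flop output delays 0.1, setup times
# 0.05, and logic path delays of 0.7 each.
T1 = cycle_time(0.1, 0.7, 0.05, 0.1, 0.7, 0.05)                   # Eq. (1)
T3 = cycle_time(0.1, 0.7, 0.05, 0.1, 0.7, 0.05, Ku=0.05, J=0.05)  # Eq. (3)
```

Under these assumed numbers, 0.05 ns of skew plus 0.05 ns of jitter lengthen the minimum cycle by a full 0.1 ns, and that penalty does not shrink as the logic delays D and D' shrink.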
One design method for minimizing synchronization overhead involves splitting apart the two latches that form a flip-flop and placing logic between the two latches. The designer then times the logic such that the latches are transparent when the evaluate edge (of the clock signal) of the slowest logic path arrives at their inputs. Now, as clock skew results in the clock edges controlling the latches moving around in time, the slowest path is unaffected (assuming that clock skew is not too large). This is a skew-tolerant design. The clock skew in this type of design can be as large as the time between the ideal clock edge time and the time when the evaluate edge of the slowest path arrives at the input of the latch. If the designer times it so that the evaluate edge arrives at the middle of the positive clock pulse for each latch (the time when the clock is high), then the design can tolerate a full quarter cycle of skew. Another benefit of this scheme is that the evaluate edge can be a little bit off from this point (assuming the skew is less than the quarter cycle) without penalty. A skew-tolerant design therefore removes the skew penalty and the flip-flop output and setup delays from the cycle time. This design adds, however, a propagation delay through the latches, which changes the equation for the cycle time to the following:
T ≡ (D_1 + D + D_2 + D' + 2J)/2  (4)
where D_1 and D_2 = the delays through latches 1 and 2, respectively, D and D' = the delays through the logic paths, and J = the clock jitter.
The above design style does not eliminate the effect of long term clock jitter, which is to shorten the clock cycle time. While the delay penalty of the latch propagation time is present, this penalty is generally less than the penalties incurred with flip-flops.
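Under the same hypothetical delay budget used above, the transparent-latch cycle time of Equation (4) can be compared against the flip-flop scheme; the latch delays below are illustrative assumptions:

```python
def cycle_time_skew_tolerant(D1, D, D2, Dp, J):
    """Equation (4): transparent-latch (skew-tolerant) cycle time. The
    unpredictable-skew term and the flip-flop output/setup overheads
    are gone; only latch propagation delays and jitter remain."""
    return (D1 + D + D2 + Dp + 2 * J) / 2

# Hypothetical budget (ns): latch delays 0.1 each, logic path delays
# 0.7 each, and jitter of 0.05.
T4 = cycle_time_skew_tolerant(0.1, 0.7, 0.1, 0.7, 0.05)
```

With these assumed numbers, T4 comes out 0.1 ns shorter than the corresponding flip-flop cycle time with skew and jitter, illustrating the removed skew penalty, though the latch propagation delays and long-term jitter remain in the budget.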
Synchronization in Clocked Precharge Logic
Clocked precharge (CP) logic is a design style that often has a speed advantage over static CMOS logic, and can additionally provide an advantage in overcoming synchronization penalty. Unlike static gates, CP gates have inherent synchronous characteristics. A CP gate has two principal phases of operation: precharge and evaluate. Since it can only switch once during the evaluate phase, it can be thought of as holding its value until the start of the precharge phase. More importantly, a CP gate cannot switch until the start of its evaluate phase (unlike static CMOS gates that may switch whenever their inputs change).
CP gates are connected and clocked in such a way that the first gate in a series evaluates, causing the next gate to evaluate, and so on until all gates in the path have evaluated. When the clock to these gates (call this clock PH1) switches to its precharge state, these gates precharge and lose their state. It is therefore necessary to store the result of the computation prior to precharging the gates. This is commonly done by latching the output values of the final gates in the string at the end of the evaluation period (i.e., with a clock similar to the CP gates' PH1 clock), before their precharge begins. This structure is then similar to the arrangements of static logic gates in a latch-based design style. It is common for this latch or set of latches to provide inputs to another series of CP gates that are clocked by a clock that is the inverse of the clock of the first set of CP gates (call this clock PH2). In this way, the second set of CP gates is precharged while the first set is evaluating and while the latch between the two sets of CP gates is transparent. When the first set of gates is precharged (when PH1 is low), the latch holds its state and the second set of gates, sensing the latch output(s), begins evaluating. Similar to the first set of CP gates, the results from the second set of CP gates must be latched during their evaluation phase (when PH2 is high). The output of the second type of latch (the PH2 latch) can then drive more CP gates of the first type (PH1 CP gates). This logic and synchronization style is known as skew-intolerant CP logic because it is sensitive to the skew and jitter of the clock edges. FIG. 5 illustrates this type of logic.
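The hand-off described above can be sketched as a toy behavioral model, assuming ideal clocks where PH2 is simply the inverse of PH1. Each CP stage evaluates only while its clock is high and loses its state when it precharges, so a transparent latch must capture stage 1's result before PH1 falls. All names here are illustrative, not from any actual circuit:

```python
def half_cycle(ph1, stage1_in, latch, stage2):
    """Advance the pipeline by one half-cycle; return (latch, stage2)."""
    if ph1:
        # PH1 gates evaluate; the latch is transparent and follows
        # stage 1's output; PH2 gates are precharging (state lost).
        stage1_out = stage1_in  # identity "logic" for illustration
        return stage1_out, None
    else:
        # PH1 gates precharge; the latch holds its state; PH2 gates
        # evaluate from the held latch output.
        return latch, latch

latch, stage2 = None, None
latch, stage2 = half_cycle(True, "A", latch, stage2)   # PH1 high
latch, stage2 = half_cycle(False, "A", latch, stage2)  # PH1 low
# stage2 now carries "A": the result survived stage 1's precharge
# because the latch captured it while transparent.
```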
Cost of Synchronization in Skew-intolerant CP Logic
Skew-intolerant CP logic suffers from the previously mentioned clocking penalties. The CP gates can only evaluate during the time that the clock is high, but must finish evaluating by the time the clock switches from high to low so that the result can be stored in the latch at the end of the phase. This style is affected by the unpredictable skew on both edges of the clock, since the computation result must be set up to the latch in time to be sampled. If the data is late or the latch clock is early, incorrect results are sampled. Thus, referring again to FIG. 5, the following equation describes the cycle time:

T ≡ D_0 + D + D_1 + D' + 2K_u + 2J  (5)
where D_i = the delay through latch i, K_u = the uncertain portion of the clock skew, and J = the clock jitter.
This synchronization scheme suffers from problems very similar to the problems encountered in flip-flop based static logic design.
Logic synchronization is the process of controlling the timing of all of the logic signals in a system. The present invention is a method and apparatus for a synchronization mechanism that tolerates skew and jitter as much as possible in order to lower the minimum operating cycle time of a logic device. A synchronization mechanism is best understood in the context of a logic family, however, and the logic family used to illustrate the present invention is the N-nary logic family described in the copending patent application, U.S. patent application Ser. No. 09/019355, filed Feb. 5, 1998, now U.S. Pat. No. 6,066,965, titled "Method and Apparatus for a N-Nary logic circuit using 1 of 4 signals." Briefly, the logic gates in this family can be thought of as non-inverting clocked precharge circuits that precharge when the clock input signal is low and evaluate when the clock input signal is high. FIG. 10A illustrates a 1 of 4 logic circuit that is typical of the N-nary logic family.
An efficient processor design operates logic gates at their maximum speed, where the speed of a gate is the sum of its logic propagation time and its node restore time. Static logic gates "restore" when the gates encounter new input values. Dynamic gates, on the other hand, require an explicit precharge operation to prepare for the next set of inputs. A logic gate is operating at its duty-cycle limit when its output is at all times transitioning either to an evaluate level or to a precharge level. FIGS. 9A and 9B illustrate this concept, where t_e is the evaluation time, t_p is the precharge time, and t_so represents a stable output.
A given dynamic gate has one or more inputs and one output of interest. When in the evaluate phase, the output of a dynamic gate responds to the input. When in the precharge phase, the output of a dynamic gate returns to a restored level. Note that FIGS. 9A and 9B show the gate (output) transitioning at every evaluate period. This is not the case with traditional dynamic gates, which will only transition when the gate evaluates "true." N-nary logic, however, comprises a plurality of wires of which at most one transitions at each evaluation: normally exactly one wire transitions, but in some cases zero wires may evaluate, and thus the output may not transition. Therefore, when viewed in terms of signals in N-nary logic, FIGS. 9A and 9B are representative of the output signal of N-nary logic, which is the equivalent of the OR of the output wires as illustrated in FIG. 10B.
FIG. 9B shows the desired operating mode of a dynamic gate. There is little or no time when the output signal is stable since once the output signal is read (as it completes its transition) the gate begins restoring. And, once the gate finishes its restoration, it begins transitioning again. Under these conditions, we know that the logic gate is delivering as many logical operations in a given period of time as the gate is capable of delivering.
FIG. 9A shows, however, a more realistic application of dynamic gates as is typical in prior art systems. As one can see, a substantial amount of additional time is necessary both after the evaluation of the logic gate and after the precharge phase of the logic gate. There are many techniques found in the prior art that make tradeoffs by focusing on the evaluation and precharge periods. Unfortunately, there is nothing in the prior art that focuses directly on the efficiency of a gate. By focusing on improving the gate efficiency, the present invention produces a better set of guidelines for creating a processor with maximum performance, and additionally, develops an alternate clocking strategy derived directly from the nature of the technology.
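One way to make the notion of gate efficiency concrete is the ratio sketched below, using the symbols of FIGS. 9A and 9B (t_e, t_p, t_so). This metric is an illustrative formulation under assumed definitions, not a formula taken verbatim from the disclosure:

```python
def gate_efficiency(t_e, t_p, period):
    """Fraction of the clock period in which the gate's output is
    either evaluating (t_e) or precharging (t_p). An efficiency of 1.0
    corresponds to the duty-cycle limit of FIG. 9B, where the
    stable-output dead time t_so is zero; the situation of FIG. 9A
    corresponds to an efficiency below 1.0."""
    t_so = period - (t_e + t_p)
    assert t_so >= 0, "t_e + t_p cannot exceed the clock period"
    return (t_e + t_p) / period

# Hypothetical gate: 0.4 ns evaluate, 0.4 ns precharge, 1.0 ns period.
eff = gate_efficiency(t_e=0.4, t_p=0.4, period=1.0)
```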
Some logic gates are faster than others. Typically, the slowest gates are the concern for the designer, while the designer can often ignore the faster gates. Gate speed is more of an issue for dynamic logic because the clocking required of dynamic gates restricts the position within the clock cycle where the gates can perform their desired function. Static logic, on the other hand, performs its function at all times. Whenever an input arrives, a static gate switches accordingly. Nonetheless, an efficient clocking strategy should tolerate dynamic logic gates performing their function in as wide a time period as is possible.
U.S. Pat. No. 5,517,136 to Harris et al. and titled "Opportunistic Time-Borrowing Domino Logic," is an attempt at an efficient clocking strategy. A feature of this patent is that it provides some degree of time borrowing between certain clock domains. The objective of the Harris patent is to eliminate the need for output storing latches at the end of each half of the clock cycle, which by its nature allows some degree of time-stealing, or what this patent calls "opportunistic time borrowing." The non-symmetric nature of the timing or synchronization of the clocks in the Harris patent, however, limits the locations within the clock cycle where borrowing of time can actually occur. In fact, because borrowing cannot occur at some points within every path, the performance of the clocking strategy must be affected by clock uncertainties.
Harris extends the above clocking scheme to a more generalized approach for multi-phase clock systems in a follow-up paper to the patent: Harris, D., and Horowitz, M., "Skew-Tolerant Domino Circuits," IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, pp. 1702-1711 (November 1997). In addition to extending the Harris patent to a more generalized approach for multi-phase clock systems, the Harris paper attempts to encompass tolerance for clock skew within the clocking scheme. Unfortunately, this paper does not differentiate between predictable and unpredictable clock errors. Additionally, this paper does not appreciate the impact that clock jitter, in addition to skew, has on a clocking scheme; it does not develop a metric for gate efficiency to guide practical designs; and it argues against clocking strategies similar to those disclosed herein.
Another prior art patent, U.S. Pat. No. 5,434,520 to Yetter et al and titled "Clocking Systems and Methods for Pipelined Self-Timed Dynamic Logic Circuits" is another attempt at optimizing the clocking of a system by focusing on improving the evaluation and precharge periods. This patent, like the above Harris patent and Harris paper, implements an awkward and inefficient clocking system where only portions of the inefficiencies in traditional dynamic logic families are improved.
Overlapping Clocks Using Stretched Clocks
FIG. 6 illustrates one technique to accomplish logic synchronization, which is by "stretching out" the clock cycle. As previously mentioned, there are numerous examples of stretched clocks in the prior art, including the Harris patent, the Yetter patent, and the Harris paper. One sees that the latches are shown operating in the period of time when both clocks are high, so there is a period of time when a latch is transparent and the CP gates on either side of it are in evaluation mode. This means that within the overlap window, the evaluation edge can pass through the latch and immediately continue through gates on the other side. Assuming the latch is placed in the time when both clocks are undeniably high (i.e., not in the shaded skew area shown in the figure), then, much like the transparent latch design style, the skew is not a problem. Equation 6 gives the cycle time if one uses stretched clocks in this manner:

T ≡ (D_0 + D + D_1 + D' + J)  (6)
where D_i = the delay through latch i, and J = the clock jitter.
Another advantage a designer can get with stretched clocks is in the latch delays. The reason for the latches being in the path was to hold the result of a phase of logic during the transition from one phase to the next. With the overlapped clocks, it is possible to have logic feeding from a gate in one phase to a gate in the next phase during the time they are both in evaluate mode. This means that the latches are superfluous. The only requirement is that the earlier gate not precharge before its value has propagated through the later gate. FIG. 7 shows a path implemented with this scheme, and Equation 7 describes the cycle time of the clock as follows:

T ≡ (D + D' + J)  (7)
where D and D' = the delays through the logic paths, and J = the clock jitter.
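Using one hypothetical delay budget, Equations (5) through (7) can be compared directly; all values below are illustrative assumptions:

```python
# Hypothetical delay budget (ns); values are illustrative only.
D0 = D1 = 0.1        # latch delays
D = Dp = 0.8         # logic path delays (D and D')
Ku, J = 0.05, 0.05   # unpredictable skew and jitter

T5 = D0 + D + D1 + Dp + 2 * Ku + 2 * J  # Eq. (5): skew-intolerant CP logic
T6 = D0 + D + D1 + Dp + J               # Eq. (6): stretched clocks with latches
T7 = D + Dp + J                         # Eq. (7): stretched clocks, no latches
```

Under these assumed numbers the stretched-clock scheme recovers the skew penalty, and removing the latches additionally recovers their propagation delays, at the cost of the aggravated hold-time problems described next.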
Problems With Stretched Clocks
There are some problems with this synchronization scheme, however. A greater than 50% duty cycle on the clocks poses many physical difficulties, which are not described in this disclosure. Additionally, the hold time problems are aggravated. A system with stretched clocks will now cause a fast path to have a hold-time problem even without considering skew and jitter. Hold-time problems require additional design work to tune fast paths. In typical designs, there are a few critical paths (potential setup-time problems) that need careful tuning, and the work done in tuning them has a reward in that the performance of the logic chip improves as these paths are tuned. There are, however, potentially many fast paths (potential hold-time problems) that the designer now must tune as well. Tuning these paths generally means inserting delay (increased area) or adjusting clocks (prone to error and requiring a great deal of analysis). While it is necessary to fix hold-time problems in order to have a functional chip (at any frequency), there is no performance benefit for doing so.
There are a variety of ways to synchronize the logic circuits within a pipeline on an integrated circuit. For example, FIG. 16 illustrates a typical 4 clock system used in the Harris patent and the Yetter patent. This type of clocking system usually involves a master clock, CLK_1, and its inverse, CLK_3. The other two clocks, CLK_2 and CLK_4, are clocks with stretched clock cycles that may be coincident with the master clock or its inverse. For example, the leading edge of CLK_2 is coincident with the leading edge of CLK_1, and the leading edge of CLK_4 is coincident with the leading edge of CLK_3. Each full cycle of the clock signal has two parts, an even half cycle, t_x, and an odd half cycle, t_y. Each full cycle of the clock signal also comprises a precharge period, t_p, and an evaluate period, t_e. A common feature of this type of clocking system is its evaluation window 220, which has some overlapping phases, but only due to the clocks with stretched clock cycles.
FIG. 13 illustrates a typical dynamic logic circuit as described in the Yetter patent, which this patent calls a "mousetrap" logic circuit. This circuit comprises a logic circuit 24 that performs some type of logic evaluation on the two input signals 26 and 28 to produce an output signal 32. Coupled to the logic circuit is an output buffering device, which here is the inverter 30. Additionally, coupled to the logic tree circuit is the precharge device 22 that uses a clock signal CK to determine the time period for recharging the dynamic node of the logic circuit. One disadvantage to this type of dynamic logic circuit is the difficulty in using this type of circuit in pipelining. Another disadvantage is that the clocks cannot be stopped without losing information. The clocking synchronization of the present invention overcomes these disadvantages by using multiple clock domains with overlapping phases.
FIG. 14A and FIG. 14B illustrate the output buffering devices in the Harris patent (U.S. Pat. No. 5,517,136). FIG. 14A depicts Harris's FIG. 1 and FIG. 14B Harris's FIG. 2. The circuit of FIG. 14A uses an output buffer that is similar to the half signal keeper of the present invention. When the output of the inverter is low, this transistor holds the input high, making the gate stable. When the output is high, however, the input node can float when the inputs to the gate are removed. The circuit of FIG. 14B uses an output buffer that is similar to the full signal keeper of the present invention, which includes an N-channel transistor specifically for the purpose of holding the output low when the input did not discharge.
FIG. 14A consists of a logic circuit 41 that further consists of the input signals A and B. The input signal A connects to NFET 44, and the input signal B connects to NFET 42. NFET 40 is the evaluate device for this circuit, and PFET 46 is the precharge device. Both the evaluate device and the precharge device connect to the clock signal CLK. This circuit also contains an output buffering device that consists of inverter 50 and PFET 48. The output of logic circuit 41 connects to the inverter 54, which Harris denotes as a high skew device. Output 56 connects to the next logic circuit that could be, for example, the next circuit in a pipeline. Harris calls this type of logic circuit with its output buffering device a D1 type gate.
FIG. 14B consists of a logic circuit 61 that further consists of the input signals A and B. The input signal A connects to NFET 64, and the input signal B connects to NFET 62. NFET 60 is the evaluate device for this circuit, and PFET 66 is the precharge device. Both the evaluate device and the precharge device connect to the clock signal CLK. This circuit also contains an output buffering device that consists of the inverters 68 and 70. The output of logic circuit 61 connects to the inverter 72, which Harris denotes as a high skew device. Output 76 connects to the next logic circuit that could be, for example, the next circuit in a pipeline. Harris calls this type of logic circuit with its output buffering device a D1K type gate.
An optimal clocking implementation allows enough borrowing of time from one dynamic gate to the next to account for the differences in gate speed between simple and complex gates, between gates with small and large output loads and differences in speed due to manufacturing variations, and it does so at all points in all paths. The present invention overcomes the above problems in the prior art by implementing a very flexible logic synchronization method and apparatus that uses multiple clocks with overlapping phases.