1. Field of the Invention
This invention relates generally to clock distribution network planning for ASICs, and in particular, to methods and computer-aided design tools for planning the clock distribution network in the conceptual design phase of the ASIC devices to reduce clock skew, ground bounce, VDD noise and idle clock cycle time.
2. Description of the Related Art
The routing and distribution of the clock to the elements of an integrated circuit, or more specifically, an application specific integrated circuit (ASIC), is an important factor in the design of ASICs. By analogy, the clock of an ASIC may be compared to the heart of a human body, and the clock routing and distribution network to its arteries and veins. Just as the arteries and veins must be properly distributed for each organ to function properly and in concert with the other organs, the clock routing and distribution of an ASIC should be designed so that the clock recipient elements function properly and together, so that the intended functions of the ASIC are achieved. One such function of the clock recipient elements of an ASIC, for example, is to propagate data to an intended functional destination in the ASIC.
Referring to FIG. 1, a simple prior art chain of D-type flip-flops 20 is shown, wherein the D flip-flops (D0-Dn) are sequentially cascaded to propagate data from a data input to a data output. As is conventional in such data-propagating chains of D-type flip-flops, the Q-output of each flip-flop in the chain is coupled to the D-input of the next flip-flop in the chain in the direction in which data propagates. Thus, in the example shown in FIG. 1, the Q-output of flip-flop D0 is coupled to the D-input of flip-flop D1; the Q-output of flip-flop D1 is coupled to the D-input of flip-flop D2; and so on. The chain of D flip-flops is driven by a common clock source 22 by way of a clock distribution network 24. In the example shown in FIG. 1, the clock distribution network 24 has an input for the clock source 22 situated near flip-flop D0, extends from there parallel to the flip-flops in the direction of data propagation, and includes a branch to the clock input of each flip-flop in the chain 20.
In operation, a clock pulse or triggering edge of the clock causes each of the D flip-flops in the chain 20 to propagate data from its D-input to its Q-output. Each consecutive clock pulse or triggering edge moves the data further down the chain of flip-flops. In the example shown in FIG. 1, on every clock pulse or triggering edge, the data at the D-inputs of flip-flops D0-Dn propagates to the Q-outputs of flip-flops D0-Dn, respectively. If there are no delays between the Q-outputs and the D-inputs of consecutive flip-flops in the chain 20, then on every clock pulse or triggering edge, the data at the D-inputs of the flip-flops propagates to the D-inputs of the next flip-flops in the chain 20. It is desired that the clock pulse or triggering edge arrive at the clock inputs of all the flip-flops at the same time (that is, in phase), so that the data as a whole propagates together down the chain 20.
The problem with the data propagating chain 20 is that, with the clock distribution network 24 shown in FIG. 1, the triggering edge or pulse of the clock does not reach the clock inputs of all the flip-flops at the same time. This difference in clock arrival times is generally termed in the art "clock skew," and it can prevent the data from propagating properly as a whole through the chain. To illustrate the problem of clock skew, assume that the time delay for data to propagate from the Q-output of a flip-flop to the D-input of the next flip-flop in the chain is $\Delta T_D$. Also, because the clock distribution network 24 shown in FIG. 1 requires the clock to travel a longer distance to reach the clock inputs of flip-flops further down the chain 20, assume that the difference in clock arrival time at the clock inputs of consecutive flip-flops is $\Delta T_C$.
Given the above assumptions, if the time delay $\Delta T_D$ for data to propagate from the Q-output of flip-flop D0 to the D-input of flip-flop D1 is greater than the clock arrival difference $\Delta T_C$ between these flip-flops, then the triggering edge of the clock at the clock input of flip-flop D1 will clock the current data at its D-input to its Q-output before the next data (the data that propagated through flip-flop D0) reaches the D-input of flip-flop D1. This is the desired result: the next data does not reach the D-input of a flip-flop before that flip-flop has been clocked for the current data.
However, with the clock distribution network 24 shown in FIG. 1, problems occur for flip-flops further down the chain 20. For instance, the clock reaches the clock input of flip-flop D2 a time $2\Delta T_C$ after it reaches flip-flop D0. Assume now that the time delay $\Delta T_D$ is smaller than $2\Delta T_C$; then the data that propagated through flip-flop D1 will reach the D-input of flip-flop D2 before D2 is clocked. Thus, instead of the current data propagating through flip-flop D2 on that triggering edge, the next data propagates through flip-flop D2, and the current data for flip-flop D2 is lost. As a result, the data as a whole does not propagate properly down the chain 20. It is therefore desirable that the clock distribution network be designed so that the flip-flops, or more generally, the clock recipient elements, are clocked at substantially the same time to reduce or eliminate the effects of clock skew.
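The race described above can be checked mechanically once clock arrival times are known. The following is a minimal sketch, not the method of this invention, assuming that each flip-flop launches new data at its own clock edge, that $\Delta T_D$ already includes the clock-to-Q delay, and that hold time is zero. The function name `find_races` and the picosecond values are hypothetical; integers are used to avoid floating-point rounding at the comparison boundary.

```python
from typing import List


def find_races(clock_arrival_ps: List[int], data_delay_ps: int) -> List[int]:
    """Return the indices i (i >= 1) of flip-flops D_i that latch the wrong data.

    Simplified model (an assumption for illustration only):
    - clock_arrival_ps[i] is when the triggering edge reaches flip-flop D_i;
    - D_{i-1} launches new data at its own clock edge, and that data reaches
      the D-input of D_i after data_delay_ps (the Q-to-D delay, Delta T_D);
    - hold time is taken as zero.
    A race occurs when the new data reaches D_i before D_i's own edge does,
    so D_i captures the next data instead of the current data.
    """
    races = []
    for i in range(1, len(clock_arrival_ps)):
        new_data_arrival = clock_arrival_ps[i - 1] + data_delay_ps
        if new_data_arrival < clock_arrival_ps[i]:
            races.append(i)
    return races


# Hypothetical FIG. 1-style chain: each stage receives the clock 300 ps later.
skewed = [0, 300, 600, 900]
print(find_races(skewed, data_delay_ps=200))  # [1, 2, 3]
print(find_races(skewed, data_delay_ps=400))  # []
# Balanced clock: every flip-flop sees the edge at the same time.
print(find_races([0, 0, 0, 0], data_delay_ps=200))  # []
```

With equal arrival times, no stage can race regardless of $\Delta T_D$, which is the property a balanced clock distribution aims for.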
Referring now to FIG. 2, a block diagram is shown of a prior art clock distribution network 30, formed on an ASIC substrate 37, that reduces or eliminates the problem of clock skew. The prior art clock distribution network 30 reduces clock skew by attempting to make the phase of the clock signal substantially the same at the clock inputs of all the clock recipient elements in the ASIC. Such a clock distribution network is generally referred to in the relevant art as a "balanced clock tree," and network 30 will be referred to as such hereinafter.
The balanced clock tree 30 shown in FIG. 2 includes a main buffer 32, which receives a clock signal from a clock source 34 and serves as the initial driving stage for supplying the clock signal to the clock recipients of the ASIC. The output of the main buffer 32 is coupled to an H-shaped conductive tree structure 36 that serves as the initial conduit through which the clock propagates to reach the clock recipients. The H-shaped conductive tree structure 36 includes a wide initial entry conductive line 38 (or entry conductive line, for short) having a first end coupled to the output of the main buffer 32 and a second, opposite end connected to the middle of the mid-section conductive branch 40 of the H-tree conductive structure 36. The ends of the mid-section conductive branch 40 connect to the middles of the outer conductive branches 42 and 44 of the H-tree conductive structure 36. Each end of the outer conductive branches 42 and 44 is coupled to a buffer tree-network 46, which is, in turn, coupled to the clock recipients 48.
The H-tree conductive structure 36, including the entry conductive line 38, is designed so that the phases of the clock signals produced as the H-tree structure splits the clock are substantially the same at the ends of the outer conductive branches 42 and 44, or alternatively, at the points at which the buffer tree-networks 46 connect to the H-tree structure. This is accomplished by forming the H-tree structure 36 on a substrate 37 that has a substantially uniform dielectric constant, and by making the conductive line lengths from the output of the main buffer 32 to the ends of the outer conductive branches 42 and 44 (or alternatively, to the points at which the buffer tree-networks 46 connect to the H-tree structure) all the same.
For instance, in the example balanced clock tree 30 shown in FIG. 2, the clock signal generated at the output of the main buffer 32 initially undergoes a phase shift of $\Delta\phi_1$ as it propagates through the entry conductive line 38. When the clock signal reaches the mid-section conductive line 40 of the H-tree structure 36, it splits into two clock signals, each propagating toward a respective outer conductive line 42 or 44. Because the mid-section line has the same length on both sides of the entry conductive line 38, each of the two clock signals undergoes the same phase shift $\Delta\phi_2$ while propagating through the mid-section conductive line 40.
When the two clock signals reach the outer conductive lines 42 and 44, each splits again, giving four clock signals, each propagating toward a respective end of the outer conductive lines, or alternatively, toward the points at which the buffer tree-networks 46 connect to the H-tree structure 36. By the time the clock signals reach these points from the middles of the respective outer conductive lines 42 and 44, each has undergone a further phase shift $\Delta\phi_3$. Thus, the phases of the clock signals at the respective buffer tree-networks are substantially the same, since each has undergone a total phase shift of:
$$\Delta\phi_{\text{total}} = \Delta\phi_1 + \Delta\phi_2 + \Delta\phi_3$$
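The equal-phase property of the H-tree can be illustrated by accumulating phase shift along every root-to-leaf path. This is a minimal sketch assuming, as the text does, a uniform dielectric so that each segment's phase shift depends only on its length; the `Segment` class, the `h_tree` helper, the leaf names, and the numeric phase values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """A conductive segment of the clock tree with a (hypothetical) phase shift."""
    name: str
    phase_shift: float
    children: List["Segment"] = field(default_factory=list)


def leaf_phases(seg: Segment, upstream: float = 0.0) -> Dict[str, float]:
    """Accumulate phase shift from the main buffer output to every leaf."""
    total = upstream + seg.phase_shift
    if not seg.children:
        return {seg.name: total}
    phases: Dict[str, float] = {}
    for child in seg.children:
        phases.update(leaf_phases(child, total))
    return phases


def h_tree(dphi1: float, dphi2: float, dphi3: float) -> Segment:
    """Entry line 38 -> two halves of branch 40 -> two halves of branches 42/44."""
    def outer(branch: str) -> List[Segment]:
        return [Segment(f"{branch}-top", dphi3), Segment(f"{branch}-bottom", dphi3)]

    left = Segment("40-left", dphi2, outer("42"))
    right = Segment("40-right", dphi2, outer("44"))
    return Segment("38-entry", dphi1, [left, right])


phases = leaf_phases(h_tree(1.0, 2.0, 3.0))
print(phases)
# {'42-top': 6.0, '42-bottom': 6.0, '44-top': 6.0, '44-bottom': 6.0}
```

Every leaf receives $\Delta\phi_1 + \Delta\phi_2 + \Delta\phi_3$ because the tree is symmetric; giving one half of branch 40 a different phase shift would immediately show up as unequal leaf phases.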
Referring now to FIG. 3, a schematic and block diagram of a prior art buffer tree-network 46 of FIG. 2 is shown. The buffer tree-network 46 provides for further levels of driving stages for driving the clock signal to each of the clock recipients 48 of the ASIC. Like the H-tree conductive structure 36, the buffer tree-network 46 is also designed to distribute the clock signal so that the phases of the clock signal at the clock inputs of all the clock recipients are substantially the same.
In more detail, the buffer tree-network 46 may comprise one or more levels of parallel buffers. In the example shown in FIG. 3, there are N levels of parallel buffers. The level 1 buffers 52 form the first level of buffers, which initially drive the clock signal received at the ends of the outer conductive lines 42 and 44 of the H-tree conductive structure 36. If there are two levels of buffers in the ASIC, then the outputs of the level 1 buffers are coupled to the inputs of the level 2 buffers 56. If there are more than two levels of buffers, then the outputs of the buffers at one level are coupled to the inputs of the buffers at the next level. In other words, the output of a buffer at one level is coupled to the inputs of several buffers at the next level, and so on, to meet the ASIC clock signal load.
The buffer tree-network 46 further includes a clock routing network for each level of parallel buffers. For instance, a level 1 clock routing network 50 routes the clock signal from the ends of the outer conductive lines 42 and 44 of the H-tree conductive structure 36 to the inputs of the level 1 buffers 52. If there are two levels of buffers, then a level 2 clock routing network 54 routes the clock signal from the outputs of the level 1 buffers 52 to the inputs of the level 2 buffers 56. If there are more than two levels of buffers, then there is a clock routing network for each level of buffers, routing the clock signal from the outputs of the buffers at one level to the inputs of the buffers at the next level.
Each of the clock routing networks routes the clock signal to the inputs of the next level buffers in a manner that the phases of the clock signals at such inputs are substantially the same. For instance, level 1 clock routing network 50 routes the clock signal from the ends of the outer conductive lines 42 and 44 to the inputs of the level 1 buffers 52 in a manner that the phases of the clock signals at the inputs of the buffers are substantially the same. The level 2 clock routing network likewise routes the clock signals from the outputs of the level 1 buffers 52 to the inputs of the level 2 buffers 56 in a manner that the phases of the clock signals are substantially the same at the inputs of the level 2 buffers; and so on, in the same manner for all other clock routing networks pertaining to the other levels of buffers 3-N.
The level 1 clock routing network 50, which is shown in more detail in FIG. 3 than the other clock routing networks, is used herein as an example of one manner of routing the clock signals so that their phases at the inputs of the buffers at one level are substantially the same. The other levels of clock routing networks can be routed in a similar manner. In one approach, the level 1 clock routing network 50 attempts to equalize the clock signal phase at the buffer inputs by making the conductive line lengths from the ends of the outer conductive lines 42 and 44 of the H-tree conductive structure 36 to the inputs of the level 1 buffers substantially the same. For instance, the conductive line lengths from node 0 (the point at which the level 1 clock routing network 50 connects to the H-tree conductive structure 36) to the inputs of buffers 52a and 52f, which are taken off nodes 3 and 3', are substantially the same, and result in a phase shift of the clock signal given by:
$$\Delta\phi_{52a} = \Delta\phi_{52f} = \Delta\phi_4 + \Delta\phi_5 + \Delta\phi_6 + \Delta\phi_9$$
Similarly, the conductive line lengths from node 0 to the inputs of buffers 52b and 52e are substantially the same, and produce a phase shift of the clock signal given by:
$$\Delta\phi_{52b} = \Delta\phi_{52e} = \Delta\phi_4 + \Delta\phi_5 + \Delta\phi_8$$
In order for the phase of the clock signal to be the same at the inputs of buffers 52a, 52b, 52e and 52f, the following relationship holds:
$$\Delta\phi_8 = \Delta\phi_6 + \Delta\phi_9$$
Thus, if the ASIC layout permits, the clock routing networks can be designed to provide substantially the same conductive line lengths from the point at which the H-tree conductive structure 36 connects to the level 1 clock routing network to each input of the level 1 buffers 52. The same technique can be used for the other clock routing networks, such as the level 2 clock routing network 54, so that the conductive line lengths from the output of an i-th level buffer to the inputs of the (i+1)-th level buffers are all substantially the same.
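The balancing condition for the level 1 buffers can be checked by summing the phase shift of each segment along each buffer's path from node 0, using the segment lists given in the equations above. This is a minimal sketch; the numeric phase values are hypothetical (arbitrary units), chosen so that $\Delta\phi_8 = \Delta\phi_6 + \Delta\phi_9$, and the helper names are illustrative only.

```python
from typing import Dict, List

# Hypothetical phase shifts (arbitrary units) for the named segments of FIG. 3,
# chosen so that dphi8 == dphi6 + dphi9.
SEGMENTS: Dict[str, int] = {"dphi4": 10, "dphi5": 5, "dphi6": 2, "dphi8": 5, "dphi9": 3}

# Segments traversed from node 0 to each level 1 buffer input, per the equations.
PATHS: Dict[str, List[str]] = {
    "52a": ["dphi4", "dphi5", "dphi6", "dphi9"],
    "52f": ["dphi4", "dphi5", "dphi6", "dphi9"],
    "52b": ["dphi4", "dphi5", "dphi8"],
    "52e": ["dphi4", "dphi5", "dphi8"],
}


def input_phases(segments: Dict[str, int], paths: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum the phase shift of every segment along each buffer's path."""
    return {buf: sum(segments[s] for s in path) for buf, path in paths.items()}


def phase_spread(phases: Dict[str, int]) -> int:
    """Largest phase difference between any two buffer inputs (0 means balanced)."""
    return max(phases.values()) - min(phases.values())


phases = input_phases(SEGMENTS, PATHS)
print(phases)                # {'52a': 20, '52f': 20, '52b': 20, '52e': 20}
print(phase_spread(phases))  # 0
```

Changing `dphi8` so that it no longer equals `dphi6 + dphi9` produces a nonzero spread, which is the skew the equal-length routing is meant to avoid.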
Sometimes, however, layout constraints may make it difficult to provide a conductive line long enough to equalize the clock signal phases at the inputs of the buffers. To illustrate this, assume for example that the conductive line from node 0 to the input of buffer 52d is long enough to provide the proper phase of the clock signal at that buffer's input. That is, the phase shift of the clock signal from node 0 to the input of buffer 52d is given by:
$$\Delta\phi_{52d} = \Delta\phi_4 + \Delta\phi_7$$
In order for the clock signal phase at the input of buffer 52d to be equalized with the clock signal at the inputs of buffers 52a, 52b, 52e and 52f, the following relationship holds:
$$\Delta\phi_7 = \Delta\phi_5 + \Delta\phi_8 = \Delta\phi_5 + \Delta\phi_6 + \Delta\phi_9$$
Also assume that the conductive line between node 0 and the input of buffer 52c is too short to cause the clock signal to undergo the proper phase shift. Specifically, because of layout constraints, assume that the conductive line between node 1 and the input of buffer 52c cannot be made any longer to produce the desired phase shift of the clock signal. In other words, the following relationship holds:
$$\Delta\phi_{10} < \Delta\phi_7$$
To solve this problem of insufficient conductive line length, a current load 58 is connected to the output of buffer 52c. The current load 58 affects the driving response of the buffer and thereby causes the clock signal to undergo an additional phase shift $\Delta\phi_{11}$. To equalize the clock phase at the output of buffer 52c with that at the outputs of the other buffers of the same level, the current load is sized so that the additional phase shift $\Delta\phi_{11}$ brings the clock phase at the output of buffer 52c into line with the others. In other words, the following relationship holds:
$$\Delta\phi_{10} + \Delta\phi_{11} = \Delta\phi_7 = \Delta\phi_5 + \Delta\phi_8 = \Delta\phi_5 + \Delta\phi_6 + \Delta\phi_9$$
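The phase shift that a compensating load such as current load 58 must supply follows from the relationship above as $\Delta\phi_{11} = \Delta\phi_7 - \Delta\phi_{10}$. The sketch below computes this shortfall for every buffer relative to the slowest path. It assumes, as the comparison $\Delta\phi_{10} < \Delta\phi_7$ suggests, that the path to buffer 52c consists of $\Delta\phi_4$ followed by $\Delta\phi_{10}$; the numeric values and the helper name are hypothetical, and translating a phase shortfall into an actual load size is device-specific and not modeled here.

```python
from typing import Dict, List

# Hypothetical segment phase shifts (arbitrary units). dphi10 is the longest
# line that fits between node 1 and buffer 52c, and it falls short of dphi7.
SEGMENTS: Dict[str, int] = {
    "dphi4": 10, "dphi5": 5, "dphi6": 2, "dphi7": 10,
    "dphi8": 5, "dphi9": 3, "dphi10": 6,
}

# Assumed paths from node 0 to each level 1 buffer input.
PATHS: Dict[str, List[str]] = {
    "52a": ["dphi4", "dphi5", "dphi6", "dphi9"],
    "52b": ["dphi4", "dphi5", "dphi8"],
    "52c": ["dphi4", "dphi10"],
    "52d": ["dphi4", "dphi7"],
}


def required_extra_phase(segments: Dict[str, int],
                         paths: Dict[str, List[str]]) -> Dict[str, int]:
    """Extra phase shift each buffer needs (e.g. from a current load) so that
    its output lines up with the slowest routed path."""
    phases = {buf: sum(segments[s] for s in path) for buf, path in paths.items()}
    target = max(phases.values())
    return {buf: target - phase for buf, phase in phases.items()}


print(required_extra_phase(SEGMENTS, PATHS))
# {'52a': 0, '52b': 0, '52c': 4, '52d': 0}
```

Only buffer 52c needs compensation here, and the value 4 equals `dphi7 - dphi10`, the $\Delta\phi_{11}$ of the relationship above.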
Therefore, by varying the lengths of the conductive lines to each of the buffer inputs, and by employing current loads where the conductive lines cannot be made any longer (for example, because of layout constraints), the clock routing networks can be designed to provide substantially the same phase of the clock signal at the outputs of the buffers of the same level. This can be done for all levels of buffers so that the phases of the clock signal at the outputs of the N-th level buffers are substantially the same. This assumes, of course, that each buffer in the buffer tree-network 46 causes the clock signal to undergo substantially the same phase shift.
Referring now to FIG. 4, a schematic diagram is shown of a group of clock recipients 48, or more specifically, a chain of D flip-flops (D1-D6), wherein the flip-flops are driven by one of the N-th level buffers of the buffer tree-network 46 by way of a recipient clock routing network 60. The clock routing network 60 is similar to the clock routing networks of the buffer tree-network 46 in that it routes the clock signal from the output of one of the N-th level buffers to the clock inputs of the clock recipients (D flip-flops D1-D6) so that the phases of the clock signal at each of those clock inputs are substantially the same. In the example shown in FIG. 4, this can be done by designing the clock routing network so that the following phase-shift relationship holds:
$$\Delta\phi_7 = \Delta\phi_5 + \Delta\phi_8 = \Delta\phi_5 + \Delta\phi_6 + \Delta\phi_9$$
In summary, the balanced clock tree 30 routes the clock signal in an ASIC so that the phases of the clock signal at the clock inputs of the clock recipients are substantially the same. This is done by providing an H-tree conductive structure 36 for the initial distribution of the clock signal from the ASIC clock input to the inputs of the buffer tree-networks 46, such that the phases of the clock signal there are substantially the same. Each buffer tree-network 46 includes clock routing networks (e.g., 50 and 54) that provide the clock signal to the buffers so that the phases of the clock signal at the outputs of the buffers within one level are substantially the same. A similar clock routing network is provided at the outputs of the last level of buffers so that the clock signal reaches the clock inputs of the clock recipients with substantially the same phase. This type of clock distribution reduces the adverse effects of clock skew.
Although the balanced clock tree 30 is useful for reducing clock skew, it may have disadvantages when the number of clock recipients in an ASIC becomes large. One disadvantage is that ground bounce and $V_{DD}$ noise result, because a large number of clock recipients are clocked at substantially the same time, which requires a relatively large current. This large current causes the ASIC ground voltage to jump during clocking, a condition known as ground bounce. The same large current also causes the $V_{DD}$ supply voltage to drop during clocking of the clock recipients, which appears as noise on the $V_{DD}$ line.
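Why the problem grows with the number of recipients can be seen from the standard first-order inductive noise estimate $V \approx L \, di/dt$. The sketch below assumes that all recipients switch together, that each draws the same peak current, that the total current ramps linearly over the clock edge, and that it returns through a single lumped ground inductance. The function name and every numeric value are hypothetical, chosen only to show the scaling.

```python
def ground_bounce_volts(n_recipients: int, peak_current_a: float,
                        edge_time_s: float, ground_inductance_h: float) -> float:
    """First-order estimate V = L * di/dt when n recipients switch together.

    Assumes each recipient draws peak_current_a, the total current ramps
    linearly over edge_time_s, and all of it returns through one lumped
    ground inductance (all simplifying assumptions).
    """
    di_dt = n_recipients * peak_current_a / edge_time_s
    return ground_inductance_h * di_dt


# Hypothetical values: 50 uA per flip-flop, 0.5 ns edge, 0.1 nH ground path.
for n in (10_000, 100_000):
    v = ground_bounce_volts(n, 50e-6, 0.5e-9, 0.1e-9)
    print(f"{n:>7} recipients -> {v:.2f} V")
```

Under these assumptions the estimated bounce is about 0.1 V for 10,000 recipients and about 1 V for 100,000, growing linearly with the number of recipients clocked at the same instant.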
Another disadvantage is that it becomes increasingly difficult to lay out or balance the clock distribution network as the number of clock recipients becomes large. Many ECAD systems for ASIC design perform the layout and balancing of the clock distribution network during the physical design stage, which may be too late for effective balancing when the number of clock recipients becomes very large.
Furthermore, another disadvantage of the prior art clock distribution network and design method is the possibility of unused cycle time. For example, FIG. 5 shows registers 71, 73 and 75 with logic blocks 72 and 74, which form the data propagation paths between the registers. Logic block 72 is placed between registers 71 and 73, and logic block 74 is placed between registers 73 and 75. For illustration purposes, assume that the data propagation path between registers 71 and 73 requires 5 ns to transmit data from register 71 to register 73, and that the data propagation path between registers 73 and 75, which is the longest data path in the ASIC, requires 7 ns to transmit data from register 73 to register 75. The chip level clock cycle time must therefore be at least 7 ns to accommodate the delay of this longest path. If the chip level clock cycle time were shorter than 7 ns, the data from register 73 would not have enough time to reach register 75, because that path takes at least 7 ns. Because the prior art clock distribution network 30 reduces clock skew by clocking all the clock recipients at substantially the same time (that is, by making the phase of the clock signal substantially the same at the clock inputs of all the clock recipient elements in the ASIC), the chip level clock cycle time must be at least as long as the delay of the longest data propagation path in the ASIC for proper data propagation.
In the example above, 2 ns of clock cycle time is wasted on the path between registers 71 and 73, because the data propagation path between registers 73 and 75 takes 7 ns while the path between registers 71 and 73 takes only 5 ns. Registers 71 and 73 have to wait idly for an additional 2 ns until register 75 receives its data from register 73.
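The idle-time arithmetic of the FIG. 5 example can be written out directly. This is a minimal sketch assuming, as the prior art scheme implies, a single chip-wide clock period applied in phase to every register; the path labels and function names are hypothetical, while the 5 ns and 7 ns delays come from the example.

```python
from typing import Dict


def chip_cycle_time_ns(path_delays_ns: Dict[str, int]) -> int:
    """With all registers clocked in phase, the period must cover the longest path."""
    return max(path_delays_ns.values())


def idle_time_ns(path_delays_ns: Dict[str, int]) -> Dict[str, int]:
    """Cycle time left unused on each path under that single chip-wide period."""
    period = chip_cycle_time_ns(path_delays_ns)
    return {path: period - delay for path, delay in path_delays_ns.items()}


# Path delays from the FIG. 5 example.
delays = {"71->73": 5, "73->75": 7}
print(chip_cycle_time_ns(delays))  # 7
print(idle_time_ns(delays))        # {'71->73': 2, '73->75': 0}
```

Every path shorter than the critical path contributes idle time in the same way, so the total waste grows with the number of short paths in the design.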
Therefore, there is a need for methods and computer-aided design tools for planning the clock distribution network in the conceptual design phase of ASIC devices to reduce clock skew, ground bounce, VDD noise and idle clock cycle time, and to avoid the difficulty of balancing the clock distribution network in the later stages of ASIC design.