This invention relates to a clock synchronization method for synchronizing the phases of a reference clock and a frequency-divided clock which are supplied from a clock generation circuit provided in a top hierarchical block to terminal flip-flops (FFs) provided in each of a plurality of lower hierarchical blocks.
In a layout design of a large-scale semiconductor integrated circuit, a hierarchical layout design technique is used. In the hierarchical layout design technique, a large part of a circuit in a semiconductor integrated circuit is divided into a plurality of lower hierarchical blocks, and the layout design of each of the lower hierarchical blocks is carried out. Subsequently or concurrently, a layout design of a top hierarchical block excluding the lower hierarchical blocks is carried out, in which a clock signal is connected to each of the lower hierarchical blocks and the wiring interconnecting the lower hierarchical blocks is connected.
In the layout design of the lower hierarchical blocks and the top hierarchical block, in order to supply a clock signal to a group of a very large number of FFs in each of the lower hierarchical blocks, a technique called clock tree synthesis is used. The clock tree synthesis is a function for making uniform the delay times (synchronizing the phases) of the clock signal which reaches, through associated paths, all of the FFs in each of the lower hierarchical blocks from a starting point in the top hierarchical block.
In the hierarchical layout design technique, since clock wiring extends across the top hierarchical block and each of the lower hierarchical blocks, a clock tree of a downstream portion is generated in each of the lower hierarchical blocks. Then, on the basis of the result, a clock tree of an upstream portion is generated in the top hierarchical block, and the delay time of a clock signal in the top hierarchical block is adjusted such that the delay times of the clock signal from the clock signal starting point to all of the FFs in each of the lower hierarchical blocks are made uniform.
In a semiconductor integrated circuit, fluctuation in the delay times of signals occurs due to process variation in manufacturing, voltage/temperature variation during operation, or the like. When the layout design is made, timing closure is carried out such that the circuit operates normally even if delay time variation among signals occurs. Specifically, the variation in delay time of each signal is called “On Chip Variation (OCV),” and the timing closure is carried out such that the semiconductor integrated circuit operates normally even if a margin for OCV is taken into account.
Further, in a large-scale integrated circuit, in many cases, a frequency-divided clock group is generated from a reference clock by a clock generation circuit in a top hierarchical block, and is distributed to each of the lower hierarchical blocks. Thus, the reference clock and the frequency-divided clock branch off on the upstream side of the clock tree, and the paths of the reference clock and the frequency-divided clock after the branch point tend to be long. Since these long clock paths after the branch point vary individually for each of the lower hierarchical blocks, a large timing margin must be taken into account, which makes timing closure difficult.
FIG. 6 is a conceptual diagram of an example showing the configuration of a conventional semiconductor integrated circuit. A semiconductor integrated circuit 70 shown in the drawing is designed by a hierarchical layout design technique, and is provided with three lower hierarchical blocks A, B, and C, and a top hierarchical block excluding the lower hierarchical blocks A, B, and C. The top hierarchical block includes a clock generation circuit 12, and the clock generation circuit 12 includes a PLL circuit 14 and a frequency divider circuit 16.
In the case of the semiconductor integrated circuit 70, in the clock generation circuit 12, a reference clock 15 is generated by the PLL circuit 14, and the reference clock 15 output from the PLL circuit 14 is frequency-divided by the frequency divider circuit 16 to generate a frequency-divided clock 17. Each of the lower hierarchical blocks A, B, and C is provided with first terminal FFs 18 operating in synchronization with the reference clock 15 and second terminal FFs 20 operating in synchronization with the frequency-divided clock 17. The reference clock 15 and the frequency-divided clock 17 are supplied from the clock generation circuit 12 to all of the first terminal FFs 18 and all of the second terminal FFs 20 in each of the lower hierarchical blocks A, B, and C, respectively.
When layout design of the semiconductor integrated circuit 70 is carried out, in the hierarchical layout design technique, first, clock trees (in FIG. 6, indicated by triangular frames) 19 and 21 of the reference clock and the frequency-divided clock are generated for each of the lower hierarchical blocks A, B, and C. Since the lower hierarchical blocks A, B, and C are different in size or the number of the first terminal FFs 18 and the number of the second terminal FFs 20, the delay times of the clock trees are different. In the example shown in the drawing, the delay times of the clock trees 19 and 21 of the reference clock and the frequency-divided clock in the lower hierarchical blocks A, B, and C are 2 ns, 7 ns, and 5 ns, respectively.
Subsequently, clock trees of the reference clock 15 and the frequency-divided clock 17 are generated in the top hierarchical block. In this case, the delay times of the clock trees of the reference clock 15 and the frequency-divided clock 17 in the top hierarchical block are adjusted so as to eliminate skews, that is, the differences among the delay times of the clock trees 19 and 21 of the reference clock and the frequency-divided clock in the lower hierarchical blocks A, B, and C. In the example shown in the drawing, the delay times of the clock trees of the reference clock 15 and the frequency-divided clock 17 from the clock generation circuit 12 to the lower hierarchical blocks A, B, and C are 10 ns, 5 ns, and 7 ns, respectively.
From this, the delay times from a branch point 13 of the reference clock 15 and the frequency-divided clock 17 to the first terminal FFs 18 and the second terminal FFs 20 of each of the lower hierarchical blocks A, B, and C are adjusted to 12 ns, and consequently, all of the first terminal FFs 18 and second terminal FFs 20 of the lower hierarchical blocks A, B, and C can be synchronously operated.
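The balancing arithmetic of this example can be checked with a short sketch. The function name `top_level_delays` and the dictionary layout are illustrative assumptions; only the delay values (2 ns, 7 ns, 5 ns, and the 12 ns total) come from the example above.

```python
def top_level_delays(block_delays_ns, target_ns):
    """Return the top-level clock delay each block needs to reach target_ns.

    Hypothetical helper: it only reproduces the arithmetic of the example,
    not an actual clock tree synthesis step.
    """
    slowest = max(block_delays_ns.values())
    if target_ns < slowest:
        raise ValueError("target is shorter than the slowest block-level clock tree")
    return {block: target_ns - delay for block, delay in block_delays_ns.items()}


# Block-level clock tree delays from the FIG. 6 example (ns).
block_delays_ns = {"A": 2, "B": 7, "C": 5}

# 12 ns is the total delay quoted in the text for every terminal FF.
top_delays_ns = top_level_delays(block_delays_ns, target_ns=12)
total_delays_ns = {b: top_delays_ns[b] + block_delays_ns[b] for b in block_delays_ns}

print(top_delays_ns)    # {'A': 10, 'B': 5, 'C': 7}
print(total_delays_ns)  # {'A': 12, 'B': 12, 'C': 12}
```

The computed top-level delays match the 10 ns, 5 ns, and 7 ns given for the top hierarchical block, so every terminal FF sees the same 12 ns from the branch point 13.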
However, in this case, there is a problem in that the timing margins of the reference clock 15 and the frequency-divided clock 17 are excessively large, and therefore, the design is difficult. Since the frequency-divided clock 17 branched off from the reference clock 15 in the clock generation circuit 12 is supplied to each of the lower hierarchical blocks A, B, and C, the entire clock path corresponding to 12 ns downstream of the branch point 13 of the reference clock 15 and the frequency-divided clock 17 is subject to OCV.
As shown in FIG. 7, when the period of the reference clock 15 is 5 ns, that is, when the period of the frequency-divided clock 17 obtained by dividing the frequency of the reference clock 15 by two is 10 ns, for example, a large timing margin exceeding one period of the reference clock 15 is needed. In this example, the timing margin is 12 ns in total, consisting of 6 ns before and 6 ns after the rising timing of the reference clock 15 and the frequency-divided clock 17, and thus, the layout design becomes extremely difficult.
In order to solve this problem, as shown in FIG. 8, a synchronous FF 41 which receives the frequency-divided clock 17 as a data signal in synchronization with the reference clock 15 and newly outputs the frequency-divided clock 17 in synchronization with the reference clock 15 is disposed on the path of the frequency-divided clock 17 at a position near each of the lower hierarchical blocks A, B, and C.
With this, the branch point is moved from the branch point 13 of the reference clock 15 and the frequency-divided clock 17 to a branch point 37 near each of the lower hierarchical blocks A, B, and C, and as shown in FIG. 9, the timing margin can be made small to a certain degree. In this example, the timing margin can be reduced to 8 ns in total.
Further, as shown in FIG. 10, the synchronous FF 41 is moved to a position near the second terminal FFs 20 inside each of the lower hierarchical blocks A, B, and C, whereby the branch point can be moved from the branch point 13 of the reference clock 15 and the frequency-divided clock 17 to a branch point 39 near the second terminal FFs 20 inside each of the lower hierarchical blocks A, B, and C, and as shown in FIG. 11, the timing margin can be made still smaller. In this example, the timing margin can be reduced to 4 ns in total.
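As a rough check of the three cases described so far, the sketch below compares each quoted timing margin with the 5 ns period of the reference clock 15. The margins are taken directly from the text; the helper name `exceeds_one_period` and the dictionary keys are illustrative assumptions, and the sketch does not model how the margins themselves are derived.

```python
REFERENCE_PERIOD_NS = 5  # period of the reference clock 15 in the example

# Total timing margins quoted in the text for each branch point location (ns).
margins_ns = {
    "branch point 13 (FIG. 6)": 12,
    "branch point 37 (FIG. 8)": 8,
    "branch point 39 (FIG. 10)": 4,
}


def exceeds_one_period(margin_ns, period_ns=REFERENCE_PERIOD_NS):
    """True if the required margin is larger than one reference clock period."""
    return margin_ns > period_ns


exceeds = {where: exceeds_one_period(m) for where, m in margins_ns.items()}
for where, flag in exceeds.items():
    print(f"{where}: margin {margins_ns[where]} ns, exceeds one period: {flag}")
```

Under these figures, only the arrangement with the branch point 39 inside the lower hierarchical blocks brings the margin below one period of the reference clock.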
However, in this case, there is a problem in that the timing closure of the synchronous FF 41 disposed on the clock path is difficult.
As shown in FIG. 12, in many cases, the delay time of the reference clock 15 supplied to the synchronous FF 41 is greater than one period of the reference clock 15. In this example, the delay times of the reference clock 15 from the branch point 13 of the reference clock 15 and the frequency-divided clock 17 inside the clock generation circuit 12 to the synchronous FFs 41 inside the lower hierarchical blocks A, B, and C are 11 ns, 7 ns, and 9 ns, respectively.
When the delay exceeding a period of the reference clock 15 occurs in the reference clock 15 flowing through the clock path of the synchronous FF 41, in many cases, the same degree of delay also occurs in the frequency-divided clock 17 flowing through the data path running parallel to the clock path of the synchronous FF 41. If the delay time of the data path exceeds the period, a set-up restriction of synchronous design may not be satisfied. For this reason, as shown in FIG. 13, it is necessary to add plural stages of synchronous FFs 41 to the data path such that the delay time of the data path falls within one period of the reference clock 15.
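The need for plural stages can be illustrated with an idealized sketch. It assumes that the data path delay equals the reference clock delay quoted above for each block (11 ns, 7 ns, 9 ns), that the path can be cut at arbitrary points, and that set-up time and OCV are ignored; the function name `segments_needed` is hypothetical.

```python
import math


def segments_needed(data_path_delay_ns, period_ns):
    """Smallest number of segments such that each segment fits in one period.

    Idealized: ignores set-up time, OCV, and placement limits.
    """
    return math.ceil(data_path_delay_ns / period_ns)


REFERENCE_PERIOD_NS = 5

# Assumed data path delays, taken equal to the clock delays in the example (ns).
data_path_delays_ns = {"A": 11, "B": 7, "C": 9}

segments = {
    block: segments_needed(delay, REFERENCE_PERIOD_NS)
    for block, delay in data_path_delays_ns.items()
}
print(segments)  # {'A': 3, 'B': 2, 'C': 2}
```

Each boundary between two segments corresponds to one synchronous FF 41 on the data path, so the longer the path relative to the period, the more stages are required.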
However, even if the plurality of synchronous FFs 41 are added such that the delay time of the data path falls within one period, it is difficult to satisfy a hold restriction.
Hereinafter, difficulty in satisfying the hold restriction will be described.
First, as shown in FIG. 14, timing closure of a group of synchronous FFs 41 with respect to one lower hierarchical block is considered.
If only the group of synchronous FFs 41 is in question, as shown in FIG. 15, buffers 45 are inserted in the paths of the reference clock from the branch point of the reference clock and the frequency-divided clock to all of the synchronous FFs 41 to make the delay times uniform (to eliminate a clock skew), whereby the set-up/hold restriction can be satisfied.
Meanwhile, as in the above-described example, when there are three lower hierarchical blocks A, B, and C and the delay times associated with them differ (11 ns, 7 ns, and 9 ns, respectively), the clock delays are generally adjusted to 11 ns, which is the maximum of the three values, so as to prevent punch-through of data.
However, in this case, there is a problem in that by adjusting the delay times of the FFs constituting the frequency divider circuit and all groups of synchronous FFs 41 to the maximum delay time, the number of buffers 45 added on the path of the reference clock becomes very large, and accordingly, the layout area and power consumption increase. Furthermore, if the difference between the maximum value and the minimum value of the delay time of the reference clock exceeds one period of the reference clock, this exceeds a time width which can be dealt with by synchronous design, and thus, timing closure is not possible.
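The cost of this approach and the condition under which it fails can be sketched as follows. The delay values and the 5 ns period come from the example; the function names `padding_to_max` and `spread_exceeds_period` are illustrative assumptions, and the padding is expressed as delay in ns rather than as a buffer count, since the delay per buffer 45 is not given.

```python
def padding_to_max(delays_ns):
    """Extra clock delay each block needs to match the slowest block."""
    slowest = max(delays_ns.values())
    return {block: slowest - delay for block, delay in delays_ns.items()}


def spread_exceeds_period(delays_ns, period_ns):
    """True if max - min delay is larger than one reference clock period."""
    return max(delays_ns.values()) - min(delays_ns.values()) > period_ns


REFERENCE_PERIOD_NS = 5

# Reference clock delays to the synchronous FFs 41 in the example (ns).
sync_ff_delays_ns = {"A": 11, "B": 7, "C": 9}

padding_ns = padding_to_max(sync_ff_delays_ns)
print(padding_ns)  # {'A': 0, 'B': 4, 'C': 2}
print(spread_exceeds_period(sync_ff_delays_ns, REFERENCE_PERIOD_NS))  # False
```

In this example the spread is 4 ns, which is still within one period, but 6 ns of additional delay must be realized with buffers; a spread above 5 ns would fall outside what this adjustment can handle.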
In this case, the above-described method of adjusting the delay times of the reference clock to the maximum delay time does not work well. Therefore, as shown in FIG. 16, the set-up/hold restriction needs to be satisfied while partially or wholly shifting the delay times of the reference clock.
Next, difficulty in satisfying the set-up/hold restriction while shifting the delay times of the reference clock will be described.
Since a semiconductor integrated circuit needs to be operated under various temperature, voltage, and process conditions within an operation guaranteed range, it is necessary to consider delay fluctuation caused by various operation environments. In order to prevent punch-through of data, as shown in FIGS. 17A and 17B, data Q output from a preceding-stage FFA needs to be delayed by buffers 47 so that data D of a subsequent-stage FFB changes after the rise of a clock CLK of the subsequent-stage FFB. For this reason, in this configuration, the delay time of the buffers 47 in the data path should be larger than the delay time of buffers 49 in the clock path.
Although the delay time of each of the cells fluctuates due to the operation environment, since the delay times of the buffers 47 and 49 are large, the magnitude of delay fluctuation of the data path is large. Since the clock path also has delay variation, it is difficult to carry out timing closure to satisfy the set-up/hold restriction while considering the delay variations of both paths. In actual layout design of a semiconductor integrated circuit, usually, it is necessary to readjust timing closure many times.
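The interaction between large path delays and delay variation can be illustrated with a simplified slack model for the FFA-to-FFB configuration. Everything numeric here is a hypothetical assumption: a symmetric derating factor `variation` applied to both paths, the 6 ns and 4 ns delays for the buffers 47 and 49, and the 0.1 ns set-up and hold times. Only the 5 ns period is taken from the earlier example.

```python
def hold_slack_ns(data_delay_ns, clock_delay_ns, variation, hold_ns):
    """Worst case for hold: data path fast, capture clock path slow."""
    return data_delay_ns * (1 - variation) - clock_delay_ns * (1 + variation) - hold_ns


def setup_slack_ns(data_delay_ns, clock_delay_ns, variation, setup_ns, period_ns):
    """Worst case for set-up: data path slow, capture clock path fast."""
    latest_allowed = clock_delay_ns * (1 - variation) + period_ns - setup_ns
    return latest_allowed - data_delay_ns * (1 + variation)


PERIOD_NS = 5.0        # reference clock period from the example
DATA_DELAY_NS = 6.0    # hypothetical delay of the buffers 47 (data path)
CLOCK_DELAY_NS = 4.0   # hypothetical delay of the buffers 49 (clock path)
SETUP_NS = HOLD_NS = 0.1  # hypothetical FF set-up and hold times

for variation in (0.1, 0.2):
    hold = hold_slack_ns(DATA_DELAY_NS, CLOCK_DELAY_NS, variation, HOLD_NS)
    setup = setup_slack_ns(DATA_DELAY_NS, CLOCK_DELAY_NS, variation, SETUP_NS, PERIOD_NS)
    print(f"variation {variation:.0%}: hold met: {hold >= 0}, set-up met: {setup >= 0}")
```

In this model the hold slack shrinks by the variation multiplied by the sum of both path delays, so a nominal 2 ns difference between the data path and the clock path is consumed once the variation reaches about 20 percent, even though it is sufficient at 10 percent.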
Here, prior art documents relevant to the invention include JP 2002-76127 A relating to a distribution system of a clock signal for synchronization on a semiconductor integrated circuit and a semiconductor chip, JP 2002-158286 A relating to a semiconductor integrated circuit or the like which controls clock distribution, JP 2001-320022 A relating to a clock distribution system in an integrated circuit in which a multi-phase clock is distributed to internal circuits, and JP 9-51255 A relating to a clock generation circuit which generates a plurality of delayed clocks by dividing the frequency of a reference clock to delay the reference clock.