The present invention relates to a technology utilized when designing a circuit which includes a signal supplying source (e.g., clock signal supplying source) within the circuit and a plurality of elements (e.g., flip-flop cells) supplied with a signal from the signal supplying source.
More particularly, the present invention relates to a method of optimizing signal lines within the circuit, an optimizing apparatus and a recording medium having stored therein an optimizing program which are utilized when automatically synthesizing signal lines (nets) connecting the signal supplying source to each of the elements, in which a buffering scheme of a clock system net (signal lines) on an integrated circuit such as a VLSI is improved so that a delay or a skew of a signal from the signal supplying source to each of the elements is optimized.
Further, the present invention relates to a method of designing a circuit and a recording medium having stored therein a program for designing suitable for use in an implementation design of clock paths of an integrated circuit such as an LSI.
In general, an integrated circuit such as a VLSI has placed a number of driven elements (e.g., flip-flop cells (sometimes referred to as FF)) which are to be driven synchronously in response to a clock signal. The driven elements are supplied with a clock signal through a clock signal line from a clock supplying source (source).
In this case, as shown in FIG. 47, if several tens of thousands of FFs 502 are connected to a single clock supplying source 501, it will be impossible for all of the elements to be directly driven by the clock supplying source due to a fan-out restriction. Here, the term fan-out restriction means a condition which shall be satisfied when a driving element (clock supplying source 501) drives driven elements (FFs 502) connected to the output side of the driving element. More concretely, if the driving capability of the driving element does not satisfy the following relation, the driven elements cannot be driven.
[driving capability of the driving element] > [wiring capacity from the driving element to driven elements on the next stage] + [load of the driven elements on the next stage]
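The relation above can be expressed as a simple check. The following is a minimal sketch, assuming that the driving capability, the wiring capacity and the loads are all expressed in one common unit; the function name `satisfies_fanout` and the numeric values are hypothetical and serve only as an illustration.

```python
def satisfies_fanout(drive_capability, wiring_capacity, loads):
    """Return True when the driving element can drive the next stage.

    drive_capability: driving capability of the driving element
    wiring_capacity:  wiring capacity from the driver to the next stage
    loads:            input loads of the driven elements on the next stage
    All values are assumed to share one unit (illustrative numbers below).
    """
    return drive_capability > wiring_capacity + sum(loads)


# Three driven elements: 10.0 > 1.5 + 6.0, so the restriction is satisfied.
few = satisfies_fanout(10.0, 1.5, [2.0, 2.0, 2.0])

# Twenty thousand driven elements: the summed load far exceeds the capability.
many = satisfies_fanout(10.0, 1.5, [2.0] * 20000)
```

In this sketch the restriction fails as soon as the summed load reaches the driving capability, which is the situation that buffering is meant to resolve.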
As described above, if several tens of thousands of FFs 502 are arranged to be driven directly by a single clock supplying source 501, the sum of the wiring capacity and the loads of the FFs 502 will become extremely large. For this reason, even if a clock signal has a rising edge as shown in FIG. 48A when it is generated by the clock supplying source 501, the clock signal will come to have a dull rising edge as shown in FIG. 48B when it has traveled the distance from the clock supplying source 501 to each of the FFs 502, with the result that the clock signal becomes incapable of driving each FF 502.
For this reason, in general, an arrangement of buffering has been made as shown in FIG. 49. That is, clock signal lines extending from the clock supplying source 501 to each of the FFs 502 are wired in a tree-like manner (a clock tree is synthesized), and a plurality of stages (in the case of FIG. 49, a couple of stages are shown) of buffer elements 503 (sometimes referred to as buffer cells) are inserted and placed in the clock signal lines. If such buffering is effected, the fan-out restriction will be satisfied in each of the plurality of stages of the buffer elements 503 between the clock supplying source 501 and the plurality of FFs 502.
Further, in order to satisfy the fan-out restriction, a measure known as buffer-sizing is taken, in which each cell provided in a clock system is adjusted or changed in its driving capability. A cell as a target of the buffer-sizing is, in addition to a logic gate or a selector, a buffer element which is provided on a clock signal line as a buffering element as described above. The selector is utilized for selecting a clock signal when a plurality of clock signals are supplied in the circuit under design.
According to a conventional method of buffering or buffer-sizing, when clock system nets (clock signal lines or clock tree) are synthesized upon designing a circuit, an initial logic is once reduced into a placement based on a netlist obtained by a logic design, and thereafter a portion included in the placement in which the fan-out restriction is not satisfied is subjected to a process of buffering or buffer-sizing so that the portion satisfies the fan-out restriction. If a portion of the placement reduced from the initial logic satisfies the fan-out restriction, the portion is not subjected to the process of buffering or buffer-sizing.
The above-described design scheme is disclosed in, for example, a reference entitled "A Methodology and Algorithms for Post-Placement Delay Optimization" at pages 327 to 332 of the Proceedings of the 31st ACM/IEEE Design Automation Conference, written by Lalgudi N. Kannan, Peter R. Suaris, and Hong-Gee Fang.
Recently, owing to the progress of fabrication technology, an integrated circuit such as a VLSI has come to include a great number of circuit components and to have a complicated structure. With this tendency, the number of elements supplied with a clock signal on the integrated circuit tends to increase, leading to difficulty in designing the clock logic of the integrated circuit.
Further, since a gate is subjected to a microfabrication technology, the integrated circuit device is further requested to process a signal at a higher speed. Thus, it is desired to establish a technology which makes it possible to synthesize a clock tree with a skew reduced. That is, it is desired for the clock signal to reach all driven elements (such as FFs) from a clock supplying source substantially at the same time (ideally exactly at the same time). A scattering of time (of delay) it takes for the clock signal to be transmitted from the clock supplying source to each of the driven elements is known as a clock skew. As described above, as a clock frequency is increased in accordance with the request for the high speed processing, the clock skew is requested to be substantially completely eliminated.
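The clock skew described above can be illustrated numerically. The following is a minimal sketch, assuming that the skew is measured as the difference between the latest and the earliest arrival of the clock signal at the driven elements; this measure, the name `clock_skew` and the delay values are assumptions made for illustration.

```python
def clock_skew(arrival_times):
    """Spread between the latest and the earliest clock arrival time."""
    times = list(arrival_times)
    return max(times) - min(times)


# Hypothetical delays (in ns) from the clock supplying source to three FFs.
delays_ns = {"FF_a": 1.20, "FF_b": 1.35, "FF_c": 1.10}
skew_ns = clock_skew(delays_ns.values())  # latest (FF_b) minus earliest (FF_c)
```

A skew of zero would mean that the clock signal reaches every driven element at exactly the same time.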
However, as described above, according to the conventional way of buffering, the buffering is carried out only to satisfy the fan-out restriction. Therefore, to reduce the skew by arranging the clock tree in a balanced manner is not taken into account. That is, although the net is subjected to the buffering in such a manner that the driven elements (such as FF elements) are divided into groups so that each of the groups can be driven by a single buffer element, the conventional way of group dividing is effected only to satisfy the fan-out restriction. Therefore, the number of driven elements or the size of load capacity in each group is not taken into consideration. If scattering in the number of elements or load capacity for every group is large, then scattering of wire length from the final stage buffer element to the driven element in each group also becomes large, with the result that it becomes very difficult to adjust the clock skew. Accordingly, in order to satisfy the request of high speed processing, or reduction of the clock skew, which becomes more demanding in recent years, the clock distributing system for distributing the clock to each group shall be arranged in a balanced fashion so as to suppress the scattering in number of elements or the load capacity of every group.
Further, according to the conventional way of buffering, the buffering is not effected in connection with layout information (physical information). Therefore, a layout can result in which a single buffer element is obliged to drive a plurality of flip-flop elements which are remote from the buffer element in terms of physical distance. As a result, there is a fear that skew adjustment becomes difficult upon arranging the layout or that the clock signal line becomes long, which leads to deterioration in circuit quality.
The present invention is made in view of the above aspect. Thus, a first object of the present invention is to provide a method of optimizing signal lines within a circuit, an optimizing apparatus and a recording medium having an optimizing program stored therein in which a delay of a signal from a signal supplying source to each of elements is optimized to positively reduce a skew, so that it becomes possible to realize circuit designing which can cope with a request for high speed processing.
Incidentally, as described above, the integrated circuit such as an LSI includes a number of flip-flop cells and buffer cells for driving these flip-flop cells. In general, a single buffer cell is forced to drive a plurality of flip-flop cells through a clock signal line. When a layout of circuit elements and wiring of clock signal lines are determined in an LSI, it is requested to shorten the clock wiring length for reducing a signal propagation delay time and relieving the signal lines from crowded wiring. Further, circuit elements which are designed to be supplied with a clock signal are requested to undergo a suitable circuit element placement so as to suppress a time difference (known as clock skew).
Now a conventional LSI layout method (circuit designing method) will be described in detail with reference to FIGS. 50 to 65.
FIG. 50 is a flowchart useful for describing the conventional LSI layout method (circuit designing method). As shown in FIG. 50, the LSI layout method consists of a logic design step S27, a clock layout step S28 and ordinary net wiring step S29 for wiring signal lines except for the clock signal lines.
The logic design step S27 is composed of step S271 of synthesizing logic and step S272 of creating a clock tree.
In step S271 of synthesizing logic, a hardware description language (HDL) described in a register transfer level (RTL) of elements constituting the LSI is formed into a logic which can be implemented into a layout. Thus, a first netlist (1) formed of modules is created. In clock tree creating step S272, a clock tree before having a buffer cell introduced into the LSI is created based on the first netlist (1).
FIG. 51 is a diagram showing one example of clock tree before having a buffer cell introduced into the integrated circuit. In FIG. 51, reference numeral 281 depicts a module, 282 to 284 small modules, 285 a clock source as a clock supplying source, and 286 to 288 flip-flop cells. Since the circuit of the stage has no buffer cell introduced thereinto, the clock source 285 directly drives the flip-flop cells 286 to 288.
Thereafter, the first netlist (1) is subjected to a buffering processing in which buffer cells are introduced into the clock tree to create a second netlist (2).
FIG. 52 is a diagram showing a clock tree within the module of the LSI having buffer cells introduced thereinto. As shown in FIG. 52, buffer cells 291 to 296 are additionally introduced into the clock tree of FIG. 51. The number of buffer cells in each module is determined depending on the number of driven flip-flop cells. As shown in FIG. 52, since the numbers of flip-flop cells 286 and 287 in the small modules 282 and 283 are small, the group of flip-flop cells of each of these small modules is allocated a single buffer cell 292 or 293 and is driven by the same. However, since the number of flip-flop cells 288 in the small module 284 is larger than the numbers of flip-flop cells 286 and 287 in the small modules 282 and 283, the small module 284 is arranged as a tree structure so that the flip-flop cells in the small module 284 are driven by three buffer cells 294, 295 and 296.
Referring back to FIG. 50, the contents of the second netlist (2) is transferred to clock layout step S28.
Clock layout step S28 is composed of steps S281 to S287.
In step S281, a floorplan is acquired. The floorplan is layout information obtained from a designer of the LSI.
In step S282, the buffer cells and the flip-flop cells are subjected to an initial placement as shown in FIG. 53 based on the floorplan so that the clock wiring length becomes short as far as possible. FIG. 53 is a diagram showing a result of the initial placement of the buffer cells and the flip-flop cells in the LSI module. In FIG. 53, lines are drawn between a buffer cell 301 and each of six flip-flop cells 310 to 315 so as to indicate that the six flip-flop cells are driven by the buffer cell.
At step S283, clock buffers are placed in a bottom-up fashion. A placement in a bottom-up fashion means that a flip-flop cell farthest from the clock source is given placement with priority. According to the scheme, buffer cells on the clock bus are placed in a bottom-up fashion based on flip-flop cell placement information so that the distance from the buffer cell 301 to each of the flip-flop cells 310 to 315 becomes equal as far as possible. As a bottom-up placement scheme, there can be a scheme in which a buffer cell is placed at the center of the placement area where flip-flop cells driven by the buffer cell are located. Alternatively, there can be a scheme in which a buffer cell is placed at the center of gravity of flip-flop cells which are driven by the buffer cell.
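The two bottom-up placement schemes mentioned above can be compared with a short sketch. It assumes that the "center of the placement area" is the center of the bounding box of the driven flip-flop cells and that the "center of gravity" is the arithmetic mean of their coordinates; the function names and coordinates are hypothetical.

```python
def bounding_box_center(points):
    """Center of the smallest axis-aligned box containing all the cells."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def center_of_gravity(points):
    """Arithmetic mean of the cell coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Six hypothetical flip-flop positions driven by one buffer cell.
ffs = [(0, 0), (4, 0), (0, 2), (4, 2), (1, 1), (1, 1)]
by_area = bounding_box_center(ffs)    # depends only on the outermost cells
by_gravity = center_of_gravity(ffs)   # pulled toward where cells cluster
```

With these coordinates the two schemes give different buffer positions, because the center of gravity shifts toward the two cells clustered at (1, 1).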
FIG. 54 is a diagram showing an example of placement within an LSI module in which the buffer cell 301 is placed at the center of the placement area where flip-flop cells 310 to 315 driven by the buffer cell 301 are located.
At step S284, in order to determine the restriction on wiring length between the buffer cell and each of the flip-flop cells, a placement restriction area is created for the flip-flop cells. The placement restriction area is a lozenge-shaped area centered on the buffer cell driving the flip-flop cells. FIG. 55 is a diagram showing an example of a placement restriction area 322 centered on a buffer cell 321. In the next step (step S285), the placement processing is retried under consideration of the placement restriction area 322. In this case, the placement of the buffer cell 321 deriving from the bottom-up placement at step S283 is fixed, the flip-flop cells are subjected to the placement restriction based on the placement restriction area created at step S284, and cells other than the clock buffer are again subjected to placement. FIGS. 56 and 57 are diagrams showing a result of the retried cell placement. As shown in FIGS. 56 and 57, the buffer cell 321 is located at the center of the placement restriction area 322, and the flip-flop cells 331 to 336 are located within the placement restriction area 322. Rectangular shapes in FIG. 57 indicate flip-flop cells.
At step S286, a special arrangement of wiring is effected on a clock tree final stage net. In this case, three types of processing shown in FIGS. 58 to 60 are carried out. That is, FIG. 58 shows a setting of cluster bar information 351 to 357. FIG. 59 shows a wiring of branch lines each connecting a flip-flop cell and the cluster bar. FIG. 60 shows a route wiring connecting the driving buffer cell 321 to the cluster bars.
Thereafter, a wiring of special arrangement of the net connecting buffers to one another is effected. In this case, as shown in FIG. 61, signal lines are radiated in arbitrary directions from a buffer cell 381 to four buffer cells 382 to 385 which are driven by the buffer cell 381. If the above-described steps S282 to S287 are carried out, then the placement and the wiring of the clock tree will be accomplished so that the signal propagation delay time from a clock source to a flip-flop and the skew value are suppressed to a limit value. Thereafter, a wiring processing for an ordinary net is effected at step S29.
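A lozenge-shaped area centered on a buffer cell can be modeled as the set of points whose horizontal-plus-vertical (Manhattan) distance from the buffer cell does not exceed a limit. The following is a minimal sketch under that assumption; the function name `in_restriction_area`, the parameter `max_wire_length` and the coordinates are hypothetical.

```python
def in_restriction_area(buffer_xy, cell_xy, max_wire_length):
    """True when the cell lies inside the lozenge centered on the buffer.

    The lozenge is modeled as all points whose Manhattan distance from
    the buffer cell is at most max_wire_length (an assumed model).
    """
    bx, by = buffer_xy
    cx, cy = cell_xy
    return abs(cx - bx) + abs(cy - by) <= max_wire_length


inside = in_restriction_area((0, 0), (3, 2), 5)   # distance 3 + 2 = 5
outside = in_restriction_area((0, 0), (4, 2), 5)  # distance 4 + 2 = 6
```

Under this model, any flip-flop cell inside the area can be reached from the buffer cell with a horizontal and vertical wire no longer than the limit.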
The above-described LSI layout method is a scheme mainly utilized when designing a gate array or an embedded array ASIC (Application Specific Integrated Circuit). In this scheme, when a process goes to a step after a logic design stage, a clock layout and an LSI implementation design are effected without changing the netlist. According to the above-described conventional method, clock signal lines are subjected to a layout process in such a manner that the clock tree logic is not changed and the clock signal is transmitted while satisfying a condition of a signal propagation delay time and a skew value determined by a user.
However, according to the above-described conventional method, when the processing stays in step S271 in which a logic synthesis (logic design) is effected and in step S272 in which the clock tree is created, the logic design is carried out without taking the placement position of flip-flop cells and clock signal line wiring paths into consideration. For this reason, if the clock layout is effected on the logic information so as to satisfy the clock restricting condition, then wiring becomes crowded and clock wiring and signal wiring after the clock wiring will encounter difficulties which prevent the clock wiring and the signal wiring from being carried out.
The difficulties lying in the process of the clock wiring and the signal wiring will hereinafter be described in detail.
When the clock tree is created at step S272, floorplan and placement of flip-flops are not taken into consideration. Therefore, if the placement step is completed up to the stage of placing buffer cells, as shown in FIG. 62, floorplan blocks 392 and 395 on a semiconductor chip 391 can be placed so as to be remote from each other in spite of the fact that both of the floorplan blocks 392 and 395 are driven by a buffer cell 396. Similarly, floorplan blocks 393 and 394 on the semiconductor chip 391 can be placed so as to be remote from each other in spite of the fact that both of the floorplan blocks 393 and 394 are driven by a buffer cell 397. In this case, signal lines 398 to 401 of a network (hereinafter referred to as net) for driving the floorplan blocks remote from each other become long, with the result that a propagation delay time of the clock signal becomes long and wiring is crowded (first problem).
At step S285 of FIG. 50 in which placement of flip-flops is retried under consideration of the placement restriction area, as shown in FIG. 63, if placement restriction areas 430 and 431 overlap one another and the flip-flop cells are subjected to retry of placement, then a number of flip-flop cells can be concentrated in the common area of the placement restriction areas 430 and 431 as shown in FIG. 64. Such a concentration of the flip-flop placement will cause crowded wiring (second problem). In FIGS. 63 and 64, reference numerals 410 and 420 depict buffer cells, 411 to 416 flip-flop cells driven by the buffer cell 410, 421 to 426 flip-flop cells driven by the buffer cell 420, and 430 and 431 placement restriction areas on the buffer cells 410 and 420, respectively. FIG. 64 is a diagram showing an example in which the flip-flop cells 411 to 416 are moved into the placement restriction area 430 while the flip-flop cells 421 to 426 are moved into the placement restriction area 431.
Further, if the flip-flop cells are moved from initial placement positions to the placement restriction area, then a signal line of the net which is connected to any component other than a flip-flop cell supplied with the clock signal may become long, with the result that wiring other than the clock wiring also becomes crowded (third problem). FIG. 64 contains examples of signal lines 441 to 445 which become excessively long.
Furthermore, as shown in FIG. 61, if a plurality of buffer cells are driven by a single buffer cell, wiring between the buffer cells can contain a crowded portion of wiring in signal lines between buffer cells (fourth problem). FIG. 61 shows an arrangement of a buffer cell 381 and a plurality of buffer cells 382 to 385 driven by the buffer cell 381, wherein wiring between the buffer cell 381 and each of the buffer cells 382 and 383 is crowded.
As a problem other than the problem of crowded wiring, there can be a problem of a skew value of a signal propagation delay time (fifth problem). That is, at step S285 of FIG. 50, the placement processing is retried while the placement restriction of flip-flop cells is effected, and flip-flop cells are collected, whereby the skew value is suppressed. However, a flip-flop cell, for example a flip-flop cell 371 shown in FIG. 60, can be located at a place remote from a cluster of the cell. In this case, a signal line extending to the flip-flop cell becomes long, and hence the propagation delay time of a signal reaching the cell is increased, with the result that the skew becomes large.
Further, at step S272 of FIG. 50 in which the clock tree is created, buffer cells are placed so as to satisfy the fan-out restriction. However, if a placement is effected based on the above scheme, as shown in FIG. 52, there can be formed portions of the net in which the number of buffer stages from the clock source 285 to a flip-flop cell differs from one portion to another, such as the module 282 and the module 284 or the module 283 and the module 284. In this case, a clock path including two stages of buffers and a clock path including three stages of buffers will have signal propagation delay times different from each other, with the result that the skew becomes large (sixth problem).
Furthermore, according to the conventional method, as shown in FIG. 65, at step S282 of FIG. 50 in which the initial placement is determined, flip-flop cells 451 to 454 are placed with clock paths 457 and 458 extending from clock buffer cells 455 and 456 taken into account. Therefore, wiring paths of the data path 459 connecting the flip-flop cells to one another intersect one another, resulting in complicated wiring. In addition, the data path 459 and the clock paths 457 and 458 become long, which causes a timing error (seventh problem).
The above-introduced first problem derives from a fact that the layout information is not taken into account when the clock tree is created.
The above-introduced second and third problems derive from facts that the placement restriction areas can be overlaid on one another, and driven cells are forcibly moved into the overlapped portion of the placement restriction areas when the placement is retried.
The above-introduced fourth problem derives from a fact that a wiring between a buffer cell for driving another cell and buffer cells which are driven by that buffer cell is arranged merely as a radiated fashion.
The above-introduced fifth problem derives from a fact that the conventional method employs a scheme that flip-flop cells are merely subjected to a placement retry within the placement restricted area.
The above-introduced sixth problem derives from a fact that if the clock tree includes modules having different numbers of buffer stages, then the clock skew arising among the paths becomes large.
The above-introduced seventh problem derives from a fact that flip-flop cells are placed upon initial placement with the clock path taken into account.
Accordingly, a second object of the present invention is to propose a method of designing a circuit and a recording medium having stored therein a program for designing a circuit in which the circuit can be designed such that a clock signal line and signal line other than the clock signal line are short, a wiring of the circuit can be prevented from being crowded, a signal propagation delay time is short, and the scattering of the signal propagation delay time can be properly suppressed.
According to the present invention, in order to achieve the above object, there is proposed a method of optimizing signal lines within a circuit utilized when designing a circuit including a signal supplying source and a plurality of elements which are supplied with a signal from the signal supplying source through signal lines. The method is utilized for optimizing signal distribution through the signal lines connecting the signal supplying source to each of the plurality of elements. The method includes a step of determining whether or not the signal supplying source satisfies a fan-out restriction if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, a step of dividing the plurality of elements into a plural number of groups so that the fan-out restriction is satisfied in each of the groups and that each of the groups has the same or substantially the same load capacity, when it is determined in the determining step that the signal supplying source does not satisfy the fan-out restriction, and a step of inserting into each of the groups divided at the dividing step, a buffer element having a size which makes the groups of elements satisfy the fan-out restriction, wherein the buffer element inserted at the buffer inserting step is regarded as a driven element, and then it is again determined in the determining step whether or not the signal supplying source satisfies the fan-out restriction if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, and the dividing step and the buffer inserting step are repeatedly carried out until the determining step delivers a determination that the signal supplying source satisfies the fan-out restriction.
If the determining step delivers a determination that the signal supplying source satisfies the fan-out restriction, then an optimizing step may be executed in such a manner that a delay occurring between the signal supplying source and each of the plurality of elements is analyzed, and the signal distribution through the signal lines is optimized by inserting or removing one or more buffer elements, or by changing the buffer size, or by changing the group to which the driven element belongs.
If the determining step delivers a determination that the signal supplying source does not satisfy the fan-out restriction, then an evaluation value calculating step may be carried out in such a manner that an evaluation value is calculated for each of all pairs of the driven elements on the basis of a circuit performance enhancing factor, and the driven elements are divided into groups in the dividing step so that, of pairs satisfying the fan-out restriction, a pair having the largest evaluation value calculated at the evaluation value calculating step is brought into the same group with priority.
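The repetition of the determining step, the dividing step and the buffer inserting step can be sketched as follows. This is only an illustrative sketch under simplifying assumptions that are not part of the method as stated: the wiring capacity is ignored, every inserted buffer element has one fixed driving capability `buf_drive` and one fixed input load `buf_load`, and the balanced division is approximated by a greedy heuristic that assigns the largest remaining load to the currently lightest group. All names and numeric values are hypothetical.

```python
import heapq


def balanced_groups(loads, k):
    """Greedily split loads into k groups with nearly equal total load."""
    heap = [(0.0, i) for i in range(k)]        # (total load, group index)
    groups = [[] for _ in range(k)]
    for load in sorted(loads, reverse=True):   # largest load first
        total, i = heapq.heappop(heap)         # currently lightest group
        groups[i].append(load)
        heapq.heappush(heap, (total + load, i))
    return groups


def synthesize(source_drive, sink_loads, buf_drive, buf_load):
    """Return the buffer stages, each a list of groups, nearest the sinks first."""
    level, stages = list(sink_loads), []
    while not source_drive > sum(level):       # determining step
        # Dividing step: smallest group count whose groups all satisfy the
        # fan-out restriction; k < len(level) guarantees that the loop ends.
        for k in range(1, len(level)):
            groups = balanced_groups(level, k)
            if all(buf_drive > sum(g) for g in groups):
                break
        else:
            raise ValueError("assumed buffer cannot reduce the fan-out")
        stages.append(groups)
        level = [buf_load] * len(groups)       # buffers become driven elements
    return stages


# Sixteen unit loads, source and buffers each able to drive less than 4.5.
stages = synthesize(4.5, [1.0] * 16, 4.5, 1.0)
```

In this example the source cannot drive sixteen unit loads directly, so one stage of four buffers is inserted, each driving four loads, after which the source drives the four buffer inputs and the repetition stops.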
In this case, the above evaluation value calculating step may include the following items No. 1 to No. 6 for calculating the evaluation value.
1. A circuit between two driven elements of each pair is analyzed and a hold error evaluation value is determined as the evaluation value for each pair so that the hold error evaluation value becomes larger with increase of the probability at which the hold error takes place in each pair.
2. A circuit between two driven elements of each pair is analyzed and a setup error evaluation value is determined as the evaluation value for each pair so that the setup error evaluation value becomes larger with increase of the probability at which the setup error takes place in each pair.
3. A circuit between two driven elements of each pair is analyzed and a hold error evaluation value is determined so that the hold error evaluation value becomes larger with increase of the probability at which the hold error takes place in each pair, a setup error evaluation value is determined so that the setup error evaluation value becomes larger with increase of the probability at which the setup error takes place in each pair, and then the hold error evaluation value and the setup error evaluation value are added together to create the evaluation value for each pair.
4. A physical distance and a circuit between two driven elements of each pair are analyzed and a distance evaluation value is determined so that the distance evaluation value becomes larger with increase in the physical distance for each pair and a hold error evaluation value is determined so that the hold error evaluation value becomes larger with increase of the probability at which the hold error takes place in each pair, and then the distance evaluation value and the hold error evaluation value are added together to create the evaluation value for each pair.
5. A physical distance and a circuit between two driven elements of each pair are analyzed and a distance evaluation value is determined so that the distance evaluation value becomes larger with increase in the physical distance for each pair and a setup error evaluation value is determined so that the setup error evaluation value becomes larger with increase of the probability at which the setup error takes place in each pair, and then the distance evaluation value and the setup error evaluation value are added together to create the evaluation value for each pair.
6. A physical distance and a circuit between two driven elements of each pair are analyzed and a distance evaluation value is determined so that the distance evaluation value becomes larger with increase in the physical distance for each pair, a hold error evaluation value is determined so that the hold error evaluation value becomes larger with increase of the probability at which the hold error takes place in each pair, and a setup error evaluation value is determined so that the setup error evaluation value becomes larger with increase of the probability at which the setup error takes place in each pair, and then the distance evaluation value, the hold error evaluation value, and the setup error evaluation value are added together to create the evaluation value for each pair.
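The grouping by evaluation value can be sketched as follows, taking item No. 3 (the sum of the hold error evaluation value and the setup error evaluation value) as the example. This is a hedged illustration only: the evaluation values are given as hypothetical inputs instead of being derived from circuit analysis, the wiring capacity is ignored, a fixed buffer driving capability `buf_drive` is assumed, and the balancing of load capacity among groups is not enforced here.

```python
def group_by_pairs(loads, pair_scores, buf_drive):
    """Merge elements pairwise, highest evaluation value first.

    loads:       {element name: input load}
    pair_scores: {(a, b): evaluation value of the pair}
    buf_drive:   assumed driving capability of the buffer for each group
    """
    group_of = {name: {name} for name in loads}   # every element starts alone
    ranked = sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _ in ranked:
        ga, gb = group_of[a], group_of[b]
        if ga is gb:
            continue                               # already in the same group
        merged = ga | gb
        if buf_drive > sum(loads[n] for n in merged):   # fan-out still met
            for n in merged:
                group_of[n] = merged
    distinct = {frozenset(g) for g in group_of.values()}
    return sorted(sorted(g) for g in distinct)


# Hypothetical hold and setup error evaluation values for four FFs.
hold = {("A", "B"): 0.9, ("B", "C"): 0.5, ("C", "D"): 0.4, ("A", "C"): 0.2}
setup = {("A", "B"): 0.1, ("B", "C"): 0.3, ("C", "D"): 0.3, ("A", "C"): 0.1}
scores = {pair: hold[pair] + setup[pair] for pair in hold}   # item No. 3

loads = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
groups = group_by_pairs(loads, scores, buf_drive=2.5)
```

Here the pair (A, B) has the largest evaluation value and is grouped first; the pair (B, C) is then skipped because a group of three would violate the assumed fan-out restriction, and (C, D) forms the second group.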
Further, a buffer stage number adjusting step may be executed in such a manner that one or more buffer elements are inserted into the circuit so that each circuit path extending from the signal supplying source to all of the plurality of elements consists of the same number of cell stages.
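The buffer stage number adjusting step can be illustrated with a short sketch. It assumes that the number of cell stages on the path to each driven element is already known and that paths are equalized to the deepest path; the function name `buffers_to_insert` is hypothetical. The stage counts follow the situation of FIG. 52, in which the paths to the small modules 282 and 283 include two buffer stages and the path to the small module 284 includes three.

```python
def buffers_to_insert(stage_counts):
    """Number of buffer elements to add on each path to equalize the depth.

    stage_counts: {path name: number of cell stages currently on the path}
    """
    target = max(stage_counts.values())          # deepest existing path
    return {path: target - n for path, n in stage_counts.items()}


stage_counts = {"module 282": 2, "module 283": 2, "module 284": 3}
extra = buffers_to_insert(stage_counts)
```

With these counts, one buffer element is added on each of the two shallower paths so that every path consists of three stages.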
On the other hand, according to another aspect of the present invention, there is provided an apparatus for optimizing signal distribution through signal lines which connect a signal supplying source and a plurality of elements which are supplied with a signal from the signal supplying source through the signal lines. The apparatus is utilized when designing a circuit including the signal supplying source and the plurality of elements. The apparatus includes a determining unit for determining whether or not the signal supplying source satisfies a fan-out restriction if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, a dividing unit for dividing the plurality of elements into a plural number of groups so that the fan-out restriction is satisfied in each of the groups and that each of the groups has the same or substantially the same load capacity when the determining unit determines that the signal supplying source does not satisfy the fan-out restriction, and a buffer inserting unit for inserting a buffer element having a size which makes the groups of elements satisfy the fan-out restriction, into each of the groups divided by the dividing unit, wherein the determining unit regards the buffer element inserted by the buffer inserting unit as a driven element and then determines whether or not the signal supplying source satisfies the fan-out restriction if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, and the dividing unit and the buffer inserting unit repeat the processing thereof until the determining unit delivers a determination that the signal supplying source satisfies the fan-out restriction.
At that time, the optimizing apparatus may further include an optimizing unit which carries out optimization in such a manner that, if the determining unit delivers a determination that the signal supplying source satisfies the fan-out restriction, the optimizing unit analyzes a delay occurring between the signal supplying source and each of the plurality of elements, and then optimizes the signal distribution through the signal lines by inserting or removing one or more buffer elements, or by changing the buffer size, or by changing the group to which the driven element belongs.
Moreover, the optimizing apparatus may further include an evaluation value calculating unit which carries out calculation in such a manner that, if the determining unit delivers a determination that the signal supplying source does not satisfy the fan-out restriction, then the evaluation value calculating unit calculates an evaluation value for each of all pairs of the driven elements on the basis of a circuit performance enhancing factor. Thereafter, the dividing unit divides the driven elements into groups so that, of pairs satisfying the fan-out restriction, a pair having the largest evaluation value calculated by the evaluation value calculating unit is made to belong to the same group with priority.
Furthermore, the optimizing apparatus may further include a buffer stage number adjusting unit for inserting one or more buffer elements into the circuit so that each circuit path extending from the signal supplying source to all of the plurality of elements consists of the same number of cell stages.
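The determine/divide/insert cycle carried out by the units described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the fan-out restriction is assumed to mean that the total load capacity of the directly driven elements must not exceed the drive capability of the driver, a single buffer size is assumed, and load balancing uses a greedy heaviest-first heuristic. The names `Node`, `balanced_groups`, `build_tree` and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: float                      # load capacity seen by the driver (assumed unit)
    children: list = field(default_factory=list)


def balanced_groups(nodes, limit):
    """Split nodes into groups whose total load is <= limit and roughly equal."""
    if any(n.load > limit for n in nodes):
        raise ValueError("an element exceeds the buffer drive capability")
    k = max(1, math.ceil(sum(n.load for n in nodes) / limit))
    while True:
        groups = [[] for _ in range(k)]
        totals = [0.0] * k
        # Greedy heuristic (an assumption): heaviest element into the lightest group.
        for n in sorted(nodes, key=lambda n: n.load, reverse=True):
            i = totals.index(min(totals))
            groups[i].append(n)
            totals[i] += n.load
        if max(totals) <= limit:
            return [g for g in groups if g]
        k += 1                       # some group violates the restriction: use more groups


def build_tree(elements, source_drive, buffer_drive, buffer_load):
    driven = list(elements)
    level = 0
    # Determining step: can the source drive everything directly connected to it?
    while sum(n.load for n in driven) > source_drive:
        groups = balanced_groups(driven, buffer_drive)       # dividing step
        if len(groups) >= len(driven):
            raise ValueError("buffering does not reduce the fan-out")
        level += 1
        # Buffer inserting step: the new buffers become the driven elements.
        driven = [Node(f"buf{level}_{i}", buffer_load, g)
                  for i, g in enumerate(groups)]
    return Node("source", 0.0, driven)


flip_flops = [Node(f"ff{i}", 1.0) for i in range(100)]
tree = build_tree(flip_flops, source_drive=8.0, buffer_drive=8.0, buffer_load=1.0)
```

In this sketch every pass adds one buffer level, so the loop ends as soon as the topmost buffers present a total load that the source can drive.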
According to still another aspect of the present invention, there is provided a recording medium having an optimizing program stored therein. The program is utilized when designing a circuit including a signal supplying source and a plurality of elements which are supplied with a signal from the signal supplying source through signal lines, and the program is executed by a computer for optimizing signal distribution through the signal lines connecting the signal supplying source to each of the plurality of elements. The computer is made to repeatedly carry out the optimizing program including a procedure of determining whether or not the signal supplying source satisfies a fan-out restriction if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, a procedure of dividing the plurality of elements into a plural number of groups so that the fan-out restriction is satisfied in each of the groups and that each of the groups has the same or substantially the same load capacity, when it is determined in the determining procedure that the signal supplying source does not satisfy the fan-out restriction, and a procedure of inserting a buffer element having a size which makes the groups of elements satisfy the fan-out restriction, into each of the groups divided at the dividing procedure. When the computer carries out the determining procedure, the buffer element inserted at the buffer inserting procedure is regarded as a driven element, and then it is again determined whether or not the fan-out restriction is satisfied if the signal supplying source supplies a signal to all of the driven elements which are directly connected to the signal supplying source, and the computer repeatedly carries out the dividing procedure and the buffer inserting procedure until the determining procedure delivers a determination that the signal supplying source satisfies the fan-out restriction.
At this time, the optimizing program may cause the computer to execute an optimizing procedure in such a manner that if the determining procedure delivers a determination that the signal supplying source satisfies the fan-out restriction, then a delay occurring between the signal supplying source and each of the plurality of elements is analyzed, and the signal distribution through the signal lines is optimized by inserting or removing one or more buffer elements, by changing the buffer size, or by changing the group to which a driven element belongs.
Further, at this time, the optimizing program may cause the computer to execute procedures in such a manner that, if the determining procedure delivers a determination that the signal supplying source does not satisfy the fan-out restriction, then an evaluation value calculating procedure is executed for each of all pairs of the driven elements on the basis of a circuit performance improving factor, and the driven elements are divided into groups in the dividing procedure so that, of the pairs satisfying the fan-out restriction, the pair having the largest evaluation value calculated at the evaluation value calculating procedure is preferentially made to belong to the same group.
Further, the optimizing program may cause the computer to execute a buffer stage number adjusting procedure in which one or more buffer elements are inserted into the circuit so that each circuit path extending from the signal supplying source to all of the elements consists of the same number of cell stages.
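The grouping by pair priority described for the evaluation value calculating procedure and the dividing procedure can be illustrated by the following sketch. It assumes, purely for illustration, that the evaluation value of a pair is the negative Manhattan distance between placement positions (so that nearby elements score highest) and that the fan-out restriction is a limit on the total load of a group. The function `group_by_pair_priority`, the `closeness` measure and the sample data are hypothetical.

```python
from itertools import combinations


def group_by_pair_priority(loads, evaluate, limit):
    """loads: {name: load}; evaluate(a, b) returns the evaluation value of a pair."""
    parent = {e: e for e in loads}
    total = dict(loads)              # total load of the group held at each representative

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]   # path halving
            e = parent[e]
        return e

    # Visit all pairs, largest evaluation value first.
    pairs = sorted(combinations(loads, 2), key=lambda p: evaluate(*p), reverse=True)
    for a, b in pairs:
        ra, rb = find(a), find(b)
        # Join the two groups only if the joined group still satisfies the limit.
        if ra != rb and total[ra] + total[rb] <= limit:
            parent[rb] = ra
            total[ra] += total[rb]

    groups = {}
    for e in loads:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())


position = {"a": (0, 0), "b": (1, 0), "c": (10, 0), "d": (11, 0)}


def closeness(p, q):
    (x1, y1), (x2, y2) = position[p], position[q]
    return -(abs(x1 - x2) + abs(y1 - y2))


groups = group_by_pair_priority({n: 1.0 for n in position}, closeness, limit=2.0)
```

Any other circuit performance factor, such as timing slack between two flip-flop cells, could be substituted for `closeness` without changing the grouping loop. The sketch does not by itself balance the load among the resulting groups.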
According to the above-described method of optimizing signal lines within a circuit, the optimizing apparatus and the recording medium having the optimizing program stored therein of the present invention, a plurality of elements or driven elements (buffer elements) are divided into groups (subsets) so that each group has the same or substantially the same load capacity. Therefore, the tree-like wired signal lines can be arranged so that the signal distribution is balanced among these groups.
If the tree-like wired signal lines are created with balance as described above, and thereafter optimization of the signal distribution by the signal lines is effected, then it becomes possible to easily and reliably arrange tree-like wired signal lines that cause little skew.
Further, according to the above-mentioned scheme of the present invention, the driven elements are divided into groups so that, of all pairs of driven elements satisfying the fan-out restriction, pairs having the largest evaluation value are preferentially made to belong to the same group. Therefore, when signal lines of driven elements are divided (by inserting a buffer element), the buffering is effected while taking circuit quality into consideration as well as satisfying the fan-out restriction. At this time, the tree-like signal lines can be synthesized so as to avoid a hold error or a setup error. In addition, when the signal lines of driven elements are divided by inserting a buffer element, floorplan information or placement information (e.g., layout information, physical information) may also be taken into consideration, whereby the skew can be easily and reliably reduced and the wire length can be shortened in the stage of layout arrangement.
Furthermore, if one or more buffer elements are inserted into the circuit so that each of the circuit paths extending from the signal supplying source to all of the plurality of elements consists of the same number of cell stages, then the skew among the plurality of elements can be reduced more reliably.
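The equalization of cell stages can be sketched as follows. This is a naive illustration that pads each shallow path individually, whereas a practical tool would likely share padding buffers among several elements. The tree representation (dictionaries with `name` and `children` keys) and the function names are hypothetical.

```python
def leaf_depths(node, depth=0):
    """Return the number of stages from the root down to every leaf element."""
    if not node["children"]:
        return [depth]
    return [d for child in node["children"] for d in leaf_depths(child, depth + 1)]


def equalize_stages(root):
    """Insert padding buffers so that every leaf sits at the same depth."""
    target = max(leaf_depths(root))
    counter = 0

    def pad(node, depth):
        nonlocal counter
        for i, child in enumerate(node["children"]):
            if child["children"]:
                pad(child, depth + 1)
            else:
                # The leaf is at depth + 1; wrap it in buffers until it reaches target.
                for _ in range(target - (depth + 1)):
                    counter += 1
                    child = {"name": f"pad{counter}", "children": [child]}
                node["children"][i] = child

    pad(root, 0)
    return root


def leaf(name):
    return {"name": name, "children": []}


root = {"name": "source", "children": [
    leaf("ff1"),
    {"name": "buf1", "children": [leaf("ff2"), leaf("ff3")]},
]}
equalize_stages(root)
```

After the call, `ff1` is driven through one padding buffer, so every path from the source to a flip-flop cell passes through the same number of cells.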
As described above, according to the above-described method of optimizing signal lines within a circuit, the optimizing apparatus and the recording medium having the optimizing program stored therein of the present invention, the following effects or advantages can be obtained.
(1) When a signal is distributed through the tree-like arranged signal lines, since the groups of elements are branched by the tree-like wiring with a proper balance, the delay of signals from the signal supplying source to each of the elements is optimized and the skew can be reliably reduced. Thus, it becomes possible to design a circuit which can meet the demand for high-speed processing.
(2) According to the above invention, the tree-like signal lines are arranged with a proper balance and thereafter the signal distribution through the signal lines is optimized. Thus, it becomes possible to easily and reliably arrange tree-like signal lines causing a small skew. Accordingly, the delay of signals from the signal supplying source to each of the elements is reliably optimized and the skew can be reduced more reliably.
(3) According to the present invention, the driven elements are divided into groups so that each group satisfies the fan-out restriction and, in addition, the circuit quality is kept high. Therefore, it becomes possible to create a signal line wiring in which a hold error or a setup error can be reliably prevented. Thus, the circuit quality can be greatly improved. In addition, when the signal lines are wired, floorplan information or placement information is taken into consideration. Therefore, it becomes possible to easily and reliably suppress the skew in the step of layout arrangement, with the result that the signal lines can be shortened and the signal distribution can be optimized.
(4) According to the present invention, one or more buffer elements are inserted into the circuit so that each of the circuit paths extending from the signal supplying source to all of the plurality of elements consists of the same number of cell stages. Therefore, the skew among the plurality of elements can be reduced more reliably and the signal distribution can be optimized.
On the other hand, according to the present invention, there are provided a method of designing a circuit and a recording medium having stored therein a program for designing a circuit, in which the contents of a circuit design described at RTL (Register Transfer Level) are synthesized at the logic level to produce a first netlist, a floorplan is tentatively created for the first netlist, buffer cells for the clock signal are placed based on the resulting floorplan information, the number of buffer stages is examined on the tentative result of the buffer cell placement, and the signal lines are then adjusted so that each signal line from the clock source to a flip-flop cell has the same number of buffer stages.
With the above method, the clock buffer cells are placed with the floorplan information taken into account. Therefore, a circuit block of the floorplan remote from the clock supplying source can be prevented from undergoing the buffering arrangement, with the result that the wiring length of the clock path can be suppressed and the propagation delay time of the clock signal can also be suppressed. Further, since the wiring is subjected to a process of adjusting the number of buffer stages, variation in clock skew can also be suppressed.
In this case, in the initial placement step, cells are placed based on the netlist and the floorplan information with the nets on the clock tree disregarded. Thereafter, processing proceeds to a step in which the flip-flop cells which should be connected to the clock tree are assigned their decided placement positions.
In this way, cells and flip-flop cells connected to any circuit block other than the clock signal net (hereinafter referred to as the clock net) are first placed optimally, and then arrangement of the clock net is retried in accordance with the result of the placement. Accordingly, circuit paths other than the clock path are not forced into unreasonably close placement, with the result that wiring congestion is relieved, cells can be placed while the wiring efficiency of both the clock net and the other nets is optimized, and the time required for wiring processing can be shortened. In addition, data buses extending between the flip-flop cells are also placed optimally, and hence the cells can be free from timing errors which can arise after the layout is determined.
Further, according to the method of designing a circuit of the present invention, the step for adjusting the number of buffer stages is followed by steps in which a final stage net of the clock tree is extracted from the initial placement of the flip-flop cells, the net range of the extracted final stage net is obtained, a plurality of nets whose net ranges overlap one another are merged together, the merged net is divided so that the net ranges are free from overlap, and buffer cells are allocated to the divided nets to carry out net reconstruction so as to avoid the net range overlap.
In this way, the clock skew caused at the final stage of the clock tree can be suppressed and the wiring can be relieved from being crowded.
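The merging and re-dividing of overlapping final stage nets can be pictured with the sketch below. It assumes that a net range is the bounding rectangle of the flip-flop cells on the net and that a merged net is divided by sorting its cells along the x axis and cutting the sorted list into consecutive pieces. Both are illustrative assumptions, and the names `bbox`, `overlaps` and `rebuild_final_stage` are hypothetical.

```python
def bbox(points):
    """Bounding rectangle (xmin, ymin, xmax, ymax) of a list of (x, y) positions."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def overlaps(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rebuild_final_stage(nets):
    """nets: one list of flip-flop positions per final stage net."""
    # Merge step: collect nets whose ranges overlap, directly or through other nets.
    clusters = []
    for net in nets:
        merged, rest = [net], []
        for cluster in clusters:
            if any(overlaps(bbox(net), bbox(member)) for member in cluster):
                merged.extend(cluster)
            else:
                rest.append(cluster)
        clusters = rest + [merged]

    # Divide step: cut each merged net into at most as many pieces as it had nets.
    rebuilt = []
    for cluster in clusters:
        points = sorted(p for net in cluster for p in net)
        size = -(-len(points) // len(cluster))      # ceiling division
        rebuilt.extend(points[i:i + size] for i in range(0, len(points), size))
    return rebuilt


nets = [[(0, 0), (4, 4)], [(2, 2), (6, 6)], [(10, 10), (12, 12)]]
rebuilt = rebuild_final_stage(nets)
```

Each list in `rebuilt` would then be assigned its own final stage buffer cell. Cells sharing the same x coordinate at a cut could still leave touching ranges, so a fuller version would re-check the result and repeat the pass if needed.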
Further, the step for arranging the clock layout includes a step in which a placement position of buffer cells on the clock tree is decided in a bottom-up manner based on the placement result of the flip-flop cells which has been determined in the step in which cells are subjected to the placement processing while the net on the clock tree is disregarded. The step for arranging the clock layout further includes a step in which cluster information for the flip-flop cells is created for each final stage buffer of the clock tree based on the placement information of the flip-flop cells which are driven by a buffer, a step in which placement restriction area information is created with the cells other than the buffer cell centered, and a step in which placement processing is then retried on any cell other than the buffer cells while the retried placement restriction area information and the cluster information are utilized as a restriction value.
In this way, the flip-flop cells are placed so that flip-flop cells belonging to the same group are collected at every cluster group within the placement restriction area. Thus, the propagation delay time of a signal transmitted through the clock path can be shortened, the skew value can be suppressed, and the number of steps for adjusting the timing after fixing the layout can be reduced.
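A bottom-up decision of buffer placement positions might look like the following sketch. The text above does not state which position is chosen for a buffer cell, so the centroid of the cells it drives is used here purely as an assumption. The tree encoding as `(name, children)` tuples and the function name `place_bottom_up` are hypothetical.

```python
def place_bottom_up(tree, ff_position):
    """Return {cell name: (x, y)} for every cell in the clock tree."""
    placed = {}

    def visit(node):
        name, children = node
        if not children:
            # Flip-flop cell: its position is already fixed by the earlier placement.
            placed[name] = ff_position[name]
        else:
            # Buffer cell: children are placed first, then the buffer takes their centroid.
            points = [visit(child) for child in children]
            placed[name] = (sum(x for x, _ in points) / len(points),
                            sum(y for _, y in points) / len(points))
        return placed[name]

    visit(tree)
    return placed


clock_tree = ("root", [
    ("buf1", [("ff1", []), ("ff2", [])]),
    ("buf2", [("ff3", []), ("ff4", [])]),
])
ff_position = {"ff1": (0, 0), "ff2": (2, 0), "ff3": (10, 4), "ff4": (12, 4)}
placement = place_bottom_up(clock_tree, ff_position)
```

Because the final stage buffers are positioned before the buffers above them, each level only needs the positions already computed for the level beneath it.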
Furthermore, according to the method of designing a circuit of the present invention, after the step in which placement processing is retried on any cell other than the buffer cells while the retried placement restriction area information and the cluster information are utilized as a restriction value, the following steps are provided. That is, cluster information is created for each net of the clock tree from the placement position of the driven cell, the driven cells divided into clusters are connected with wiring to form a cluster wiring in each created cluster, and the driving cell and the clusters are connected by signal routes which extend in a radiating fashion.
In this way, wiring for supplying clock signals can be arranged so as to suppress the clock skew. Also, wiring congestion can be relieved in the nets connecting the buffer cells to each other.
Furthermore, according to the method of designing a circuit of the present invention, after the step in which the placement position of buffer cells on the clock tree is decided in a bottom-up manner, a step of reconstructing the clock tree is provided. In the step of reconstructing the clock tree, when the clock logic is changed, a logic changing command for a netlist file is generated, and a second netlist is changed based on the logic changing command. A third netlist is thereby created so as to contain a clock logic coincident with the clock logic of the layout database.
In this way, coincidence between the logic of the layout database and the logic of the netlist is maintained, with the result that a formal verification or the like can be utilized for guaranteeing the coincidence between the two kinds of logic systems even if the logic of the netlist is subjected to any modification.
As described above, according to the method of designing a circuit and the recording medium having stored therein the program for designing the circuit, the entire wiring can be efficiently arranged, and hence the time it takes for the circuit to carry out processing can be shortened.
Moreover, since the propagation delay time of the clock signal and the skew can be suppressed, the probability of a timing error occurring after the layout is fixed can be reduced, and hence the time it takes to adjust the timing of signal transmission can be shortened.
Further, according to the present invention, even if the circuit under design contains a locally congested part in wiring, the congested part can be effectively removed, with the result that the entire wiring of the circuit will have a uniform allowance in wiring density. Thus, the circuit having been subjected to the process according to the present invention can be more densely packed and fabricated with a smaller size on a semiconductor chip.
Furthermore, the circuit having a fixed layout can be subjected to a high-speed verification tool such as a formal verifier for verifying the logic. Therefore, the time it takes to design and implement a circuit of an LSI or the like can be shortened.