1. Field of the Invention
The invention relates to a system and method for integrated circuit design, and more particularly to a system and method for inserting a clock tree in an integrated circuit design.
2. Description of the Related Art
A standard cell-based integrated circuit is designed using a library of building blocks, known as “standard cells.” Standard cells include such elements as buffers, logic gates, registers, multiplexers, and other logic circuits (“Macros”).
FIG. 1 shows a typical design process or “design flow” 100 that an integrated circuit designer would use to design a standard cell-based integrated circuit. Referring to FIG. 1, the designer provides a functional or behavioral description (101) of the integrated circuit design using a hardware description language (HDL). In addition, the designer specifies timing and other performance constraints with which the integrated circuit design must comply. The designer also selects a standard cell library to implement the design. Typically, the standard cells in the library are designed to the requirements of a target integrated circuit fabrication technology. Often, each cell is also characterized in the library to provide performance parametric values such as delay, input capacitance and output drive strength.
At step 102, the designer uses a “synthesis tool” to create from the HDL description 101 a functionally equivalent logic gate-level circuit description known as a “netlist” (103). The elements of the netlist are instances of standard cells selected by the synthesis tool from the standard cell library in accordance with functional requirements and the performance constraints.
Next, a place and route tool is used to create a “physical design” based on the gate-level netlist (103). The place and route tool uses a physical library 104 containing the physical design of the standard cells in the standard cell library. In operation, the place and route tool places the standard cell instances of the netlist onto the “silicon real estate” and routes conductor traces (“wires”) among these standard cell instances to provide for interconnection. Typically, the placement and routing of these standard cell instances are guided by cost functions, which minimize wiring lengths and the area requirements of the resulting integrated circuit.
At step 105, an initial placement of the integrated circuit design is performed and a placement file 106 is generated containing the placement information of all standard cell instances of the design. In design flow 100, after the initial placement, certain pre-route optimization is performed to ensure that the current placement meets the timing constraints imposed by the design (step 107). Physical optimization operates by recursively performing timing analysis, detecting timing violations and performing corrections (such as by introducing delays or by speeding up a signal path). The physical optimization tasks generally include correcting maximum delay violations and minimum delay violations. After the physical optimization is completed, a modified netlist 108 and a modified placement file 109 are generated.
Then, at step 110, a clock tree for the integrated circuit design is created and inserted into the design. Most integrated circuit designs, such as those employing sequential logic, are driven by one or more clock signals. In the functional or behavioral description of the design, the clock signal is merely represented as a wire distributing the clock signal from a clock input terminal to all nodes within the integrated circuit design receiving the clock signal. In the present description, nodes within an integrated circuit design driven by the clock signal are referred to as “clock signal endpoints” or “clock endpoints.” A clock endpoint is typically an electrical terminal or a “pin” of a standard cell instance. The clock tree insertion step (110) operates to transform the wire representing the clock signal into a buffer tree so that the clock signal from the input terminal can drive all endpoints within the timing constraints of the design. The clock tree insertion step generates a modified netlist 112 including the buffers of the clock tree and a modified placement file 113 including the placement information of the buffers in the clock tree.
After physical optimization is performed and the clock tree is inserted, the placement of the integrated circuit can be legalized. Then, at step 114, the design can be routed so that all standard cell instances, including the clock tree, are connected with conductor traces (wires). Subsequently, a design verification step 115 is carried out to ensure that the design meets the timing constraints specified for the overall design. For instance, with the wires of the integrated circuit routed, a more accurate set of parasitic impedance values in the wires can be extracted. Using the extracted parasitic impedance values, a more accurate timing analysis can be run at step 115 using a static timing analyzer (STA). If the physical design meets timing constraints, the design process is complete. Otherwise, steps 105 to 114 are repeated after appropriate modifications are made to the netlist and the performance constraints.
As described above, the clock tree insertion step operates to transform the wire carrying the clock signal into a buffer tree propagating the clock signal from the clock input terminal throughout the design subject to certain predefined timing constraints. The timing constraints basically ensure that all clock signals arrive at about the same time at different nodes of the integrated circuit receiving the clock signal. In general, timing constraints for a clock tree include the maximum and minimum insertion delay time, the clock skew and the clock transition time.
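The constraints named above can be illustrated with a minimal sketch. It assumes that the clock arrival time (insertion delay) at each endpoint is already known, and it takes clock skew to be the difference between the latest and earliest arrival. The function name, the endpoint names and the numeric limits are hypothetical, and the clock transition time constraint is left out for brevity.

```python
def check_clock_constraints(arrival_ns, min_insertion_ns, max_insertion_ns, max_skew_ns):
    """Check insertion delay and skew limits for a set of clock endpoints.

    arrival_ns maps an endpoint name to its clock insertion delay in ns
    (hypothetical data; a real flow would obtain it from timing analysis).
    """
    earliest = min(arrival_ns.values())
    latest = max(arrival_ns.values())
    skew = latest - earliest  # spread between the latest and earliest arrival
    return {
        "skew_ns": skew,
        "insertion_ok": min_insertion_ns <= earliest and latest <= max_insertion_ns,
        "skew_ok": skew <= max_skew_ns,
    }


# Illustrative values only.
arrivals = {"ff1/CK": 1.25, "ff2/CK": 1.5, "ff3/CK": 1.75}
report = check_clock_constraints(arrivals, min_insertion_ns=1.0,
                                 max_insertion_ns=2.0, max_skew_ns=0.25)
```

In this example every endpoint falls inside the insertion delay window, but the 0.5 ns spread between the earliest and latest arrival exceeds the assumed skew limit.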
Techniques for constructing a clock tree are well known. The prevalent method used in integrated circuit design is the construction of an “H-Tree.” FIG. 2 illustrates an exemplary H-Tree in an integrated circuit for distributing the clock signal. The principle behind constructing an H-tree is to distribute the clock signal so as to balance the loading of the clock tree. Referring to FIG. 2, an integrated circuit 118 is shown including multiple clock signal endpoints scattered throughout the integrated circuit. For example, an endpoint 123 denotes one of the many clock endpoints of integrated circuit 118. FIG. 2 is an abstract representation of integrated circuit 118 and is provided to illustrate the positions of the clock endpoints in the integrated circuit. As mentioned above, an endpoint of a clock signal is the electrical terminal or the pin of a standard cell instance receiving the clock signal.
The clock signal is coupled to integrated circuit 118 through a root node. In FIG. 2, an H-tree 120 is constructed connecting the clock signal from the root node to the clock endpoints. Typical H-tree construction starts by dividing the integrated circuit into regions, each region containing a number of endpoints. In FIG. 2, four regions are defined. Then, an approximate center of each region is determined and the center is used as a point for buffer insertion. For example, a buffer insertion point 124 in a region 122 (the lower-right region) of integrated circuit 118 is identified. Then, each region is further divided and the approximate center is identified to define buffer insertion points at the next level of the H-tree. For example, a buffer insertion point 126 is identified for a sub-region within region 122. H-tree 120 can be recursively refined to a required level in order to drive all endpoints within the predefined timing constraints.
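The recursive subdivision described above can be sketched as follows. This is a simplified illustration, not the construction used by any particular tool: it assumes a rectangular die, splits every region into four equal quadrants, and uses the geometric center of each quadrant as the buffer insertion point, whereas the description only calls for an approximate center. All function names are hypothetical.

```python
def quadrants(region):
    """Split a rectangle (x0, y0, x1, y1) into four equal quadrants."""
    x0, y0, x1, y1 = region
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)]


def center(region):
    """Geometric center of a rectangle, used here as the buffer insertion point."""
    x0, y0, x1, y1 = region
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def h_tree_insertion_points(die, levels):
    """Return one list of buffer insertion points per level of the tree."""
    points_by_level = []
    regions = [die]
    for _ in range(levels):
        # Divide every current region, then place one buffer per sub-region.
        regions = [q for r in regions for q in quadrants(r)]
        points_by_level.append([center(r) for r in regions])
    return points_by_level


# An 8 x 8 die refined to two levels: 4 points, then 16 points.
tree = h_tree_insertion_points((0.0, 0.0, 8.0, 8.0), levels=2)
```

Because the subdivision ignores where the endpoints actually are, every region receives the same buffer structure regardless of how many endpoints it contains.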
The benefit of using an H-tree for clock distribution is that, by recursively building the H-tree, the same wire distance can be maintained from the root node to any of the endpoints. When distance is used as a proxy for load capacitance, equal distance means equal load capacitance at each endpoint. Because insertion delay of the clock signal at any endpoint is directly proportional to load capacitance, the H-tree is constructed so that the clock signal delay to any of the endpoints is approximately the same. In this manner, the H-tree methodology constructs a clock tree meeting the timing constraints.
In the construction of the H-tree, the same buffer is used at each buffer insertion point to ensure balanced loading. Thus, another benefit of the H-Tree is that the integrated circuit design tends to be more stable across fabrication process variations and operational environment variations (such as temperature) because the same buffers are used.
However, the H-tree methodology for constructing a clock tree has several disadvantages. First, it is difficult to construct an H-tree to balance the loading between a region with dense endpoints and a region with sparse endpoints. Often, in an effort to achieve balanced load through balanced distance, the H-tree methodology may unnecessarily add extra loading to the sparse regions. The extra loading effectively increases the total loading of the clock tree, creating a clock tree that is “larger” than necessary.
Referring to FIG. 2, region 122 of integrated circuit 118 may be a sparse region containing few clock signal endpoints. On the other hand, a region 121 above region 122 may be a dense region containing many more clock signal endpoints. Because the H-tree is optimized to achieve balanced load by balancing the wire distance, the same size and same number of buffers will be used to drive endpoints in both the dense and the sparse regions. However, in the dense region, the buffers need to drive a large number of endpoints while in the sparse region, the buffers only need to drive a small number of endpoints.
FIGS. 3a and 3b illustrate the situations when an H-tree is used to drive endpoints in a dense region and in a sparse region. In FIG. 3a, a buffer 132a is in a dense region and thus has to drive a large number of endpoints, represented by a capacitor Clarge. In FIG. 3b, a buffer 132b, the same type of buffer as buffer 132a, is in a sparse region and thus has to drive only a small number of endpoints, represented by a capacitor Csmall. When Clarge is much greater than Csmall, the H-tree is not balanced because the same buffers (132a and 132b) are driving different loads. The common solution to the dense/sparse regions problem in constructing an H-tree is to add a dummy load to buffers in the sparse region so that the clock tree is balanced. Referring to FIG. 3b, a dummy load, represented by capacitor Cdummy, is added in parallel to capacitor Csmall so that the total capacitance of the two capacitors equals the capacitance of Clarge.
Because of the addition of the dummy load, the clock tree is made larger for driving a larger load created merely for the purpose of balancing the loading of the clock tree. As a result, the clock tree tends to be slower because the clock tree has to drive a large amount of load. Thus, the H-tree methodology trades off clock insertion delay for the entire tree in order to gain a clock tree with balanced load. Furthermore, a larger clock tree requires more silicon area to implement, resulting in increased manufacturing cost.
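The cost of dummy-load balancing can be shown with a small sketch. It assumes that each region is summarized by a single lumped load capacitance and that every region is padded up to the largest load, so that Cdummy = Clarge - Csmall for a sparse region. The function name, the region names and the capacitance values (in fF) are hypothetical.

```python
def dummy_loads(region_load_ff):
    """Return the dummy capacitance each region needs to match the largest load."""
    target = max(region_load_ff.values())  # plays the role of Clarge
    return {name: target - load for name, load in region_load_ff.items()}


# Illustrative lumped loads in fF: one dense region, one sparse region.
loads = {"dense": 400, "sparse": 100}
padding = dummy_loads(loads)

total_before = sum(loads.values())                  # load from real endpoints only
total_after = total_before + sum(padding.values())  # load after balancing
```

With these numbers the sparse region receives 300 fF of dummy load, and the total load driven by the clock tree grows from 500 fF to 800 fF even though no endpoint was added.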
Second, balancing the load does not always imply balancing the insertion delay of the clock signal. The H-tree methodology assumes a linear, proportional relationship between wire distance and load. However, a small change in wire distance may translate into a large change in load capacitance. Therefore, by using wire distance as a proxy for loading in constructing the clock tree, unpredictable clock signal delays may result.
As integrated circuit dimensions continue to shrink, the aforementioned disadvantages and tradeoffs in clock tree constructions become unacceptable. Therefore, it is desirable to provide an improved method for clock tree construction which can avoid the aforementioned deficiencies so that a clock tree can be constructed and optimized to meet timing constraints.
A method for optimal driver selection uses a cost function that is based on the non-linear delay characteristics and the stage gain of the candidate drivers. The cost function operates to select an optimal driver for driving the predetermined capacitive load which simultaneously minimizes the delay and the amount of input capacitance introduced.
In one embodiment, a method for selecting a first driver for driving a load capacitance from a group of drivers includes: computing, for each driver in the group of drivers, a cost based on a cost function associated with the driver for driving the load capacitance, and selecting the driver having the smallest cost as the first driver. The cost function is directly proportional to a delay of the driver and inversely proportional to the logarithm of a stage gain of the driver.
In another embodiment, the stage gain is an output capacitance driven by the driver divided by an input capacitance of the driver, where the output capacitance is the load capacitance.
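The selection described in these embodiments can be sketched as follows, under several assumptions that are not taken from the description: the proportionality constant is set to 1, the natural logarithm is used (a different base only scales every cost by the same factor), drivers whose stage gain is 1 or less are rejected because the logarithm would be zero or negative, and each driver's delay is supplied as a function of load. The linear delay models and all names below are hypothetical placeholders; the description refers to non-linear delay characteristics, which would normally come from library data.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Driver:
    name: str
    input_cap: float                 # input capacitance of the driver
    delay: Callable[[float], float]  # delay as a function of load capacitance


def cost(driver, load_cap):
    """Cost = delay / log(stage gain), with stage gain = load_cap / input_cap."""
    gain = load_cap / driver.input_cap
    if gain <= 1.0:
        # Assumption: reject drivers for which log(gain) is not positive.
        return math.inf
    return driver.delay(load_cap) / math.log(gain)


def select_driver(drivers, load_cap):
    """Return the candidate with the smallest cost for the given load."""
    return min(drivers, key=lambda d: cost(d, load_cap))


# Placeholder candidates with made-up linear delay models.
candidates = [
    Driver("small", 1.0, lambda c: 0.1 + 0.02 * c),
    Driver("medium", 4.0, lambda c: 0.1 + 0.005 * c),
    Driver("large", 16.0, lambda c: 0.2 + 0.00125 * c),
]
best = select_driver(candidates, load_cap=64.0)
```

In this example the largest driver has the smallest delay for the 64-unit load, yet the medium driver is selected because its lower input capacitance gives a higher stage gain, which lowers its cost.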
The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.