The present invention pertains generally to integrated circuits, and more particularly, to a method for accurately analyzing clock timing in a point-to-point manner in a clock tree.
Clock networks on CMOS integrated circuits have long been a source of difficulty to integrated circuit designers due to the importance of minimizing skew between clock inputs. A typical integrated circuit includes a clock tree which distributes one or more clock signals throughout the chip to clocked elements. A primary goal of a clock tree is to minimize clock skew between clocked elements. Since all clocked elements on a given tree are driven from one net with a clock spine, skew is caused by differing interconnect lengths and loads.
Clock skew represents a fraction of the clock period that cannot be used for computation. A clock skew of 500 ps with a 200 MHz clock means that 500 ps of every 5 ns clock cycle, or 10 percent of the performance, is wasted. That is, clock skew may reduce the time allowed for certain logic paths within the design, and thus may reduce the performance of the design. Thus, for high performance designs that have strict timing requirements, it is often critical to minimize clock skew.
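By way of illustration only, the arithmetic in the preceding paragraph can be sketched as follows. The function name `wasted_fraction` and its parameter names are hypothetical and are not part of the described method; only the 500 ps and 200 MHz figures come from the text.

```python
def wasted_fraction(skew_s: float, clock_hz: float) -> float:
    """Return the fraction of each clock period consumed by clock skew."""
    period_s = 1.0 / clock_hz  # 200 MHz corresponds to a 5 ns period
    return skew_s / period_s


# Figures from the text: 500 ps of skew on a 200 MHz clock.
fraction = wasted_fraction(skew_s=500e-12, clock_hz=200e6)
print(f"{fraction:.0%} of each clock cycle is lost to skew")
```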
FIG. 1 is a diagram illustrating a reduction in the effective clock period between registers caused by clock skew. An illustrative timing path is shown at 10, and a corresponding timing diagram therefor is shown at 30. The timing path includes a first rising-edge-triggered register 22a receiving data Da from a first input/output pad 20a, and a second rising-edge-triggered register 22b receiving data Db from a second input/output pad 20b. The first register 22a is clocked by a first clock signal CLKa and the second register 22b is clocked by a second clock signal CLKb.
With reference to the timing diagram 30, the input clock CLK is shown at 24. The first clock signal CLKa and the second clock signal CLKb are generated from the input clock signal CLK 24 via a clock tree or the like. The timing diagram 30 shows that the first clock signal CLKa is skewed relative to the second clock signal CLKb, as shown by tskew. A large clock skew tskew may be caused by an improperly designed clock tree.
On the rising edge of the first clock signal CLKa, the first register 22a may transfer data Qa from input Da. On the rising edge of the second clock signal CLKb, the second register 22b may transfer data Qb from input Db. When the subsequent logic (not shown) is designed to receive and use the latched data Qa and Qb simultaneously, the clock skew tskew is clearly problematic.
Because of the clock skew tskew between the first and second clock signals CLKa and CLKb, the effective clock period Teff between the rising edges of CLKa and CLKb is less than the clock period Tperiod. This effectively reduces the time allowed for the data to pass through subsequent logic before receiving the next incoming data, and thus may reduce the effective maximum frequency of the circuit.
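The relationship between Tperiod, tskew, and Teff may be sketched as below. This is a simplified model offered only as an assumption for illustration: it takes Teff to be Tperiod minus tskew and ignores register setup time and clock-to-output delay. The function names and the 4.5 ns logic delay are hypothetical.

```python
def effective_period(period_s: float, skew_s: float) -> float:
    """Teff = Tperiod - tskew: time left for logic between the skewed edges."""
    if not 0.0 <= skew_s < period_s:
        raise ValueError("skew must be non-negative and smaller than the period")
    return period_s - skew_s


def max_frequency(logic_delay_s: float, skew_s: float) -> float:
    """Highest clock frequency at which Teff still covers the logic delay.

    Simplifying assumption: the clock period must be at least the logic
    delay plus the skew; setup and clock-to-output times are ignored.
    """
    return 1.0 / (logic_delay_s + skew_s)


t_eff = effective_period(period_s=5e-9, skew_s=500e-12)   # 4.5 ns remains
f_ideal = max_frequency(logic_delay_s=4.5e-9, skew_s=0.0)
f_skewed = max_frequency(logic_delay_s=4.5e-9, skew_s=500e-12)
print(f"Teff = {t_eff * 1e9:.2f} ns")
print(f"fmax without skew = {f_ideal / 1e6:.1f} MHz, with skew = {f_skewed / 1e6:.1f} MHz")
```

Under these assumptions, a path that could run at roughly 222 MHz with no skew is limited to about 200 MHz once 500 ps of skew is present.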
Clock skew may have a number of other detrimental effects on the performance of a circuit design. For example, clock skew may cause hold time violations when only a small amount of logic is provided between registers, causing the circuit to malfunction. Further, clock skew may cause communication problems between integrated circuits. It should be recognized that these are only illustrative examples of effects that clock skew may have on a system.
For the above reasons, a primary goal of a clock tree is to minimize clock skew between clocked elements. As shown above, clock skew may reduce the effective clock period for certain logic paths within the design, and thus may reduce the performance of the design. For high performance designs that have strict timing requirements, clock skew may consume a substantial portion of the total clock period.
Clock trees may be balanced or unbalanced. Balanced clock trees distribute a number of clock drivers symmetrically and evenly across the integrated circuit die. In a balanced tree, the distance between each clock driver and its receiving element is preferably identical, and the load on each driver is matched. Balanced clock trees find suitable application in integrated circuits that are formed with functional blocks characterized by substantially similar loads, for example, memory chips formed with symmetrically balanced memory arrays.
By contrast, unbalanced clock trees distribute clock drivers in a non-symmetrical manner throughout the integrated circuit, generally with higher concentrations of clock drivers where the load is higher and lower concentrations of clock drivers where the load is smaller. Unbalanced clock trees are often utilized in complex circuits that are designed in a functionally hierarchical manner, that is, circuits partitioned into a plurality of different functional blocks of differing loads, which may be designed by different groups of designers.
In the design cycle of a chip with a balanced design, the clock network is typically pre-placed on the integrated circuit die prior to placement of functional logic blocks. This scheme has a number of limitations. First, the clock buffering circuit may interfere with ideal block placement on the chip. This means that area or timing may need to be sacrificed. Second, any block smaller than average will have a larger clock driver than it needs, possibly increasing the amount of power required. This scheme may therefore waste chip resources.
In the design cycle of a chip with an unbalanced design, the clock network is normally added after determining where the appropriate buffers need to be located. This scheme also has a number of limitations. First, it prevents simulation of the clock network until all layers of the hierarchy are complete. This means that a parent block made up of one or more children blocks cannot be simulated until all of its children blocks are complete. As a result, the entire design must be complete before simulation can occur. If, as a result of simulation, it is discovered that one or more clock routes must be adjusted to meet the clock skew requirements, the final artwork is delayed until the layer(s) requiring adjustment are reworked, and the entire adjusted artwork is resimulated. This scheme is clearly time-consuming and costly.
Second, as process generations have advanced, the contributions of parasitic resistance and capacitance of the clock tree routing traces have become a significant portion of the overall clock skew. In order to properly analyze and simulate the performance of the clock network, the clock skew must be determined with sufficient accuracy such that it matches that of the actual design within predetermined error limits. The accuracy level of current circuit simulation tools is typically dependent on the size of the circuit to be analyzed. In other words, a relatively simple circuit with ten to a few hundred nets may be simulated with fairly high accuracy; in contrast, the accuracy level decreases when the number of nets is increased to thousands (or much higher numbers) of nets.
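To illustrate how routing parasitics alone can produce skew, the following sketch estimates the delay of two clock routes using the Elmore delay, a standard first-order RC approximation. The text does not name this or any other delay model, so its use here is an assumption, and all resistance and capacitance values are hypothetical. Each route is treated as an independent RC ladder, which ignores loading from any shared trunk.

```python
def elmore_delay(segments):
    """First-order (Elmore) delay to the far end of an RC ladder.

    segments: list of (resistance_ohm, capacitance_farad) pairs ordered from
    driver to receiver; each capacitance sits at the downstream end of its
    resistor.
    """
    delay = 0.0
    downstream_c = sum(c for _, c in segments)
    for r, c in segments:
        # Each resistor charges all of the capacitance at or beyond its node.
        delay += r * downstream_c
        downstream_c -= c
    return delay


# Hypothetical routes: identical 50 fF receiver loads, different wire lengths.
short_route = [(10.0, 20e-15), (10.0, 50e-15)]
long_route = [(10.0, 20e-15), (10.0, 20e-15), (10.0, 20e-15), (10.0, 50e-15)]

skew = elmore_delay(long_route) - elmore_delay(short_route)
print(f"estimated skew from routing parasitics: {skew * 1e12:.2f} ps")
```

An estimate of this kind is useful only for intuition; the accuracy the text calls for would come from extraction and circuit simulation of the actual network.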
Accordingly, a need exists for an improved method for accurately analyzing and simulating clock performance of a clock network in a functionally hierarchically designed integrated circuit.
The present invention is a method and apparatus for accurately analyzing the timing of an unbalanced clock network on a piecemeal basis in an integrated circuit clock tree. The present invention allows an entire integrated circuit clock network to be accurately analyzed and simulated on a functional-level basis without requiring higher-level functions to be completed.
The invention applies to functionally hierarchical integrated circuit designs wherein the functionality of the chip is partitioned into different functional blocks located on different functionality levels. More particularly, the internals of certain functional blocks may be implemented on a child level, and the child functional blocks may be used in higher-level functional blocks on a parent level.
In accordance with the invention, during design, the designer of a child functional block associates tags with physical nodes in the child's clock network where connections are to be made to the parent level. The clock network for each child block connecting to a particular parent level is extracted, and the child block tags that specify connection to the parent-level network are stored.
The designer of the parent block associates tags with physical nodes in the parent network where connections are to be made to the child blocks. The parent network, including any peripheral routing desired to minimize or accurately model parasitics, is extracted, and again the parent block tags are stored. The designer maps the parent block tags to the appropriate child block tags where actual electrical connections are to be made.
A simulation tool may then match up the tags between the parent blocks and the child blocks, in accordance with the mapping, to electrically connect the parent network nodes associated with the parent tags to child network nodes associated with the child tags. This allows simulation of the parent level with the clock networks of the children blocks included.
The process may be repeated for each child-to-parent relationship in the functionality hierarchy.
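The tag-matching step described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the tool or data format of the invention: the class `ExtractedNetwork`, the function `stitch`, the representation of an extracted network as a list of resistors, and all tag and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedNetwork:
    """Hypothetical extracted clock network of one block."""
    name: str
    resistors: list  # (node_a, node_b, ohms) tuples
    tags: dict       # tag name -> physical node name inside this block


def stitch(parent, children, tag_map):
    """Connect child networks to the parent according to a tag mapping.

    tag_map: {parent_tag: (child_name, child_tag)}
    Returns one merged resistor list in which every tagged child node is
    replaced by the parent node it is mapped to.
    """
    by_name = {child.name: child for child in children}

    # Alias each tagged child node to the parent node it connects to.
    alias = {}
    for parent_tag, (child_name, child_tag) in tag_map.items():
        child_node = f"{child_name}/{by_name[child_name].tags[child_tag]}"
        alias[child_node] = f"{parent.name}/{parent.tags[parent_tag]}"

    def qualify(block, node):
        full = f"{block}/{node}"
        return alias.get(full, full)

    merged = []
    for block in [parent, *children]:
        for a, b, ohms in block.resistors:
            merged.append((qualify(block.name, a), qualify(block.name, b), ohms))
    return merged


# Hypothetical parent and child blocks with one tagged connection point each.
top = ExtractedNetwork("top", [("clk_in", "n1", 5.0)], {"P_OUT0": "n1"})
alu = ExtractedNetwork("alu", [("clk_pin", "ff0", 2.0)], {"C_IN": "clk_pin"})

merged = stitch(top, [alu], {"P_OUT0": ("alu", "C_IN")})
for a, b, ohms in merged:
    print(f"{a} -- {ohms} ohm -- {b}")
```

In this sketch the child node tagged `C_IN` is merged into the parent node tagged `P_OUT0`, so the combined network can be handed to a simulator as a single circuit. Repeating the same step with the merged result acting as a child of the next level up would correspond to walking the functionality hierarchy.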
Because the invention allows subcircuits of an integrated circuit to be analyzed and simulated separately and then combined with parent circuits during a later analysis, simulation can be completed earlier in the design timeline. In addition, because the invention allows the entire network to be simulated, better accuracy is available. Furthermore, any tuning that is required to improve clock distribution may be completed much more quickly, as data is available without having to repeat top-level extraction.