This invention is related to the field of Design Automation of Very Large Scale Integrated (VLSI) circuit chips, and more particularly, to the determination of allowed placement regions within the chip for clocked elements in a design.
It is known in the art that clock tree generation is intended to create an electrical network to distribute a clock signal to clocked elements for use as a sequencing control for logical operations within a chip. A common type of clock distribution network is a buffered clock tree for which the clock tree generation determines how many clock buffers are needed, how they are to be connected, both to clocked circuit elements at the leaves of the clock tree and to other buffers at upper levels of the clock tree (i.e., levels closer to the root of the tree), and where they are to be placed. Practitioners in the art will fully recognize that a clock buffer in the present context may include any circuit that propagates a clock signal, specifically including inverting and non-inverting buffers, clock splitting, shaping circuits, clock gating circuits, and the like. Specific objectives of clock tree generation are to place clock buffers so that particular load and slew constraints for each net and sink in the clock tree are met, to generate a clock tree in which the relative signal arrival times at clock tree leaf elements are as close as possible to the desired target (often this target is to have all sink arrival times coincide in order to achieve zero skew), and to achieve all these objectives with the minimum possible wiring resource usage, the minimum possible number of clock buffers, and the minimum possible root-to-leaf clock tree delay. It will be understood by those skilled in the art that the term clock tree leaf element in the present context refers to any circuit which is synchronized to a clock signal, and may include latches, flip-flops, and/or other memory elements. Clock tree leaf elements and clock buffers will collectively be referred to as clocked elements.
Typical methods of clock tree generation assume that the locations for clock tree leaf elements are fixed, having been previously determined by a placement program or other means, wherein the clock tree leaf elements do not move during the clock tree generation. However, it is desirable to cause clocked elements driven by a single net to be placed in close physical proximity in order to reduce the amount of wiring at the leaf level of the tree which typically accounts for most clock wiring. Because clock signals are typically the most frequently switching signals on an integrated circuit, reducing the wiring also limits the clock net capacitance which must be driven, and hence reduces power consumption, because the switching power consumed by a net on an integrated circuit is roughly proportional to the net capacitance times the switching frequency of the net. Such clustering also allows the clock buffers which control the clock tree leaf elements to drive a larger number of these elements without violating constraints (e.g., buffer load, slew, and the like). As a result, the number of clock buffers that are required is reduced, resulting in additional power reduction, with a similar benefit seen up to the root of the clock tree.
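The proportionality between switching power and net capacitance times switching frequency noted above can be illustrated with a minimal sketch. The function name, the wire lengths, the per-unit-length capacitance, and the assumption that net capacitance scales linearly with wire length (ignoring pin capacitance) are hypothetical and serve only to show the trend.

```python
def relative_switching_power(wire_length_um, cap_per_um_ff, freq_ghz):
    """Return a relative power figure proportional to capacitance x frequency.

    Assumption (not from the source): net capacitance is modeled as wire
    length times a constant capacitance per unit length; pin loads are ignored.
    """
    capacitance_ff = wire_length_um * cap_per_um_ff
    return capacitance_ff * freq_ghz


# Hypothetical leaf-level net before and after clustering its clocked elements.
spread = relative_switching_power(wire_length_um=1200, cap_per_um_ff=0.2, freq_ghz=1.0)
clustered = relative_switching_power(wire_length_um=600, cap_per_um_ff=0.2, freq_ghz=1.0)

# Under this linear model, halving the wire length halves the relative power.
ratio = clustered / spread
print(f"clustered / spread power ratio: {ratio:.2f}")
```

Under these assumptions the ratio depends only on the wire lengths, which is why reducing leaf-level wiring is attractive for the most frequently switching nets.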
Although the term timing slack, hereinafter referred to as slack, is commonly used to refer to results of computations performed in either early or late mode timing analysis, in the present context, slack will relate to late mode timing analysis. A slack at a point in an integrated circuit (e.g., a circuit element input pin) will refer to the amount by which a signal arrives at that point earlier than is required for a correct circuit operation. Thus, a negative slack indicates that a signal arrives too late, and the normal constraint on the slack is that it be greater than or equal to zero. Typically, a slack is computed as the difference between a required arrival time (RAT) at a point and an arrival time (AT) at the same point. The late mode AT of a point is an upper bound on the time at which the signal at the point will become stable and is computed by well-known forward propagation methods. The late mode RAT of a point is a lower bound on the time at which the signal at that point will be required to become stable in order to meet timing requirements, and is computed using well-known backward propagation methods. The late mode RAT at the data input of a storage element is typically computed from a setup time requirement between the data and clock for the storage element and the early mode clock AT of the storage element.
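The late mode computation described above may be sketched as follows, under simplifying assumptions. The function name, the graph representation (a list of delay-annotated edges with a caller-supplied topological order), and the example delays in picoseconds are hypothetical; the sketch takes the latest AT over all fan-in edges in the forward pass and the earliest RAT over all fan-out edges in the backward pass, then forms the slack as RAT minus AT.

```python
from collections import defaultdict


def compute_slacks(edges, source_ats, sink_rats, order):
    """Late mode slack for each point of a small timing graph.

    edges:      list of (from_point, to_point, delay) tuples
    source_ats: arrival times asserted at the starting points
    sink_rats:  required arrival times asserted at the end points
    order:      the points in topological order (assumed supplied by caller)
    """
    fanin = defaultdict(list)
    fanout = defaultdict(list)
    for src, dst, delay in edges:
        fanin[dst].append((src, delay))
        fanout[src].append((dst, delay))

    # Forward propagation: the late mode AT is the latest arriving input.
    at = dict(source_ats)
    for point in order:
        if fanin[point]:
            at[point] = max(at[src] + delay for src, delay in fanin[point])

    # Backward propagation: the late mode RAT is the earliest requirement.
    rat = dict(sink_rats)
    for point in reversed(order):
        if fanout[point]:
            rat[point] = min(rat[dst] - delay for dst, delay in fanout[point])

    return {point: rat[point] - at[point] for point in order}


# Hypothetical graph: two paths converge at "c" before reaching sink "d".
edges = [("a", "c", 300), ("b", "c", 100), ("c", "d", 500)]
slacks = compute_slacks(edges, {"a": 0, "b": 0}, {"d": 1000}, ["a", "b", "c", "d"])
print(slacks)
```

In this hypothetical graph, the points on the slower path through "a" all receive the same slack, while the faster side input "b" receives a larger one.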
The concept of slack will now be illustrated with reference to FIG. 1. Latch 100 launches data on clock edge 150 which occurs at time zero, and a signal propagates therefrom to latch 140 through net 110, logic network 120 (any collection of interconnected logic gates), and net 130. The resulting signal is captured at latch 140 by clock edge 160 occurring one clock cycle later than clock edge 150. The cumulative delay from latch 100 to latch 140 is 0.2 ns+3.5 ns+0.3 ns, or 4.0 ns, which is also the AT of the data input of latch 140. If the setup time requirement at latch 140 is zero (i.e., the data input cannot arrive any later than the clock, but need not arrive earlier) and the clock cycle is 4.0 ns, the RAT of the data input of latch 140 is 4.0 ns, and the slack of the data input is 4.0 ns−4.0 ns, or 0.0 ns. The same cumulative delay is subtracted from the RAT at the input of latch 140 to produce a RAT of 0.0 ns at the output of latch 100; thus, the slack thereof is 0.0 ns−0.0 ns, or 0.0 ns. The fact that the slacks at the output of latch 100 and the input of latch 140 are the same is not accidental; the slack at all points along any critical path (i.e., a path which imposes the most stringent timing requirements on all points along it) will always be the same. Had the cumulative delay of the path from 100 to 140 been greater or had the clock cycle been smaller, the slack would have been negative. Likewise, if the cumulative delay of the path from 100 to 140 had been smaller or if the clock cycle had been greater, the slack would have been positive. One way in which the cumulative delay can change is by increasing or decreasing the wire length of either net 110 or net 130, thereby changing the wire delay and capacitive load. Such change in wire length occurs by moving either latch 100 or 140, thus altering the distance between the moved latch and the logic gate in logic network 120 to which it is connected. Note that the situation in FIG. 1 is simplified, since there exist multiple paths from a typical storage element to other storage elements and multiple paths from other storage elements thereto. FIG. 1 also shows positive edge-triggered latches or flip-flops. The description can be extended to other commonly known types of clock tree leaf elements such as negative edge-triggered latches, master slave latches, and memory arrays.
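The arithmetic of the FIG. 1 example can be reproduced with a short sketch. The function name and its parameters are hypothetical, times are expressed in integer picoseconds to avoid rounding, the three delays are assumed to correspond in order to net 110, logic network 120, and net 130, and the capture clock is assumed to arrive exactly one clock cycle after the launch edge. The lengthened-wire delay of 250 ps and the 4.5 ns clock cycle are illustrative values, not taken from the figure.

```python
def data_input_slack_ps(stage_delays_ps, clock_period_ps, setup_ps=0, launch_time_ps=0):
    """Late mode slack at the capturing latch data input, in picoseconds."""
    arrival = launch_time_ps + sum(stage_delays_ps)          # AT at the data input
    required = launch_time_ps + clock_period_ps - setup_ps   # RAT at the data input
    return required - arrival


# Delays from the FIG. 1 discussion: 0.2 ns, 3.5 ns, and 0.3 ns.
FIG1_DELAYS_PS = [200, 3500, 300]

# Clock cycle of 4.0 ns and zero setup time, as in the text.
nominal = data_input_slack_ps(FIG1_DELAYS_PS, clock_period_ps=4000)

# Hypothetical: moving latch 100 lengthens the first net, adding 50 ps of delay.
longer_wire = data_input_slack_ps([250, 3500, 300], clock_period_ps=4000)

# Hypothetical: a longer clock cycle of 4.5 ns relaxes the requirement.
slower_clock = data_input_slack_ps(FIG1_DELAYS_PS, clock_period_ps=4500)

print(nominal, longer_wire, slower_clock)
```

The three results show the zero, negative, and positive slack cases discussed above.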
One prior art method of clock tree generation which provides for clocked element movement is described in U.S. Pat. No. 6,609,228 to Bergeron et al. Therein is described the formation of logical clusters of clock tree leaf elements (sets of clock tree leaf elements to be driven by a common clock net), and then adjustment of the location of the clocked elements in the cluster to reduce the amount of wire required in the net driving them. Movement of clock tree leaf elements may alter the timing of a circuit due to changes in wire length and, hence, changes in wire delay and capacitive load. These changes can cause violations of timing requirements to occur by making the slacks fall below a specified limit. Movement of clocked elements may also cause the number of circuits placed or the number of wires routed in a local region to exceed the capacity of that region, a condition known as placement or wiring congestion.
Clock tree leaf elements are often larger than clock buffers. Because each clock buffer drives many other clocked elements (clock buffers at upper levels of the clock tree or clock tree leaf elements at the leaves of the clock tree), the number of clock tree leaf elements is much larger than the number of clock buffers. Thus, the potential congestion caused by the movement of clock tree leaf elements generally exceeds that which results from the movement of clock buffers. However, Bergeron et al. do not describe using slacks or other forms of timing information to control the movement of clocked elements, nor do they describe the use of congestion information to control the movement of clocked elements.
Attempts have been made in the art to place clock buffers within a clock tree and to generate allowed placement regions for clock buffers in a buffered clock tree. These attempts are based on timing information such as capacitance, delay, and slew. Specifically, the placement region for a clock buffer is the area within which the clock buffer can be placed without violating capacitance, delay, or slew constraints. Such attempts, however, have not been applied to clock tree leaf elements, and present a distinct disadvantage in that the placement regions are not formed based on slack constraints and intersecting sub-regions.