This invention relates to static timing verification of integrated circuit designs.
Prototyping a VLSI (very large scale integrated circuit) design is extremely expensive: fabbing (fabricating) a pass of a prototype full-custom VLSI chip would take several months and would cost several hundred thousand dollars. If the chip design is flawed, the chip itself is almost impossible to probe to isolate the problem and determine corrections to the design. For this reason, virtually all VLSI chips are designed and thoroughly verified by software modelling before the first actual silicon is fabbed.
A timing verifier is one program in the suite of tools used by a VLSI designer. Timing verification is the process of analyzing the circuit model to ensure that the signals propagate through the logic quickly enough to meet the timing requirements at a specified clock frequency. (A timing verifier may additionally include other analysis tools, for instance for race conditions or other logic problems.) Once the circuit has been largely designed using other tools of the suite, the timing verifier is used to improve it, e.g., to eliminate bottlenecks that would force the circuit to be run at a slow clock frequency. The timing verifier takes as input a description of the circuit and its interconnections, the impedances and/or loading of the wires, specifications of the devices in the logic path, and descriptions of the clocked elements, and produces as output timing of the slowest paths, i.e., the "critical paths", from which the designer can deduce the maximum clock frequency at which the circuit can be run. The designer can then redesign the critical paths to speed them up, thus speeding up the entire circuit. This process is typically iterative: the designer runs the timing verifier, and modifies his circuit design using the information generated. He repeats this process until the number of critical paths with the same timing limit is so large that reducing the time of all of them becomes impractical.
In a synchronous integrated circuit (IC) design, major signals are captured in latches at clock edges and are held at stable values when and while the clock is deasserted. The value of the signal at the output of a latch, a latched signal, is only allowed to change during the time the clock signal is asserted. During the time the clock is asserted, changes on the D input to the latch immediately propagate through the latch to the Q output; thus the clock assertion is said to make the latch transparent. The latched signals propagate downstream through combinatorial logic to other latches. The timing verifier reports any latches (or other clocked elements) whose inputs are not stable in time to meet the requirements of the latch's clock.
FIG. 1 depicts a simple illustrative circuit, which will be considered under a simplified model of timing constraints and design rules. Two input signals A 100 and B 102 are latched by latches 108 and 110. Thus, signals A' 112 and B' 114 are stable except when the two latches 108 and 110 are transparent, which occurs when clocks Ck.sub.A 104 and Ck.sub.B 106 are asserted. Once A' and B' have been latched, they remain stable, and combinatorial logic CL.sub.1 116, CL.sub.2 120, and CL.sub.3 122 compute signals Y 124 and Z 126. Each of CL.sub.1, CL.sub.2, and CL.sub.3 imposes a certain delay in this computation. The downstream part of the design (not shown) relies on Y 124 and Z 126 being latched by latches 132 and 134 on clocks Ck.sub.Y 128 and Ck.sub.Z 130. Thus, CL.sub.1, CL.sub.2, and CL.sub.3 must be fast enough to meet the setup requirements of latches 132 and 134.
FIG. 2 presents a timing diagram for the circuit of FIG. 1. The first three lines show the clocks Ck.sub.A 104, Ck.sub.B 106, Ck.sub.Y 128, and Ck.sub.Z 130. In this example, A and B are latched on the same clock. Signals A and B must be stable far enough before the falling edge of Ck.sub.A /Ck.sub.B 206 to accommodate a "setup time" 208, a characteristic of latches 108 and 110. Once latches 108 and 110 become transparent during Ck.sub.A /Ck.sub.B 204, (assuming that the setup time and the data-to-output time of the latches are equal) signals A' and B' are allowed to transition until they are latched on the falling edge of Ck.sub.A /Ck.sub.B 206. A' and B' drive CL.sub.1, CL.sub.2, and CL.sub.3, which in turn produce signals X, Y, and Z. Under the simplified timing rules, the timing constraints of the circuit are satisfied if the propagation delay 208 of latch 108 plus the propagation delay through CL.sub.1 216 plus CL.sub.2 220 plus the setup time 232 of latch 132 is less than the time from the fall of clock Ck.sub.A /Ck.sub.B to the fall of clock Ck.sub.Y 228, and if the propagation delay 208 of latch 110 plus the propagation delay through CL.sub.1 216 plus CL.sub.3 222 plus the setup time 234 of latch 134 is less than the time from the fall of clock Ck.sub.A /Ck.sub.B to the fall of clock Ck.sub.Z 230. The paths of A'-CL.sub.2 -Y and B'-CL.sub.3 -Z must also meet the timing requirements of latches 132 and 134, but these will be trivially satisfied because they are clearly faster than paths A'-CL.sub.1 -X-CL.sub.2 -Y and B'-CL.sub.1 -X-CL.sub.3 -Z. When all these conditions are satisfied, the circuit is said to pass timing verification.
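The simplified setup rule for a single path reduces to one inequality, which can be sketched as follows. The function name and all delay values here are hypothetical, chosen only for illustration; they are not taken from FIG. 2.

```python
def meets_setup(latch_delay, logic_delays, setup_time, launch_fall, capture_fall):
    """Return True if a path satisfies the simplified setup rule.

    All times share one unit (e.g. ns). The rule checked is:
      latch delay + sum of logic delays + setup time
        < (fall of capturing clock) - (fall of launching clock)
    """
    path_time = latch_delay + sum(logic_delays) + setup_time
    return path_time < capture_fall - launch_fall


# Hypothetical values for a path through two logic blocks.
ok = meets_setup(1, [5, 4], 1, launch_fall=10, capture_fall=22)        # 11 < 12
too_slow = meets_setup(1, [5, 4], 1, launch_fall=10, capture_fall=20)  # 11 < 10 fails
print(ok, too_slow)
```

A verifier would apply this check to every latch-to-latch path; the faster paths that bypass CL.sub.1 pass automatically whenever the slower ones do.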
If the circuit fails timing verification, the timing verifier will report the critical paths that failed. Either the logic on the slow paths needs to be redesigned to be faster, or the clock frequency needs to be slowed down to accommodate the timing of the circuit.
Timing verifiers operate on one of two general paradigms: dynamic or static.
In dynamic timing verification, the circuit design is simulated through time. The engineer must determine model input stimuli with which to drive the circuit model, called test vectors. Applying dynamic timing verification to the sample circuit of FIG. 1, the timing verifier would successively apply twelve stimuli where either A or B or both undergo transitions: AB→AB={00→01, 00→10, 00→11, 01→00, 01→10, 01→11, 10→00, 10→01, 10→11, 11→00, 11→01, 11→10} and run a loop to simulate time, during which model clock Ck.sub.A /Ck.sub.B would undergo several transitions. The circuit model would be operated through time to see at what time signals Y and Z stabilize. Dynamic timing verification is effective in that it is capable of diagnosing all timing problems, at least for the test vectors applied. But in modern circuit designs, the super-exponential combinatorics on tens of thousands of signals is fatal to the dynamic approach: there simply isn't time to test all possible combinations of inputs (most of which would never arise in actual operation), nor for a human to filter out a set of meaningful test vectors that will test all the effective paths.
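The twelve stimuli can be enumerated mechanically, as in the sketch below. The helper name count_transitions is hypothetical and is included only to show how quickly the count grows with the number of inputs.

```python
from itertools import product

# All values of the two-bit input AB: "00", "01", "10", "11".
states = ["".join(bits) for bits in product("01", repeat=2)]

# A stimulus is an (old, new) pair in which at least one input changes.
stimuli = [(old, new) for old, new in product(states, repeat=2) if old != new]
print(len(stimuli))  # 12


def count_transitions(n_inputs):
    """Number of distinct single-step transitions on n_inputs binary signals."""
    n_states = 2 ** n_inputs
    return n_states * (n_states - 1)


print(count_transitions(2), count_transitions(20))
```

Even twenty inputs already give about 10.sup.12 single-step transitions, before any sequencing of stimuli over several clock cycles is considered.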
In the second paradigm, static analysis, there is no loop simulating the passage of time. Static analysis is to dynamic analysis as theorem proving is to case analysis: instead of attempting to simulate a "large enough" number of specific cases, a static timing verifier "reasons" about the circuit model and draws inferences about whether the circuit will meet its timing constraints. This generally involves analyzing every node--i.e., every wire--in a circuit and calculating transition times based on the arrival time of inputs and the propagation delay through the structures. As the times of the transitions of the inputs to a node are analyzed, only the latest transition (in time) is saved, and the algorithm immediately stops tracing any path that is known not to be the worst case. This process, called information pruning, is required to keep the execution times reasonable.
One known algorithm for static timing verification is a depth-first search (DFS) of the circuit starting at each signal guaranteed on a clock edge, labelling each node with the currently best-locally-known worst-case timing information. After all nodes have been labelled, a second pass examines all timing constraints to tell the designer whether the circuit as a whole meets its timing constraints.
Consider the circuit of FIG. 3, in which a first stage of the circuit has two paths of different delay times, which join at a multiplexer. The output of the multiplexer fans out in a second stage of two paths of different delay times, which are joined at a second multiplexer. The DFS algorithm represents each node of a circuit by a data structure as shown in FIG. 4. The node has a name, a "worst case arrival time," and a pointer to the node that drove this worst-case transition.
FIGS. 5a-e depict a DFS analysis of the circuit of FIG. 3: FIG. 5a shows a time-sequence of stack states, and FIGS. 5b-e show a time sequence of states of data structures.
In the DFS algorithm, the graph of the nodes of the circuit is walked in a depth-first order. The algorithm's walker maintains a current "arrival time," and a stack of nodes. (Since this is a static analyzer, note that the "arrival time" does not "tick" off time incrementally; it moves forward and back by the discrete amounts of delay of the logic walked.) The DFS walker pushes nodes onto the stack as it traces paths downstream, and pops them as it unwinds back upstream. The walker increments its arrival time as it walks downstream through logic by the time delay of the logic, and decrements it by the same amount as it unwinds back. As the algorithm pushes each node, if the walker's arrival time is later than the current "worst case arrival time" (or simply ".time") of the node, then the node is updated with the value of the DFS arrival time, and the node's "worst case predecessor" (or simply ".predecessor") is pointed at the predecessor node down which the DFS walk came, and the DFS continues down the successor nodes. If the DFS arrival time is equal to or earlier than the current node's worst case arrival time, the probe of this path is abandoned, and the node is popped off the stack.
In FIG. 5a, each column depicts a step 300 identified by number, and the value of the DFS arrival time 302 during that step. The state of the DFS stack 304 is also shown, with the top of the stack in bold. The term "labelled" is used to describe information permanently (though overwritably) stored in the representation of the circuit. "Unvisited" is used in a local sense: a node is unvisited if it has not been visited via the current path, even if it has been previously visited via a different path.
step 1: FIG. 5b shows the configuration of the nodes for the circuit of FIG. 3 as the algorithm visits the first node of the circuit, node A 310. All the node names have been filled in. A.predecessor and A.time have been filled in (by the process about to be described in detail).
step 2: Assume that A's list of successor nodes is ordered such that the algorithm visits C, then B. Thus, the algorithm walks to node C. Since the logic connecting A to C, CL.sub.2, consumes 11ns, the DFS algorithm carries the arrival time 12 as it arrives at C. The algorithm, finding C not already labelled, labels C.time with 12 and points C.predecessor to A.
step 3: The only successor of C is D, through logic consuming 1ns, so the algorithm proceeds to D and sets D.time to 13 and points D.predecessor to C. Assume that D's list of successor nodes is ordered such that the algorithm visits node E, then F.
step 4: Node E is filled in with time 26 and predecessor D.
step 5: Node G is filled in with time 29 and predecessor E. The walk would continue downstream from node G.
The intermediate state after step 5 is shown in FIG. 5c. The "worst-case arrival times" 322 have been filled in with a preliminary estimate of the latest transition time. The .predecessor pointers 320 show a preliminary estimate of the critical path to G, A-C-D-E-G. After the algorithm has visited all downstream logic and popped its stack to G:
step 6: DFS pops its stack back to E. E has no unvisited successors.
step 7: DFS pops its stack back to D. D has an unvisited successor, F.
step 8: Node F is filled in with time 32 and predecessor D.
step 9: When DFS arrives at node G with arrival time 33, it finds the node already labelled, but with a time earlier than the current DFS arrival time. Thus, G is updated with time 33, and G.predecessor is updated to point to node F. Note that pointing G.predecessor from E to F "prunes" from the graph all analysis downstream of E that was computed between steps 5 and 6. The algorithm has proved that E cannot possibly be on the critical path to G nor any node downstream of G. Because G has been relabelled, the nodes downstream of G must be walked again to have their times updated.
The intermediate state after step 9 is shown in FIG. 5d.
step 10: DFS pops its stack back to node F.
step 11: DFS pops its stack back to node D. D has no unvisited successors.
step 12: DFS pops its stack back to node C.
step 13: DFS pops its stack back to node A. The next unvisited successor of A is B.
step 14: B is labelled with time 8 and predecessor A.
step 15: DFS arrives at node D with arrival time 9. The arrival time is earlier than the current time of node D; thus, the algorithm stops probing along this path: all paths downstream of node D through node B are also said to be "pruned." By the same reasoning used in step 9, the algorithm has proved that the critical path to all nodes downstream of D must pass through C, not B.
step 16: DFS pops its stack back to node B.
step 17: DFS pops its stack back to node A. Node A now has no unvisited successors.
Finding no unvisited successors of A, the DFS algorithm is complete. The result of the algorithm is the critical path graph of FIG. 5e. The critical path to any node can be discovered by tracing the .predecessor pointers back from that node; e.g., the critical path to G is seen to be A-C-D-F-G. The critical path graph will be of the form of a forest of trees, each tree rooted at one of the input nodes or interior latches. Paths B-D and E-G have been pruned; no larger path that would have used these paths will be analyzed.
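The walk of FIG. 5a can be reproduced with a short sketch. The edge delays and the starting time A.time = 1 are not stated directly; they are inferred here from the arrival times quoted in steps 2-15 and should be treated as assumptions. The names Node, SUCCESSORS, dfs, and critical_path are hypothetical, and recursion stands in for the explicit stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    time: Optional[int] = None         # worst-case arrival time; None = unlabelled
    predecessor: Optional[str] = None  # node that drove the worst-case transition


# Successor lists in visiting order: (successor, delay in ns).
# Delays are inferred from the quoted arrival times (assumption).
SUCCESSORS = {
    "A": [("C", 11), ("B", 7)],
    "B": [("D", 1)],
    "C": [("D", 1)],
    "D": [("E", 13), ("F", 19)],
    "E": [("G", 3)],
    "F": [("G", 1)],
    "G": [],
}


def dfs(nodes, name, arrival, pred=None):
    """Depth-first walk that labels nodes and prunes non-worst-case paths."""
    node = nodes[name]
    if node.time is not None and arrival <= node.time:
        return  # equal or earlier arrival: abandon (prune) this path
    node.time, node.predecessor = arrival, pred
    for succ, delay in SUCCESSORS[name]:
        dfs(nodes, succ, arrival + delay, name)


def critical_path(nodes, name):
    """Trace .predecessor pointers back from a node to its root."""
    path = []
    while name is not None:
        path.append(name)
        name = nodes[name].predecessor
    return "-".join(reversed(path))


nodes = {n: Node(n) for n in SUCCESSORS}
dfs(nodes, "A", 1)
print(critical_path(nodes, "G"), nodes["G"].time)  # A-C-D-F-G 33
```

Under these assumed delays the arrival at D through B (time 9) is rejected against the existing label 13, so D.predecessor remains C and the B-D edge never appears in the critical path graph.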
There may be multiple critical path graphs built for a single circuit, for instance one for a rising clock edge and one for a falling edge. Each node will have at most a single out-edge pointing to the latest-transitioning driver node for the given clock edge (or to one of several equally-late-transitioning driver nodes). The critical path graphs superimpose without effect on each other. Without loss of generality, the disclosure will discuss single critical path graphs.
Once the timing verifier has identified the critical path to every node, the designer will redesign parts of the circuit to speed up the logic on the critical path, and then run the timing verifier again. If the designer successfully speeds up a structure on the critical path, subsequent runs of the timing verifier on the altered circuit will very likely produce a different critical path graph.
Pruning is essential to making static analysis practical. A naive DFS walk of a circuit would take time exponential in the number of edges between the nodes of the circuit. Though it is possible to construct artificial examples in which DFS algorithms, even with pruning, exhibit exponential time complexity, in practice pruning reduces the time complexity from exponential to nearly linear. With pruning, a single run of DFS and violation sorting of a full microprocessor design can take about fifteen CPU minutes. Without pruning, such analysis would be infeasible.
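The effect of pruning on the amount of work can be illustrated with a small counting experiment. The circuit below is a hypothetical ladder, not one from the figures: each node drives the next through two parallel paths, with the slower path listed (and therefore walked) first. The names build_ladder and count_visits are assumptions made for this sketch.

```python
def build_ladder(stages):
    """Node i drives node i+1 through two parallel paths, slow one first."""
    return {
        i: ([(i + 1, 2), (i + 1, 1)] if i < stages else [])
        for i in range(stages + 1)
    }


def count_visits(graph, prune):
    """Count node visits made by a DFS walk, with or without pruning."""
    times = {}
    visits = 0

    def walk(node, arrival):
        nonlocal visits
        visits += 1
        if prune and node in times and arrival <= times[node]:
            return  # not later than the recorded worst case: stop here
        times[node] = max(arrival, times.get(node, arrival))
        for succ, delay in graph[node]:
            walk(succ, arrival + delay)

    walk(0, 0)
    return visits


ladder = build_ladder(10)
print(count_visits(ladder, prune=False))  # 2047
print(count_visits(ladder, prune=True))   # 21
```

This ordering is the favourable case. If the faster path were walked first, each relabelling would force the downstream nodes to be walked again, which is how the artificial exponential examples arise even with pruning.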
In dynamic timing verification, the designer creates test vectors that exercise active, meaningful paths of the circuit. Static timing verification, by its nature, ignores the designer's logical intent and tests all paths through the circuit. The blessing of this more complete coverage is also a curse: unused, meaningless paths that will never be exercised during actual use of the circuit are also tested.
Consider again the circuit of FIG. 3. All further discussion in this disclosure will assume that X and Y are mutually exclusive, rendering the path A-C-D-F-G logically impossible. This could occur for a number of reasons, including:
1. dependencies in the logic generating nodes X and Y,
2. architectural constraints affecting X and Y,
3. logic A, B, E, and F is shared for differing functions.
Proper timing verification of this circuit should discover that the worst-case timing of node G is 29ns along path A-B-D-F-G instead of 33ns along A-C-D-F-G. The path A-C-D-F-G is called a "false path" or "logic exclusivity." However, the known DFS with standard pruning will prune node D such that only the C-D path remains (i.e., the B-D path is pruned away). Because node D does not have any path data structures pointing to node B, the true critical path A-B-D-F-G will not be generated by the tool. This can result in a number of undesired results including:
1. Inaccurate timing estimates for logic containing false paths. The timing verifier can generate overly-optimistic or overly-pessimistic estimates.
2. Pruning eliminates the edges forming the real critical path; the real critical path is left unanalyzed.
Existing timing verifiers handle false paths by requiring the user to manually insert timing estimates on the false path portion of the circuit, by executing multiple verification passes of the circuit, or by manually annotating the false paths and eliminating them altogether.
In manual insertion of timing estimates, the user specifies either absolute or relative timing estimates for nodes that are part of a false path. For instance, in the example, the user would annotate node F with time 32. The manual timing estimates method has the following characteristics:
1. A manual step introduces risk that incorrect timing will be specified,
2. A user may not have an accurate timing estimate for the node needing the timing,
3. If the circuit design is modified, the manually-specified timing estimate must be manually updated.
4. The manually-specified timing estimate will be invalid if the cycle time is varied.
In another known technique, called "case analysis," the user specifies the valid combinations of certain input nodes. The verifier then uses these values to perform multiple passes through the network, much in the manner of dynamic timing verification. In the example of FIG. 3, if nodes X and Y are known to be exclusive, then a case analysis approach would call for three separate passes to be performed on the circuit: one with X deasserted and Y asserted, one with X asserted and Y deasserted, and finally one with both X and Y deasserted. This does indeed allow the verifier to ignore the false path from A to F. However, between 2.sup.n and 3.sup.n passes of analysis are required, where n is the number of false path relationships in the network. Case analysis increases exposure to exponential time complexity. The output from the multiple runs must be merged into a single result.
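The number of passes called for by case analysis can be seen by enumerating the permitted input combinations, as in the sketch below. The function valid_cases and the second signal pair P, Q are hypothetical and serve only to show the growth when exclusivity relationships are independent of one another.

```python
from itertools import product


def valid_cases(exclusive_pairs):
    """List every assignment in which no exclusive pair is asserted together."""
    signals = sorted({s for pair in exclusive_pairs for s in pair})
    cases = []
    for values in product([False, True], repeat=len(signals)):
        assignment = dict(zip(signals, values))
        if all(not (assignment[a] and assignment[b]) for a, b in exclusive_pairs):
            cases.append(assignment)
    return cases


one_pair = valid_cases([("X", "Y")])
two_pairs = valid_cases([("X", "Y"), ("P", "Q")])
print(len(one_pair), len(two_pairs))  # 3 9
```

Each listed assignment corresponds to one verification pass, so every additional independent exclusive pair triples the number of passes whose outputs must then be merged.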
In yet another prior art technique, the circuit designer specifically annotates his circuit model with information indicating the false paths, directing the timing verifier to ignore those paths in its analysis. For instance, for the circuit of FIG. 3 (with the assumption that X and Y are mutually exclusive), the designer might add the statement
path_logically_impossible C,F
to his model, indicating that all paths that include both C and F are false paths. In this technique, an additional check is added as each node is pushed onto the DFS stack: the algorithm checks that the node is not the downstream node of any "path_logically_impossible" pair. If it is, and the upstream node of the pair is on the stack, then the path is abandoned.
Referring to FIG. 6a and applying this algorithm to the same example circuit of FIG. 3, steps 1-7 proceed exactly as in steps 1-7 of FIGS. 5a-5c. In step 8, the DFS finds that the current node, F, is indeed the downstream node of an impossible path, and that the upstream node of the path, C, is on the stack. The algorithm abandons the walk and pops node F--having visited F without processing it, and without labelling nodes F and G with the edges 330 of FIGS. 5d and 5e. The visit of node G of step 9 of FIG. 5a never occurs. Steps 9-15 of the modified algorithm proceed exactly as in steps 11-17 of FIG. 5a. The result is the critical path graph of FIG. 6b. Node G has been labelled with time 27, but node F appears unvisited: no critical path to F has been calculated. Worse, the algorithm has failed to find the real critical path of the circuit, A-B-D-F-G. The too-optimistic timing estimate on node G, 27, masks the real worst-case arrival time of 29. This error may lead to catastrophic failure of the circuit.
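The structural failure described above can be reproduced with a sketch of the annotated walk. The edge delays and the starting time are inferred from the arrival times quoted in the FIG. 5a walkthrough and are assumptions, so the specific times this sketch computes are not meant to match FIG. 6b; only the shape of the result is the point. The names SUCCESSORS, IMPOSSIBLE, and dfs are hypothetical.

```python
# Successor lists in visiting order: (successor, delay in ns).
# Delays are inferred from the FIG. 5a walkthrough (assumption).
SUCCESSORS = {
    "A": [("C", 11), ("B", 7)],
    "B": [("D", 1)],
    "C": [("D", 1)],
    "D": [("E", 13), ("F", 19)],
    "E": [("G", 3)],
    "F": [("G", 1)],
    "G": [],
}

# (upstream, downstream) pairs declared logically impossible.
IMPOSSIBLE = [("C", "F")]


def dfs(labels, stack, name, arrival, pred=None):
    """Pruning DFS with the additional impossible-path check."""
    # Abandon if this node closes an impossible pair whose upstream
    # node is currently on the stack.
    if any(down == name and up in stack for up, down in IMPOSSIBLE):
        return
    # Standard pruning: equal or earlier arrivals are abandoned.
    if name in labels and arrival <= labels[name][0]:
        return
    labels[name] = (arrival, pred)
    stack.append(name)
    for succ, delay in SUCCESSORS[name]:
        dfs(labels, stack, succ, arrival + delay, name)
    stack.pop()


labels = {}  # node name -> (worst-case arrival time, predecessor)
dfs(labels, [], "A", 1)
print(sorted(labels))                    # F is missing
print(labels["D"][1], labels["G"][1])    # C E
```

Node F is never labelled, and because the arrival at D through B is still pruned against the label left by C, the walk never reaches F by way of B. The path A-B-D-F-G is therefore never analyzed.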