This invention relates to timing verification of circuits.
Prototyping a VLSI (very large scale integrated circuit) design, for example, is extremely expensive: fabbing (fabricating) a pass of a prototype full-custom VLSI chip may take several months and may cost several hundred thousand dollars. If the chip design is flawed, the chip itself is almost impossible to probe to isolate the problem and determine corrections to the design. For this reason, virtually all VLSI chips are designed and thoroughly verified by software modelling before the first actual silicon is fabbed.
A timing verifier is one program in the suite of software tools used by a VLSI designer to verify a design. Timing verification is the process of analyzing the circuit model to ensure that signals will propagate through the logic quickly enough to meet timing requirements at a specified clock frequency. (A timing verifier may also include other functions, for instance analysis for race conditions or other logic problems.) Once the circuit has been largely designed using other tools of the suite, the timing verifier is used to improve the circuit by, e.g., eliminating bottlenecks that would force the circuit to be run at a slow clock frequency.
The timing verifier takes as input a description of the circuit and its interconnections, the impedances and/or loading of the wires, specifications of the devices in the logic path, and descriptions of the clocked elements, and produces as output the timing of the slowest paths, i.e., the "critical paths", from which the designer can deduce the maximum clock frequency at which the circuit can be run. The designer can then redesign the critical paths to speed them up, thus speeding up the entire circuit. This process is typically iterative: the designer runs the timing verifier, and modifies his circuit design using the information generated. He repeats this process until the number of critical paths with the same timing limit is so large that reducing the time of all of them becomes impractical.
In a synchronous integrated circuit (IC) design, major signals are captured in latches at clock edges and are held at stable values when and while the clock is deasserted. The value of the signal at the output of a latch, a latched signal, is only allowed to change during the time the clock signal is asserted. During the time the clock is asserted, changes on the D input to the latch immediately propagate through the latch to the Q output; thus the clock assertion is said to make the latch transparent. The latched signals then propagate downstream through combinatorial logic to other latches. The timing verifier reports any latches (or other clocked elements) whose inputs are not stable early enough to meet the requirements of the latch's clock.
FIG. 1 depicts a simple illustrative circuit, which will be considered under a simplified model of timing constraints and design rules. Two input signals A 100 and B 102 are latched by latches 108 and 110. Thus, signals A' 112 and B' 114 are stable except when the two latches 108 and 110 are transparent, which occurs when clocks Ck.sub.A 104 and Ck.sub.B 106 are asserted. Once A' and B' have been latched, they remain stable, and combinatorial logic CL.sub.1 116, CL.sub.2 120, and CL.sub.3 122 compute signals Y 124 and Z 126. Each of CL.sub.1, CL.sub.2, and CL.sub.3 imposes a certain delay in this computation. The downstream part of the design (not shown) relies on Y 124 and Z 126 being latched by latches 132 and 134 on clocks Ck.sub.Y 128 and Ck.sub.Z 130. Thus, CL.sub.1, CL.sub.2, and CL.sub.3 must be fast enough to meet the setup requirements of latches 132 and 134.
FIG. 2 presents a timing diagram for the circuit of FIG. 1. The first three lines show the clocks Ck.sub.A 104, Ck.sub.B 106, Ck.sub.Y 128, and Ck.sub.Z 130. In this example, A and B are latched on the same clock. Signals A and B must be stable far enough before the falling edge of Ck.sub.A /Ck.sub.B 206 to accommodate a "setup time" 208, a characteristic of latches 108 and 110. Once latches 108 and 110 become transparent during Ck.sub.A /Ck.sub.B 204 (assuming that the setup time and the data-to-output time of the latches are equal), signals A' and B' are allowed to transition until they are latched on the falling edge of Ck.sub.A /Ck.sub.B 206. A' and B' drive CL.sub.1, CL.sub.2, and CL.sub.3, which in turn produce signals X, Y, and Z. Under the simplified timing rules, the timing constraints of the circuit are satisfied if the propagation delay 208 of latch 108 plus the propagation delays through CL.sub.1 216 plus CL.sub.2 220 plus the setup time 232 of latch 132 is less than the time from the fall of clock Ck.sub.A /Ck.sub.B to the fall of clock Ck.sub.Y 228, and if the propagation delay 208 of latch 110 plus the propagation delays through CL.sub.1 216 plus CL.sub.3 222 plus the setup time 234 of latch 134 is less than the time from the fall of clock Ck.sub.A /Ck.sub.B to the fall of clock Ck.sub.Z 230. The paths A'-CL.sub.2 -Y and B'-CL.sub.3 -Z must also meet the timing requirements of latches 132 and 134, but these will be trivially satisfied because they are clearly faster than paths A'-CL.sub.1 -X-CL.sub.2 -Y and B'-CL.sub.1 -X-CL.sub.3 -Z. When all these conditions are satisfied, the circuit is said to pass timing verification.
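The simplified rule above amounts to one inequality per capturing latch. The following is a minimal sketch in Python; the helper name `meets_setup` is hypothetical, and every delay value is invented purely for illustration, since no numeric values are given for FIG. 2.

```python
def meets_setup(latch_delay, logic_delays, setup_time, launch_to_capture):
    """Return True if data launched at one falling clock edge is stable
    early enough before the falling edge of the capturing latch's clock."""
    path_delay = latch_delay + sum(logic_delays) + setup_time
    return path_delay < launch_to_capture

# Hypothetical values in nanoseconds; none of these come from the figures.
LATCH_DELAY = 1.0       # propagation delay of latch 108 or 110
CL1, CL2, CL3 = 4.0, 3.0, 5.0
SETUP = 1.0             # setup time of latch 132 or 134
EDGE_TO_EDGE = 10.0     # fall of Ck_A/Ck_B to fall of Ck_Y (and of Ck_Z)

# Path A'-CL1-X-CL2-Y and path B'-CL1-X-CL3-Z
y_ok = meets_setup(LATCH_DELAY, [CL1, CL2], SETUP, EDGE_TO_EDGE)
z_ok = meets_setup(LATCH_DELAY, [CL1, CL3], SETUP, EDGE_TO_EDGE)
print("Y path passes:", y_ok)
print("Z path passes:", z_ok)
```

With these assumed numbers the path to Y totals 9 ns and passes, while the path to Z totals 11 ns and fails, so the circuit as a whole would not pass timing verification.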
If the circuit fails timing verification, the timing verifier will report the critical paths that failed. This indicates that either the logic on the slow paths needs to be redesigned to be faster, or the clock frequency needs to be slowed down to accommodate the timing of the circuit.
Timing verifiers operate on one of two general paradigms: dynamic or static.
In dynamic timing verification, the circuit design is simulated through time. The engineer must determine model input stimuli with which to drive the circuit model, called test vectors. Applying dynamic timing verification to the sample circuit of FIG. 1, the timing verifier would successively apply twelve stimuli where either A or B or both undergo transitions: AB.fwdarw.AB={00.fwdarw.01, 00.fwdarw.10, 00.fwdarw.11, 01.fwdarw.00, 01.fwdarw.10, 01.fwdarw.11, 10.fwdarw.00, 10.fwdarw.01, 10.fwdarw.11, 11.fwdarw.00, 11.fwdarw.01, 11.fwdarw.10} and run a loop to simulate time, during which model clock Ck.sub.A /Ck.sub.B would undergo several transitions. The circuit model would be operated through time to see at what time signals Y and Z stabilize. Dynamic timing verification is effective in that it is capable of diagnosing all timing problems, at least for the test vectors applied. But in modern circuit designs, the super-exponential combinatorics on tens of thousands of signals is fatal to the dynamic approach: there simply isn't time for a program to test all possible combinations of inputs (most of which would never arise in actual operation), nor for a human to filter out a set of meaningful test vectors that will test all the effective paths.
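The twelve stimuli listed above can be enumerated mechanically. The sketch below is an illustration only; the names `states`, `stimuli`, and `count_stimuli` are hypothetical, and it counts only single before/after transitions of the inputs, not longer stimulus sequences.

```python
from itertools import product

# All two-bit states of inputs A and B, written as strings such as "01".
states = ["".join(bits) for bits in product("01", repeat=2)]

# A stimulus is an ordered (before, after) pair in which at least one input changes.
stimuli = [(before, after)
           for before in states
           for after in states
           if before != after]

print(len(stimuli))
for before, after in stimuli:
    print(before, "->", after)

def count_stimuli(num_inputs):
    """Number of ordered pairs of distinct states for num_inputs binary inputs."""
    n_states = 2 ** num_inputs
    return n_states * (n_states - 1)
```

For n binary inputs this count is 2^n (2^n - 1), so even single-transition stimuli grow exponentially with the number of inputs, before any sequencing of stimuli over time is considered.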
In the second paradigm, static analysis, there is no loop simulating the passage of time. Static analysis is to dynamic analysis as theorem proving is to case analysis: instead of attempting to simulate a "large enough" number of specific cases, a static timing verifier "reasons" about the circuit model and draws inferences about whether the circuit will meet its timing constraints. This generally involves analyzing every node--i.e., every wire--in a circuit and calculating transition times based on the arrival time of inputs and the propagation delay through the structures. As the times of the transitions of the inputs to a node are analyzed, only the latest transition (in time) is saved, and the algorithm immediately stops tracing any path that is known not to be the worst case. This process, called information pruning, is required to keep the program execution times reasonable.
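The rule that only the latest transition at a node is saved can be illustrated for a single node. This is a minimal sketch under assumed names and values (`latest_arrival`, drivers "P" and "Q", and their times are all hypothetical), not the method of any particular timing verifier.

```python
def latest_arrival(drivers):
    """drivers: list of (name, arrival_time, delay) tuples feeding one node.
    Returns (worst_time, worst_driver), keeping only the latest transition."""
    worst_time, worst_driver = None, None
    for name, arrival, delay in drivers:
        t = arrival + delay          # time at which this driver's transition reaches the node
        if worst_time is None or t > worst_time:
            worst_time, worst_driver = t, name
    return worst_time, worst_driver

# Hypothetical node driven by two gates: P arrives at 5 + 2, Q at 3 + 6.
print(latest_arrival([("P", 5, 2), ("Q", 3, 6)]))
```

Here the transition through Q arrives later, so only Q's time is retained for the node and the earlier transition through P is discarded.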
One known algorithm for static timing verification is a depth-first search (DFS) of the circuit starting at each signal guaranteed on a clock edge, labelling each node with the currently best-locally-known worst-case timing information. After all nodes have been labelled, a second pass examines all timing constraints to tell the designer whether the circuit as a whole meets its timing constraints.
Consider the circuit of FIG. 3, in which a first stage of the circuit has two paths of different delay times, which join at a multiplexer, whose output is captured in a latch. The output of the multiplexer fans out in a second stage of two paths of different delay times, which are joined at a second multiplexer. The DFS algorithm represents each node of a circuit by a data structure as shown in FIG. 4. The node has a name, a "worst case arrival time," and a pointer to the node that drove this worst-case transition.
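A node record of the kind described for FIG. 4 might be sketched as follows. The field names `time` and `predecessor` follow the shorthand used in this description; the class layout, the use of `None` for an unlabelled node, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    time: Optional[int] = None             # worst-case arrival time; None means not yet labelled
    predecessor: Optional["Node"] = None   # node that drove the worst-case transition

# Hypothetical labelling of two nodes, with one driving the other.
a = Node("A", time=1)
c = Node("C", time=12, predecessor=a)
print(c.name, c.time, c.predecessor.name)
```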
FIGS. 5a-5e depict a DFS analysis of the circuit of FIG. 3: FIG. 5a shows a time-sequence of stack states, and FIGS. 5b-5e show a time sequence of states of data structures.
In the DFS algorithm, the graph of the nodes of the circuit is walked in a depth-first order. The algorithm's walker maintains a current "arrival time," and a stack of nodes. (Since this is a static analyzer, note that the "arrival time" does not "tick" off time incrementally; it moves forward and back by the discrete amounts of delay of the logic walked.) The DFS walker pushes nodes onto the stack as it traces paths downstream, and pops them as it unwinds back upstream. The walker increments its arrival time as it walks downstream through logic by the time delay of the logic, and decrements it by the same amount as it unwinds back. As the algorithm pushes each node, if the walker's arrival time is later than the current "worst case arrival time" (or simply ".time") of the node, then the node is updated with the value of the DFS arrival time, the node's "worst case predecessor" (or simply ".predecessor") is pointed at the predecessor node down which the DFS walk came, and the DFS continues down the successor nodes. If the DFS arrival time is equal to or earlier than the current node's worst case arrival time, the probe of this path is abandoned, and the node is popped off the stack.
In FIG. 5a, each column depicts a step 300 identified by number, and the value of the DFS arrival time 302 during that step. The state of the DFS stack 304 is also shown, with the top of the stack in bold. The term "labelled" is used to describe information permanently (though overwritably) stored in the representation of the circuit. "Unvisited" is used in a local sense: a node is unvisited if it has not been visited via the current path, even if it has been previously visited via a different path.
step 1: The algorithm begins a probe at a latch, in this case latch L.sub.v, at a time that is assumed, without loss of generality, to begin at 1.
step 2: FIG. 5b shows the configuration of the nodes for the circuit of FIG. 3 as the algorithm visits the first node of the circuit, node A 310. All the node names have been filled in. A.predecessor and A.time have been filled in (by the process about to be described in detail).
step 3: Assume that A's list of successor nodes is ordered such that the algorithm visits C, then B. Thus, the algorithm walks to node C. Since the logic connecting A to C, CL.sub.2, consumes 11 ns, the DFS algorithm carries the arrival time 12 as it arrives at C. The algorithm, finding C not already labelled, labels C.time with 12 and points C.predecessor to A.
step 4: The only successor of C is D, through logic consuming 1 ns, so the algorithm proceeds to D, sets D.time to 13, and points D.predecessor to C. Assume that D's list of successor nodes is ordered such that the algorithm visits node E, then F.
step 5: Node E is filled in with time 26 and predecessor D.
step 6: Node G is filled in with time 29 and predecessor E. The walk would continue downstream from node G.
step 7: Assume that clock .phi..sub.w will open latch L.sub.w at a time later than 33; thus this latch is still closed, and the DFS probe blocks.
step 8: DFS pops its stack back to G. G has no unvisited successors.
step 9: DFS pops its stack back to E. E has no unvisited successors.
step 10: DFS pops its stack back to D. D has an unvisited successor, F.
step 11: Node F is filled in with time 32 and predecessor D.
step 12: When DFS arrives at node G with arrival time 33, it finds the node already labelled, but with a time earlier than the current DFS arrival time. Thus, G is updated with time 33, and G.predecessor is updated to point to node F.
Note that pointing G.predecessor from E to F "prunes" from the graph all analysis downstream of E that was computed between steps 5 and 6. The algorithm has proved that E cannot possibly be on the critical path to G nor any node downstream of G. Because G has been relabelled, the nodes downstream of G must be walked again to have their times updated.
step 13: Latch L.sub.w is still closed, and the DFS probe blocks.
step 14: DFS pops its stack back to node G.
step 15: DFS pops its stack back to node F.
step 16: DFS pops its stack back to node D. D has no unvisited successors.
step 17: DFS pops its stack back to node C.
step 18: DFS pops its stack back to node A. The next unvisited successor of A is B.
step 19: B is labelled with time 8 and predecessor A.
step 20: DFS arrives at node D with arrival time 9. The arrival time is earlier than the current time of node D; thus, the algorithm stops probing along this path: all paths downstream of node D through node B are also said to be "pruned." By the same reasoning used in step 12, the algorithm has proved that the critical path to all nodes downstream of D must pass through C, not B.
step 21: DFS pops its stack back to node B.
step 22: DFS pops its stack back to node A. Node A now has no unvisited successors.
step 23: DFS pops its stack back to L.sub.v.
The intermediate state after step 7 is shown in FIG. 5c. The "worst-case arrival times" 322 have been filled in with a preliminary estimate of the latest transition time. The .predecessor pointers 320 show a preliminary estimate of the critical path to G, L.sub.v -A-C-D-E-G.
The intermediate state after step 13 is shown in FIG. 5d.
Finding no unvisited successors of L.sub.v, the DFS algorithm is complete. The result of the algorithm is the critical path graph of FIG. 5e. The critical path to any node can be discovered by tracing the .predecessor pointers back from that node; e.g., the critical path to G is seen to be L.sub.v -A-C-D-F-G. The critical path graph will be of the form of a forest of trees, each tree rooted at one of the input nodes or interior latches. Paths B-D and E-G have been pruned; no larger path that would have used these paths will be analyzed.
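The walk-through above can be reproduced with a short sketch. Several things here are assumptions: recursion stands in for the explicit stack, the edge delays are inferred from the labelled times in the steps, the edge from L.sub.v to A is given zero delay so that A is labelled with time 1, and the closed latch L.sub.w is modelled simply by giving G no successors. The names `Node`, `dfs`, `connect`, and `critical_path` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    time: Optional[int] = None                       # worst-case arrival time
    predecessor: Optional["Node"] = None             # worst-case driver
    successors: list = field(default_factory=list)   # (Node, delay) pairs, in visit order

def dfs(node, arrival, came_from=None):
    # Abandon the probe unless this arrival is strictly later than the current label.
    if node.time is not None and arrival <= node.time:
        return
    node.time, node.predecessor = arrival, came_from
    for succ, delay in node.successors:
        dfs(succ, arrival + delay, node)             # returning from the call is the "pop"

def critical_path(node):
    """Follow .predecessor pointers back to the root and return the path as text."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.predecessor
    return "-".join(reversed(names))

nodes = {name: Node(name) for name in ["Lv", "A", "B", "C", "D", "E", "F", "G"]}

def connect(src, dst, delay):
    nodes[src].successors.append((nodes[dst], delay))

connect("Lv", "A", 0)                          # assumed zero delay
connect("A", "C", 11); connect("A", "B", 7)    # A visits C, then B
connect("C", "D", 1);  connect("B", "D", 1)
connect("D", "E", 13); connect("D", "F", 19)   # D visits E, then F
connect("E", "G", 3);  connect("F", "G", 1)

dfs(nodes["Lv"], 1)
print(nodes["G"].time, critical_path(nodes["G"]))
```

Under these assumptions G ends with time 33 and the traced path L.sub.v -A-C-D-F-G, while the probe arriving at D through B with time 9 is abandoned because D is already labelled 13.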
There may be multiple critical path graphs built for a single circuit, for instance one for a rising clock edge and one for a falling edge. Each node will have at most a single out-edge pointing to the latest-transitioning driver node for the given clock edge (or to one of several equally late-transitioning drivers). The critical path graphs superimpose without effect on each other. Without loss of generality, what follows will discuss single critical path graphs.
Once the timing verifier has identified the critical path to every node, the designer will redesign parts of the circuit to speed up the logic on the critical path, and then run the timing verifier again. If the designer successfully speeds up a structure on the critical path, subsequent runs of the timing verifier on the altered circuit will likely produce a different critical path graph.
In dynamic timing verification, the designer creates test vectors that exercise active, meaningful paths of the circuit. Static timing verification, by its nature, ignores the designer's logical intent and tests all paths through the circuit. The blessing of this more complete coverage is also a curse: unused, meaningless paths that will never be exercised during actual use of the circuit are also tested.
Existing timing verifiers for VLSI design are used almost exclusively for semi-custom circuitry. One reason that they have been impractical for full-custom ICs is that they are unable to cope with the myriad of possible latch designs that a full-custom IC may present.
Semi-custom ICs are built around the concept of a cell library--a fairly small number of cells, constructed beforehand. These cells are then reused and replicated, just as a mason might create interesting patterns in a brick walkway using only a handful of different types of bricks. The advantage of this cell-library scheme is its short time to market. Since a semi-custom circuit design uses relatively few (on the order of one hundred) cells, and since the replication of a cell takes far less work than its original creation, the work involved in chip design is vastly reduced. However, since no small set of cells can exactly meet the needs of every individual situation, the performance of this type of design is limited.
A full-custom IC design methodology removes this restriction. While standard cells may be used in most of the IC, a designer is free to use hand-tailored circuitry in his most critical design sections.
Known timing verifiers subscribe to a notion that a circuit consists of combinatorial logic and latches: for instance, in the analysis of FIGS. 1 and 2, the depth-first walk would stop at latches 132 and 134, or in FIGS. 3 and 5, the depth-first walk would stop at the first latch downstream of node G. Thus, if the input to the timing verifier is a device-level wirelist rather than a gate- and latch-level block diagram, the timing verifier must identify the latches in the circuit model. When the design is constrained to the blocks of a semi-custom cell library, there may be only, e.g., five types of latches, and thus it is easy for a template-matcher to reassemble the wirelist-level devices into block-level latches. When closely tuned to a particular circuit-design methodology, this template-matching approach can work quite well. In a second method, the latch cells of the schematic-capture editor have a "latch" attribute bit. When the editor produces a wirelist, the wirelist includes a "dotted line" annotating the inputs, outputs, and elements of the latch.
In full-custom circuitry the clocking circuitry may be more complex than allowed in semi-custom designs. One reason is the existence of logic families (e.g., CVSL, cascode voltage switched logic, developed by IBM) where arbitrary combinatorial logic may be embedded in a latch. The number of combinatorial functions rises exponentially with the number of function inputs, so the number of possible latch types grows into the thousands or tens of thousands very quickly. Of course, in any given design, limited human resources prevent more than a few hundred of these from being used.
Although in semi-custom circuits clock enable signals are typically required to be set up before their clocks activate, in full-custom designs such as domino logic families, a clock enable may legally transition in the middle of a clock phase. The "latch" derived by the analyzer would include everything from the clock conditioning gate to the tristate inverter output of the latch. The amount of circuitry in between may be very large.