Static timing analysis (STA) has been widely used in the industry to determine the latest and earliest possible switching times of various signals within a digital circuit. STA can generally be performed at the transistor level, using circuit simulation packages such as SPICE, or at the gate level, using pre-characterized library elements, or at higher levels of abstraction, for complex hierarchical chips.
Conventional STA algorithms operate by first levelizing the logic structure, and breaking any loops in order to create a directed acyclic graph (timing graph). Modern designs often include millions of placeable objects, with corresponding timing graphs having millions, if not tens of millions of nodes. For each node, a corresponding arrival time, transition rate (slew), and required arrival time are computed for both rising and falling transitions as well as early and late mode analysis. The arrival time (AT) represents the latest or earliest time at which a signal can transition due to the entire upstream fan-in cone. The slew value is the transition rate associated with a corresponding AT. A required arrival time (RAT) represents the latest or earliest time at which a signal must transition due to timing constraints in the entire downstream fan-out cone.
Referring to FIG. 1, ATs are propagated forward in a levelized manner, starting from the design primary input asserted (i.e., user-specified) arrival times, and ending at either primary output ports or intermediate storage elements. For single fan-in cases,

$$\mathrm{AT}_{\text{sink node}} = \mathrm{AT}_{\text{source node}} + \text{delay from source to sink}.$$
Whenever multiple signals merge, each fan-in contributes a potential arrival time computed as

$$\mathrm{AT}_{\text{sink}}(\text{potential}) = \mathrm{AT}_{\text{source}} + \text{delay},$$

making it possible for the maximum (late mode) or minimum (early mode) of all potential arrival times to be retained at the sink node. Typically an exact delay value for an edge in a timing graph is not known, but instead only a range of possible delay values can be determined between some minimum delay and maximum delay. In this case, maximum delays are used to compute late mode arrival times and minimum delays are used to compute early mode arrival times.
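The forward propagation described above can be sketched as follows. This is a minimal illustration and not the implementation of any particular timing tool: the function name `propagate_arrival_times`, the edge tuple layout, and the numeric values are assumptions made for the example. It further assumes that the graph is acyclic and that every node without fan-in has an asserted arrival time.

```python
from collections import defaultdict, deque


def propagate_arrival_times(edges, asserted):
    """Forward-propagate early/late arrival times through a timing graph.

    edges:    list of (source, sink, min_delay, max_delay) tuples (assumed layout).
    asserted: dict mapping each primary input to (early_at, late_at).
    Returns a dict mapping every reachable node to (early_at, late_at).
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(asserted)
    for src, snk, dmin, dmax in edges:
        fanout[src].append((snk, dmin, dmax))
        indegree[snk] += 1
        nodes.update((src, snk))

    at = dict(asserted)
    # Levelized order: a node is processed only after all of its fan-ins.
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        src = ready.popleft()
        early_src, late_src = at[src]
        for snk, dmin, dmax in fanout[src]:
            # Minimum delay feeds early mode, maximum delay feeds late mode.
            cand_early = early_src + dmin
            cand_late = late_src + dmax
            if snk in at:
                early, late = at[snk]
                at[snk] = (min(early, cand_early), max(late, cand_late))
            else:
                at[snk] = (cand_early, cand_late)
            indegree[snk] -= 1
            if indegree[snk] == 0:
                ready.append(snk)
    return at


# Hypothetical two-input merge: A and B both drive Z.
edges = [("A", "Z", 1, 2), ("B", "Z", 1, 3)]
asserted = {"A": (0, 1), "B": (2, 4)}
arrival = propagate_arrival_times(edges, asserted)
print(arrival["Z"])  # (1, 7)
```

At Z, the early arrival time is the minimum of the two early candidates (1 and 3), and the late arrival time is the maximum of the two late candidates (3 and 7).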
Still referring to FIG. 1, RATs are computed in a backward levelized manner starting from either asserted required arrival times at the design primary output pins, or from tests (e.g., setup or hold constraints) at internal storage devices. For single fan-out cases,

$$\mathrm{RAT}_{\text{source node}} = \mathrm{RAT}_{\text{sink node}} - \text{delay}.$$
When multiple fan-outs merge (or when a test is present), each fan-out (or test) contributes a prospective RAT, enabling the minimum (late mode) or maximum (early mode) required arrival time to be retained at the source node. When only a range of possible delay values can be determined, maximum delays are used to compute late mode required arrival times and minimum delays are used to compute early mode required arrival times.
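The backward propagation of required arrival times can be sketched in the same style. Again, this is only an illustration under stated assumptions: the name `propagate_required_times`, the edge tuple layout, and the numbers are hypothetical, the graph is assumed acyclic, every node without fan-out is assumed to have an asserted RAT, and tests at internal storage devices are not modeled.

```python
from collections import defaultdict, deque


def propagate_required_times(edges, asserted):
    """Backward-propagate early/late required arrival times.

    edges:    list of (source, sink, min_delay, max_delay) tuples (assumed layout).
    asserted: dict mapping each primary output to (early_rat, late_rat).
    Returns a dict mapping every node to (early_rat, late_rat).
    """
    fanin = defaultdict(list)
    outdegree = defaultdict(int)
    nodes = set(asserted)
    for src, snk, dmin, dmax in edges:
        fanin[snk].append((src, dmin, dmax))
        outdegree[src] += 1
        nodes.update((src, snk))

    rat = dict(asserted)
    # Reverse levelized order: a node is processed after all of its fan-outs.
    ready = deque(n for n in nodes if outdegree[n] == 0)
    while ready:
        snk = ready.popleft()
        early_snk, late_snk = rat[snk]
        for src, dmin, dmax in fanin[snk]:
            # Minimum delay feeds early mode, maximum delay feeds late mode.
            cand_early = early_snk - dmin
            cand_late = late_snk - dmax
            if src in rat:
                early, late = rat[src]
                # Early mode keeps the maximum, late mode keeps the minimum.
                rat[src] = (max(early, cand_early), min(late, cand_late))
            else:
                rat[src] = (cand_early, cand_late)
            outdegree[src] -= 1
            if outdegree[src] == 0:
                ready.append(src)
    return rat


# Hypothetical fan-out: A drives both Y and Z.
edges = [("A", "Y", 1, 2), ("A", "Z", 1, 3)]
asserted = {"Y": (2, 10), "Z": (4, 8)}
required = propagate_required_times(edges, asserted)
print(required["A"])  # (3, 5)
```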
The difference between the arrival time and required arrival time at a node is referred to as slack. Early mode and late mode slacks are distinguished from each other and computed separately. The equations are:

$$\mathrm{Slack}_{\mathrm{early}} = \mathrm{AT}_{\mathrm{early}} - \mathrm{RAT}_{\mathrm{early}} \qquad (1)$$

$$\mathrm{Slack}_{\mathrm{late}} = \mathrm{RAT}_{\mathrm{late}} - \mathrm{AT}_{\mathrm{late}} \qquad (2)$$

A positive slack implies that the current arrival time at a given node meets all downstream timing constraints, and a negative slack implies that the arrival time fails at least one such downstream timing constraint. A timing point may include multiple such AT, RAT, and slew values for the purpose of distinguishing information for a specific subset of an entire fan-in cone or fan-out cone.
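Equations (1) and (2) can be illustrated with a short sketch. The function name `slacks` and the numeric values are assumptions chosen for the example.

```python
def slacks(at, rat):
    """Return (early_slack, late_slack) per equations (1) and (2).

    at:  (early_at, late_at) at a node.
    rat: (early_rat, late_rat) at the same node.
    """
    at_early, at_late = at
    rat_early, rat_late = rat
    early_slack = at_early - rat_early   # equation (1)
    late_slack = rat_late - at_late      # equation (2)
    return early_slack, late_slack


# Hypothetical node that meets both constraints.
print(slacks(at=(4, 6), rat=(3, 9)))   # (1, 3)
# Hypothetical node that switches too early and settles too late.
print(slacks(at=(2, 10), rat=(3, 9)))  # (-1, -1)
```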
FIG. 2 illustrates an example of AT propagation for a typical NAND logic gate (200), where for simplicity, a delay of zero is assumed for all transitions. Each of the input signal edges on inputs A and B (202, 204) creates a transition at the output Z (206) of the NAND gate. Due to the bounding approach, the earlier arriving signal edges (210, 220) create an early arriving output waveform on Z (250, 260), and the late arriving signal edges (230, 240) create a late arriving waveform on Z (270, 280). In other words, for the early output timing the minimum of the early ATs is propagated, and for the late output timing the maximum of the late ATs is propagated. This is the standard approach that STA uses to guard-band timing conservatively. Consequently, at output Z the late and early arriving waveforms are clearly distinguishable. The difference between the two waveforms leads to a conservative slack calculation downstream in the timing graph from the NAND gate.
If it is known that the NAND2's input signals on A and B are at logic zero for a portion of each cycle (i.e., each either remains at zero or switches to one and back to zero in each cycle), as depicted in the example of FIGS. 2 and 3, the conservatism can be reduced by applying a technique hereinafter referred to as reverse merge timing. FIG. 3 illustrates such an instance. For the rising edges on the inputs of the NAND2 gate (300), it is known from the gate's logic function that when all inputs start at logic zero, only the last input signal to rise (330) can create a falling transition on the output (350, 370). Similarly, if all input signals are known to fall, only the first input signal to fall (320) triggers a rising edge at output Z (360, 380). For the early rising input edges, this means that the maximum of the early rise ATs is propagated. Similarly, for the late falling input edges, the minimum of the late falling ATs is propagated to the output rising edge on Z. This mechanism will be referred to henceforth as “reverse merge” since it is the opposite of what normal static timing analysis propagation does, as previously described with reference to FIG. 2, i.e., it reverses the roles of the minimum and maximum in early and late timing. Given that the early and late arrival times of the input signals are the same, as assumed for simplicity, it can be seen in FIG. 3 that the early and late output waveforms are identical due to the reverse merge. This results in a reduction of pessimism for downstream timing slacks and, consequently, faster time to market for VLSI designs modeled with that feature.
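The contrast between the normal merge and the reverse merge at a NAND gate can be sketched as follows. The function name `nand_output_ats`, its parameters, and the arrival time values are assumptions for illustration only; the figures themselves do not give numeric values.

```python
def nand_output_ats(rise_ats, fall_ats, reverse_merge=False, delay=0):
    """Merge input arrival times at a NAND gate.

    rise_ats, fall_ats: lists of (early_at, late_at), one entry per input.
    Returns {"fall": (early, late), "rise": (early, late)} for the output,
    where rising inputs cause the output to fall and vice versa.
    """
    early_rise = [early for early, _ in rise_ats]
    late_rise = [late for _, late in rise_ats]
    early_fall = [early for early, _ in fall_ats]
    late_fall = [late for _, late in fall_ats]

    if reverse_merge:
        # Last rising input controls: maximum in both early and late mode.
        out_fall = (max(early_rise), max(late_rise))
        # First falling input controls: minimum in both early and late mode.
        out_rise = (min(early_fall), min(late_fall))
    else:
        # Normal STA bounding: minimum for early mode, maximum for late mode.
        out_fall = (min(early_rise), max(late_rise))
        out_rise = (min(early_fall), max(late_fall))

    return {
        "fall": (out_fall[0] + delay, out_fall[1] + delay),
        "rise": (out_rise[0] + delay, out_rise[1] + delay),
    }


# Hypothetical inputs with identical early and late times, zero gate delay:
# A rises at 1 and falls at 5, B rises at 2 and falls at 6.
rise = [(1, 1), (2, 2)]
fall = [(5, 5), (6, 6)]
normal = nand_output_ats(rise, fall)
reverse = nand_output_ats(rise, fall, reverse_merge=True)
print(normal)   # {'fall': (1, 2), 'rise': (5, 6)}
print(reverse)  # {'fall': (2, 2), 'rise': (5, 5)}
```

With the reverse merge, the early and late output times coincide, whereas the normal merge leaves a window between them that downstream slack calculations must absorb.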
Referring to FIG. 3, two crossed-out input edges at inputs A and B are illustrated (310, 340) which do not contribute in any way to the switching of output Z. Thus, they are referred to as “non-controlling” input edges. The input signal edges causing the output to transition are referred to as “controlling” input edges. When all inputs of an AND function (e.g., an AND or NAND gate or an input group of an AND-OR-INVERT gate) are known to reach or remain at a logic zero state in each cycle, the first falling transition among these inputs is controlling, and thus the first falling input transition may be propagated in both early mode (as for normal STA) and late mode (in which propagating it amounts to a reverse merge operation). For the same set of AND function inputs, the last rising input among them is controlling and can be propagated in both late mode (as for normal STA) and early mode (in which propagating it amounts to a reverse merge operation). All other input transitions of the AND function can be considered non-controlling.
A similar analysis can be performed for reverse merge situations on OR gates or more complicated structures such as dynamic circuits encountered in transistor-level custom designs. When all inputs of an OR function (e.g., an OR or NOR gate or an input group of an OR-AND-INVERT gate) are known to reach or remain at a logic one state in each cycle, the first rising transition among these inputs is controlling, and thus the first rising input transition may be propagated in both early mode (as for normal STA) and late mode (in which propagating it amounts to a reverse merge operation). For the same set of OR function inputs, the last falling input among them is controlling and can be propagated in both late mode (as for normal STA) and early mode (in which propagating it amounts to a reverse merge operation). All other input transitions of the OR function can be considered non-controlling. Other circuits exist, including domino circuits that precharge certain circuit nodes to known values in each cycle, in which signals are known to reach or remain in a particular logic state in each cycle, and to which reverse merge timing may therefore be applied.
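The controlling-edge rules for AND and OR functions can be summarized in a small sketch. The function name `controlling_inputs`, the use of a single arrival time per input, and the example values are assumptions for illustration; the sketch presumes that the stated precondition holds (AND inputs reach or remain at zero, or OR inputs reach or remain at one, in each cycle).

```python
def controlling_inputs(function, rise_ats, fall_ats):
    """Identify the controlling rising and falling inputs of a reverse merge.

    function: "AND" or "OR" (also covers NAND/NOR and AOI/OAI input groups).
    rise_ats, fall_ats: dicts mapping input name to arrival time.
    Returns (controlling_rising_input, controlling_falling_input).
    """
    if function == "AND":
        # Last rising and first falling inputs control an AND function.
        return max(rise_ats, key=rise_ats.get), min(fall_ats, key=fall_ats.get)
    if function == "OR":
        # First rising and last falling inputs control an OR function.
        return min(rise_ats, key=rise_ats.get), max(fall_ats, key=fall_ats.get)
    raise ValueError(f"unsupported function: {function!r}")


# Hypothetical arrival times: A rises at 1 and falls at 5, B rises at 2 and falls at 6.
rise = {"A": 1, "B": 2}
fall = {"A": 5, "B": 6}
print(controlling_inputs("AND", rise, fall))  # ('B', 'A')
print(controlling_inputs("OR", rise, fall))   # ('A', 'B')
```

Every input transition not returned by the function would be treated as non-controlling.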
Another area that heavily relies on timing analysis is timing optimization. Generally, the goal of optimizing is to improve the slack, power, area or other design metrics for all the circuits in the design such that the optimized parameters reach a designer predetermined target. During optimization, different parameters can be categorized as primary or secondary. Generally, most of the work of the optimization engine is oriented towards improving the primary optimization parameters, followed by any improvements to the secondary optimization parameters that do not cause degradation in the primary parameter optimization results obtained. Currently, state of the art timing optimization engines used in VLSI designs focus on slack improvement as a primary optimization parameter. To avoid design quality degradation, the remaining metrics, such as power and area, can only be optimized if the primary parameter can be measured.
Optimization is generally accomplished with a series of manipulations that restructure the design, reduce the capacitive load on gate outputs, improve signal propagation through the use of larger devices, and apply other similar methods. Any design change made by optimization that does not result in an improvement of the targeted metrics is typically discarded and a different change is attempted following some predetermined heuristics.
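The accept-or-discard behavior described above, together with the primary and secondary parameter ordering, might be sketched as a simple greedy loop. This is a hypothetical illustration and not the algorithm of any actual optimization engine: the function name `optimize`, the dictionary representation of a design, and the candidate changes are all assumptions.

```python
def optimize(design, candidate_changes, primary, secondary):
    """Greedy accept/discard loop over candidate design changes.

    design:            any value describing the design (a dict in this example).
    candidate_changes: functions mapping a design to a modified copy.
    primary/secondary: scoring functions where a larger score is better.
    """
    for change in candidate_changes:
        trial = change(design)
        improves_primary = primary(trial) > primary(design)
        # Secondary gains count only if the primary metric does not degrade.
        improves_secondary = (
            primary(trial) >= primary(design)
            and secondary(trial) > secondary(design)
        )
        if improves_primary or improves_secondary:
            design = trial  # keep the change
        # otherwise the change is discarded and the next one is tried
    return design


# Hypothetical design with slack as primary metric and area as secondary metric.
start = {"slack": -3, "area": 10}
changes = [
    lambda d: {"slack": d["slack"] + 2, "area": d["area"] + 3},  # upsize a gate
    lambda d: {"slack": d["slack"] - 1, "area": d["area"] - 2},  # downsize a gate
    lambda d: {"slack": d["slack"], "area": d["area"] - 1},      # area recovery
]
result = optimize(
    start,
    changes,
    primary=lambda d: d["slack"],
    secondary=lambda d: -d["area"],  # smaller area is better
)
print(result)  # {'slack': -1, 'area': 12}
```

The second change is discarded because it degrades slack, even though it would reduce area. Note that the loop cannot evaluate a change at all when `primary` has no defined value for a path.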
In late mode static timing analyses, the designer attempts to ensure that the latest possible arriving signal at the storage element is correctly captured. Therefore, late mode timing optimization techniques aim at speeding up slow timing paths to obtain a desired clock frequency. In an early mode analysis, the designer attempts to guarantee that the traveling signal remains stable long enough to be captured by a timing element, thereby ensuring that the design is operational. For early mode analyses, the goal of the optimization is to slow down paths that are too fast, which could invalidate the signal before it is properly stored.
From the above, it is clear that current state of the art optimization techniques are precluded from processing any timing paths that do not have a defined value for a primary optimization parameter, e.g., slack. This can pose a difficulty in timing closure of designs that use the reverse merge.
Although reverse merge is a useful technique, it creates a dilemma when computing the slack for the non-controlling edges. If the slack is calculated in the conventional manner given by previously described equations (1) and (2), a non-controlling input will appear to be more critical (i.e., to have a smaller signed algebraic slack value) than the actual controlling input. For example, considering the timing diagram in FIG. 3, given that all delays through the exemplary NAND gate are zero, an identical late mode RAT value will be propagated backwards from the downstream common output Z to both inputs A and B. However, since the first late falling transition (320) controls the output rising transition (380), the non-controlling last late falling transition (340) computes a worse slack value [RAT(A) = RAT(B) and late AT(B) (non-controlling) > late AT(A) (controlling), therefore SLACK(B) < SLACK(A)]. Such a situation can produce misleading guidance to a designer or optimization program that relies on slack values, since the more critical slack on the non-controlling late input B can provide an incentive to speed up the non-controlling signal, which in turn may end up having no effect on the output timing.
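The dilemma can be made concrete with a small numeric sketch. The function name `conventional_late_slacks` and all values are hypothetical; the sketch simply applies equation (2) to both falling inputs with a shared late mode RAT and zero gate delay.

```python
def conventional_late_slacks(late_fall_ats, rat_late):
    """Apply equation (2) to every input using the same back-propagated RAT."""
    return {name: rat_late - at for name, at in late_fall_ats.items()}


# Hypothetical late falling arrival times: A falls first, B falls last.
late_fall_ats = {"A": 5, "B": 6}
slack = conventional_late_slacks(late_fall_ats, rat_late=7)

# In a reverse merge of falling AND-function inputs, the first to fall controls.
controlling = min(late_fall_ats, key=late_fall_ats.get)
# A slack-driven tool would target the input with the smallest slack.
most_critical = min(slack, key=slack.get)

print(slack)          # {'A': 2, 'B': 1}
print(controlling)    # A
print(most_critical)  # B
```

The input that appears most critical (B) is not the input that determines the output timing (A), which is the misleading guidance described above.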
To avoid this erroneous situation, prior art methods do not propagate any RAT to the non-controlling input edge of a reverse merge, and consequently no slack can be calculated in such a case. This may introduce at least two undesirable side effects. Firstly, in the absence of any slack, an optimization tool or process cannot attempt to improve a non-controlling path, which is particularly detrimental if the non-controlling path can easily be improved (FIG. 4). Failing to consider the non-controlling path during optimization thus results in a lost opportunity. Secondly, in cases where arrival times shift against each other in such a way that a new controlling arrival time is selected, a sudden slack discontinuity may occur. Referring back to the example in FIG. 3, let it be assumed that the late falling input signal transitions are involved in a reverse merge. Input A presently propagates a slightly smaller late mode arrival time, and thus represents the controlling arrival time for the output node's rising edge. Now, if the arrival time of either A or B changes slightly, such that signal B arrives earlier than A, then B represents the new controlling arrival time in the reverse merge situation. Consequently, input A will change from having a valid slack to having an invalid slack value, and input B will change from having an invalid slack to suddenly propagating a valid slack. Such discontinuities (i.e., large changes in measured slack value for a small change in arrival time) can make it very difficult for automated optimization algorithms to efficiently close timing on designs containing such reverse merge situations. For the same reason, it is difficult for designers analyzing timing reports to completely understand the relationship between these input signals.
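The slack discontinuity can be illustrated with a sketch of the prior art behavior, in which only the controlling input receives a RAT. The function name `prior_art_late_slacks`, the use of `None` to represent a missing slack, and the numeric values are assumptions for the example.

```python
def prior_art_late_slacks(late_fall_ats, rat_late):
    """Late slack for the controlling (first falling) input only.

    Non-controlling inputs receive no RAT, so their slack is None.
    """
    controlling = min(late_fall_ats, key=late_fall_ats.get)
    return {
        name: (rat_late - at if name == controlling else None)
        for name, at in late_fall_ats.items()
    }


# Hypothetical arrival times: A initially falls slightly before B.
before = prior_art_late_slacks({"A": 50, "B": 51}, rat_late=70)
# B's arrival time shifts by two units, so B now falls slightly before A.
after = prior_art_late_slacks({"A": 50, "B": 49}, rat_late=70)

print(before)  # {'A': 20, 'B': None}
print(after)   # {'A': None, 'B': 21}
```

A two-unit shift in one arrival time swaps which input has a defined slack, so a slack-driven optimizer sees one input's slack vanish and the other's appear between consecutive evaluations.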
In view of the foregoing, there is a need for a system and method for determining static timing analysis margin on non-controlling inputs of clock shaping and other digital circuits using reverse merge timing.