The goal of static timing analysis (STA) is to determine the latest and earliest possible switching times of various signals within a digital circuit. STA can generally be performed at the transistor level or at the gate level, using pre-characterized library elements, or at higher levels of abstraction, for complex hierarchical chips.
STA algorithms operate by first levelizing the logic structure and breaking any loops in order to create a directed acyclic graph (timing graph). Nodes of the timing graph are referred to as timing points, and correspond to electrical nodes in the digital circuit (typically “pins” or ports of the circuit elements) at which signal transitions (e.g., from low to high or from high to low) can occur. Edges of the timing graph can include directed propagate segments, which correspond to paths through circuit elements along which signal transitions can propagate; these typically go from input to output of the elements or gates of the digital circuit, and from source to sink of the nets of the digital circuit. A timing graph can also contain edges known as test segments, which represent constraints on the relationship between timing values of the nodes at the ends of the test segment.
Modern designs often contain millions of placeable objects, with corresponding timing graphs having millions, if not tens of millions, of nodes. For each node, a corresponding arrival time, transition rate (slew), and required arrival time are computed for both rising and falling transitions as well as for early and late mode analysis. An arrival time (AT) represents a bound on the latest or earliest time at which a signal can transition due to the entire upstream fan-in cone. The slew value is the transition rate associated with a corresponding AT, and a required arrival time (RAT) represents a bound on the latest or earliest time at which a signal must transition due to timing constraints in the entire downstream fan-out cone.
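The levelization step described above can be illustrated with a minimal Python sketch. It assumes loops have already been broken and that propagate segments are given as (source, sink) pairs; the function name `levelize` and the pin names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def levelize(edges):
    """Assign a level to every timing point of a loop-free timing graph.

    Timing points without fan-in get level 0; every other point gets
    1 + the largest level found among its fan-in points.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, snk in edges:
        fanout[src].append(snk)
        indegree[snk] += 1
        nodes.update((src, snk))

    level = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(sorted(level))
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for snk in fanout[node]:
            level[snk] = max(level.get(snk, 0), level[node] + 1)
            indegree[snk] -= 1
            if indegree[snk] == 0:  # all fan-in points have been visited
                queue.append(snk)
    if processed != len(nodes):
        raise ValueError("timing graph still contains a loop")
    return level


# Hypothetical two-input gate g1 driving a primary output.
edges = [
    ("in1", "g1/A"), ("in2", "g1/B"),    # nets: source to sink
    ("g1/A", "g1/Z"), ("g1/B", "g1/Z"),  # gate: input to output
    ("g1/Z", "out"),
]
levels = levelize(edges)
```

Timing points that share a level have no dependencies on one another, which is why levelized traversal visits them only after all lower levels are complete.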
ATs are propagated forward in a levelized manner, starting from the chip primary input asserted (i.e., user-specified) arrival times, and ending at either primary output ports or intermediate storage elements. For single fan-in cases, $\mathrm{AT}_{\text{sink node}} = \mathrm{AT}_{\text{source node}} + \text{delay}_{\text{source} \to \text{sink}}$.
Whenever multiple signals merge, each fan-in contributes a potential arrival time computed as $\mathrm{AT}_{\text{sink}}(\text{potential}) = \mathrm{AT}_{\text{source}} + \text{delay}$, making it possible for the maximum (late mode) or minimum (early mode) of all potential arrival times to be retained at the sink node. Typically, an exact delay value for an edge in a timing graph is not known, but instead only a range of possible delay values can be determined between some minimum delay and maximum delay. In this case, maximum delays are used to compute late mode arrival times and minimum delays are used to compute early mode arrival times.
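The forward propagation rule can be sketched in Python as follows. This is an illustrative sketch only: the edge tuple layout `(source, sink, min_delay, max_delay)`, the function name, and the numeric values are assumptions, and slew, rise/fall distinctions, and tags are ignored. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later) to visit fan-in points first.

```python
from graphlib import TopologicalSorter


def propagate_arrival_times(edges, asserted_at):
    """Return {node: (early_AT, late_AT)} for a loop-free timing graph.

    edges:       iterable of (source, sink, min_delay, max_delay)
    asserted_at: {primary_input: (early_AT, late_AT)}
    """
    fanin = {}
    for src, snk, d_min, d_max in edges:
        fanin.setdefault(snk, []).append((src, d_min, d_max))
        fanin.setdefault(src, [])

    # TopologicalSorter expects {node: predecessors}; fan-in points come first.
    order = TopologicalSorter(
        {node: [src for src, _, _ in preds] for node, preds in fanin.items()}
    ).static_order()

    at = {}
    for node in order:
        preds = fanin[node]
        if not preds:  # primary input: use the asserted value
            at[node] = asserted_at[node]
            continue
        early = min(at[src][0] + d_min for src, d_min, _ in preds)  # early mode
        late = max(at[src][1] + d_max for src, _, d_max in preds)   # late mode
        at[node] = (early, late)
    return at


edges = [("a", "z", 1.0, 2.0), ("b", "z", 1.5, 3.0), ("z", "out", 0.5, 0.5)]
at = propagate_arrival_times(edges, {"a": (0.0, 0.0), "b": (0.0, 1.0)})
print(at["z"])    # (1.0, 4.0)
print(at["out"])  # (1.5, 4.5)
```

At the merge point `z`, the early value comes from the path through `a` with its minimum delay, while the late value comes from the path through `b` with its maximum delay.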
RATs are computed in a backward levelized manner starting from either asserted required arrival times at the chip primary output pins, or from tests (e.g., setup or hold constraints) at internal storage devices. A test segment imposes a constraint between the ATs at its endpoints. For example, a setup test segment typically imposes a requirement that the AT of a data signal at one end precede the AT of a clock signal at the other end by some interval known as the setup or guard time, and a hold test imposes a similar requirement that clock precede data. For single fan-out cases, $\mathrm{RAT}_{\text{source node}} = \mathrm{RAT}_{\text{sink node}} - \text{delay}$.
When multiple fan-outs merge (or when a test is present), each fan-out (or test) contributes a prospective RAT, enabling the minimum (late mode) or maximum (early mode) required arrival time to be retained at the source node. When only a range of possible delay values can be determined, maximum delays are used to compute late mode required arrival times and minimum delays are used to compute early mode required arrival times.
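The backward propagation rule can be sketched in the same style. This is a simplified illustration: the edge tuple layout, the function name, and the numbers are assumptions, and only asserted RATs at endpoints are handled (prospective RATs contributed by test segments are omitted for brevity).

```python
from graphlib import TopologicalSorter


def propagate_required_times(edges, asserted_rat):
    """Return {node: (early_RAT, late_RAT)} for a loop-free timing graph.

    edges:        iterable of (source, sink, min_delay, max_delay)
    asserted_rat: {endpoint: (early_RAT, late_RAT)}
    """
    fanout = {}
    for src, snk, d_min, d_max in edges:
        fanout.setdefault(src, []).append((snk, d_min, d_max))
        fanout.setdefault(snk, [])

    # Passing fan-out points as "predecessors" makes static_order() visit
    # sinks before sources, i.e. a backward levelized traversal.
    order = TopologicalSorter(
        {node: [snk for snk, _, _ in succs] for node, succs in fanout.items()}
    ).static_order()

    rat = {}
    for node in order:
        succs = fanout[node]
        if not succs:  # endpoint: use the asserted value
            rat[node] = asserted_rat[node]
            continue
        early = max(rat[snk][0] - d_min for snk, d_min, _ in succs)  # early mode
        late = min(rat[snk][1] - d_max for snk, _, d_max in succs)   # late mode
        rat[node] = (early, late)
    return rat


edges = [("a", "out1", 1.0, 2.0), ("a", "out2", 0.5, 4.0)]
rat = propagate_required_times(
    edges, {"out1": (2.0, 10.0), "out2": (1.0, 9.0)}
)
print(rat["a"])  # (1.0, 5.0)
```

Here the late mode RAT at `a` is set by the tighter branch through `out2` (9.0 − 4.0), while the early mode RAT is set by the branch through `out1` (2.0 − 1.0).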
The difference between the arrival time and required arrival time at a node (i.e., RAT−AT in late mode, and AT−RAT in early mode) is referred to as slack. A positive slack implies that the current arrival time at a given node meets all downstream timing constraints, and a negative slack implies that the arrival time fails at least one such downstream timing constraint. A timing point can include multiple parameters such as AT, RAT, and slew values, each denoted with a separate tag, in order to represent data associated with different clock domains (i.e., launched by different clock signals), to distinguish values for rising and falling transitions, or for the purpose of distinguishing information for a specific subset of an entire fan-in cone or fan-out cone.
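As a small illustration of the slack definition, the following Python sketch applies the two formulas directly. The function name `slack` and the sample values are hypothetical.

```python
def slack(at, rat, mode):
    """Slack at a timing point: RAT - AT in late mode, AT - RAT in early mode."""
    if mode == "late":
        return rat - at
    if mode == "early":
        return at - rat
    raise ValueError(f"unknown mode: {mode!r}")


# Late mode: the signal arrives at 4.5 and must arrive no later than 5.0.
late_slack = slack(at=4.5, rat=5.0, mode="late")     # 0.5, constraint met
# Early mode: the signal arrives at 1.5 but must not arrive before 2.0.
early_slack = slack(at=1.5, rat=2.0, mode="early")   # -0.5, constraint violated
```

With both definitions, a negative result indicates a violation, so the same sign convention can be used when reporting early and late mode results.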
Under certain circumstances, the approach described above for computing slack can result in an overly pessimistic estimate of timing performance. One such overly pessimistic scenario occurs where early and late propagation delays are different (e.g., to account for process variability), and both the early and late mode signals involved in a timing test share a common part of their (typically clock) propagation paths. In such a scenario, while an exact value of propagation delay for the common propagation elements is unknown, it is typically impossible for such common delay elements to be operating at both early and late delay extremes simultaneously, and hence slack computed using extremes of late data and early clock (or vice versa) arrival times at a test results in an overly pessimistic bound on circuit performance. This pessimism can be reduced or even fully removed by the prior art technique of common path pessimism removal (CPPR).
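The source of the pessimism can be made concrete with a small Python sketch. It assumes, for illustration only, that the credit is the sum of (late delay − early delay) over the edges shared by the launch and capture clock paths, and that this credit is added back to the raw slack; the function name, pin names, and delay values are hypothetical and do not describe any particular CPPR implementation.

```python
def common_path_credit(launch_path, capture_path, delays):
    """Sum (late - early) delay over the shared leading edges of two paths.

    launch_path, capture_path: lists of timing points starting at the clock source
    delays: {(source, sink): (early_delay, late_delay)}
    """
    launch_edges = zip(launch_path, launch_path[1:])
    capture_edges = zip(capture_path, capture_path[1:])
    credit = 0.0
    for launch_edge, capture_edge in zip(launch_edges, capture_edges):
        if launch_edge != capture_edge:  # the paths diverge here
            break
        early, late = delays[launch_edge]
        credit += late - early
    return credit


delays = {
    ("clk", "b1"): (1.0, 1.5),
    ("b1", "b2"): (1.0, 1.25),
    ("b2", "ff1/CK"): (0.5, 0.75),
    ("b2", "ff2/CK"): (0.5, 0.75),
}
launch = ["clk", "b1", "b2", "ff1/CK"]
capture = ["clk", "b1", "b2", "ff2/CK"]

credit = common_path_credit(launch, capture, delays)  # 0.5 + 0.25 = 0.75
raw_slack = -0.5                                      # assumed pessimistic slack
adjusted_slack = raw_slack + credit                   # 0.25
```

In this example the two shared clock buffers account for 0.75 time units of artificial early/late spread, enough to turn an apparent violation into a passing test.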
In addition to correlations due to physically common portions of late and early mode paths, there can be correlations due to delay dependencies on common sources of variation (e.g., manufacturing or environmental). Therefore, when early and late paths leading to a given test represent different points in a process distribution (e.g., late mode delays are computed based on slow conditions, and early mode delays are computed assuming a different set of fast process conditions), an undue amount of pessimism is introduced when comparing resulting early and late arrival times at a test. The prior art method provides for additional pessimism relief by accounting for early and late delay dependencies on common sources of variation, analyzing pairs of launch and capture paths.
The aforementioned prior art techniques are further extended to include credit for statistically independent delay contributions along a path (i.e., by computing a statistical root-sum-square [RSS] credit value for random delay along a path and/or by computing additional RSS credit for delay impact due to global sources of variation which are statistically independent of each other).
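The idea behind an RSS credit can be illustrated with a short Python sketch. It assumes, purely for illustration, that each stage on a path contributes an independent random delay variation, that a worst-case analysis adds these contributions linearly, and that the credit is the difference between that linear sum and the root-sum-square; the function name and the values are hypothetical.

```python
import math


def rss_credit(random_variations):
    """Difference between the linear sum and the root-sum-square of
    per-stage random delay variations along a path."""
    linear_sum = sum(random_variations)
    rss = math.sqrt(sum(v * v for v in random_variations))
    return linear_sum - rss


# Two stages with independent random variations of 3.0 and 4.0 time units:
# linear sum = 7.0, RSS = 5.0, so the credit is 2.0.
example_credit = rss_credit([3.0, 4.0])
```

Because independent variations rarely reach their extremes together, the RSS value grows more slowly than the linear sum as the path gets longer, so the credit tends to increase with path depth.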
However, all the aforementioned prior art techniques for performing CPPR analysis require computation and propagation through the timing graph of separate timing values for each point at which common paths diverge, or for each test for which pessimism removal is performed. This can lead to large memory requirements in the computation process. A common way to mitigate this memory requirement is to compute such values for one divergence point or for one test at a time and store them in a common, reused location in the timing graph being analyzed. Because of the reuse of storage locations in the main timing graph, these approaches are not amenable to parallel execution. Even when a common storage location is not used, modifying the main timing graph to insert extra tags or add other attributes that keep track of specific launch and capture path pairs and their associated commonality also creates a point at which parallel threads of computation would typically need to be locked.
In the domain of traditional block-based STA (as opposed to path-based CPPR analysis), several prior art efforts have been applied to speed up the static timing analysis.
One method exploits the fact that static timing analysis is required to be performed for different values of process and environmental parameters. It proposes performing the runs in parallel on different computers or different processors and then merging the results into a joint report. This method has three major drawbacks. First, it can be applied only if it is required to run timing for different sets of process and environmental parameters. This is not always the case, especially considering the emergence of statistical timing techniques capable of predicting circuit timing in the entire space of variational parameters. Second, each timing run consumes a large amount of computer memory. Therefore, the total memory consumption is several times higher than in the conventional sequential timing run. Third, combining several timing reports is a difficult and computationally expensive task.
In another prior art approach, a run is performed using different timing modes, particularly early and late mode timing analysis. However, this approach reduces the run time by at most a factor of two, and requires twice as much memory for constructing two virtual timing graphs for these modes. Another prior art method proposes creating a virtual timing graph for each process corner or analysis mode. Yet another prior art method proposes partitioning the timing graph into several clock domains and processing each domain by a separate computation thread, which likewise has serious drawbacks associated therewith.
In view of the failures encountered in the prior art, there is a need for a system and a method for efficiently performing static timing analysis which is amenable to parallelizing time-consuming steps such as CPPR, traversing paths for identifying a set of critical paths, generating timing reports, and similar procedures that analyze and modify multiple non-disjoint sub-graphs of a timing graph.