For integrated circuits (e.g., VLSI chips) to work properly, the signals traveling along their gates and interconnects must be properly timed, and several factors are known to cause timing variations. As examples, variations in manufacturing process parameters (such as variations in interconnect diameter, gate quality, etc.) can cause timing parameters to deviate from their designed value. In low-power applications, lower supply voltages can cause increased susceptibility to noise and increased timing variations. Densely integrated elements and non-ideal on-chip power dissipation can cause “hot spots” on a chip, which can also cause excessive timing variations.
A classical approach to timing analysis is to analyze each signal path in a circuit and determine the worst case timing. However, this approach produces timing predictions that are often too pessimistic and grossly conservative. As a result, statistical timing analysis (STA)—which characterizes timing delays as statistical random variables—is often used to obtain more realistic timing predictions. By modeling each individual delay as a random variable, the accumulated delays over each path of the circuit will be represented by a statistical distribution. As a result, circuit designers can design and optimize chips in accordance with acceptable likelihoods rather than worst-case scenarios.
In STA, a circuit is modeled by a directed acyclic graph (DAG) known as a timing graph, wherein each delay source (either a logic gate or an interconnect) is represented as a node. Each node connects to other nodes through input and output edges. Nodes and edges are referred to as delay elements. Each node has a node delay, that is, the delay incurred in the corresponding logic gate or interconnect segment. Similarly, each edge has an edge delay: a signal arrival time that represents the cumulative timing delay up to and including the node that feeds into the edge. Each edge delay has a path history: the set of node delays through which a signal travels before arriving at this edge. Each delay element is then modeled as a random variable, which is characterized by its probability density function (pdf) and cumulative distribution function (cdf). The purpose of STA is then to estimate the edge delay distribution at the output(s) of a circuit based on (known or assumed) internal node delay distributions.
The three primary approaches to STA are Monte Carlo simulation, path-based STA, and block-based STA. As its name implies, Monte Carlo simulation mechanically computes the statistical distribution of edge delays by analyzing all (or most) possible scenarios for the internal node delays. While this will generally yield an accurate timing distribution, it is computationally very expensive, and is therefore often impractical to use.
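As a rough illustration of the Monte Carlo approach, the following Python sketch samples the node delays of a tiny circuit many times and estimates the output delay distribution from the samples. The circuit topology (two parallel nodes a and b feeding a node c), the delay parameters, and all function names are assumptions chosen for illustration only.

```python
import random
import statistics


def simulate_output_delay(rng):
    """Draw one sample of the output arrival time of a hypothetical circuit.

    Nodes a and b run in parallel and both feed node c, so the output
    arrival time is max(a, b) + c.
    """
    a = rng.gauss(10.0, 1.0)  # node delay a: mean 10, std dev 1 (assumed)
    b = rng.gauss(9.5, 2.0)   # node delay b: mean 9.5, std dev 2 (assumed)
    c = rng.gauss(5.0, 0.5)   # node delay c: mean 5, std dev 0.5 (assumed)
    return max(a, b) + c


def monte_carlo(num_samples, seed=0):
    """Estimate the mean and standard deviation of the output delay."""
    rng = random.Random(seed)
    samples = [simulate_output_delay(rng) for _ in range(num_samples)]
    return statistics.fmean(samples), statistics.stdev(samples)


mean, std = monte_carlo(100_000)
print(f"estimated output delay: mean={mean:.3f}, std={std:.3f}")
```

Even for this three-node circuit, a large number of samples is needed for a stable estimate, which hints at why the cost becomes prohibitive for full-size timing graphs.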
Path-based STA attempts to identify some subset of paths (i.e., series of nodes and edges) whose time constraints are statistically critical. Unfortunately, path-based STA has a computational complexity that grows exponentially with the circuit size, and thus it too is difficult to practically apply to many modern circuits.
Block-based STA, which has largely been developed owing to the shortcomings of Monte Carlo and path-based STA, uses progressive computation: statistical timing analysis is performed block by block in the forward direction in the circuit timing graph without looking back at the path history, by use of only an ADD operation and a MAX operation:
ADD: When an input edge delay X propagates through a node delay Y, the output edge delay will be Z=X+Y.
MAX: When two edge delays X and Y merge in a node, a new edge delay Z=MAX(X,Y) will be formulated before the node delay is added.
Note that a MIN operation can be expressed in terms of the MAX operation, since MIN(X,Y)=−MAX(−X,−Y). Thus, while a MIN operation can also be relevant in STA, it is often simpler to use only one of the MAX and MIN operators. For the sake of simplicity, the MAX operator is used throughout this document, with the understanding that the same results can be adapted to the MIN operator.
With the two operators ADD and MAX, the computational complexity of block-based STA grows linearly (rather than exponentially) with respect to the circuit size, which generally results in manageable computations. The computations are further accelerated by assuming that all timing variables in a circuit follow the Gaussian (normal) distribution: since a linear combination of normally distributed variables is also normally distributed, the correlation relations between the delays along a circuit path are efficiently preserved.
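The forward, block-by-block traversal can be sketched in Python as follows. This is a minimal sketch, not a full STA engine: the function name `propagate`, the `fanin` dictionary layout, and the example graph are hypothetical, and plain numbers stand in for delay distributions so that only the traversal structure is shown. Each node is visited exactly once in topological order, which is where the linear growth with circuit size comes from.

```python
from graphlib import TopologicalSorter


def propagate(node_delay, fanin, add, max_op, source_arrival=0.0):
    """Forward pass over a timing graph using only ADD and MAX.

    node_delay: dict mapping node -> node delay
    fanin:      dict mapping node -> list of predecessor nodes
    add/max_op: the ADD and MAX operators for the chosen delay type
    """
    arrival = {}  # output edge delay of each node
    for node in TopologicalSorter(fanin).static_order():
        inputs = [arrival[p] for p in fanin.get(node, ())]
        merged = source_arrival
        if inputs:
            merged = inputs[0]
            for other in inputs[1:]:
                merged = max_op(merged, other)  # MAX: merge input edge delays
        arrival[node] = add(merged, node_delay[node])  # ADD: add node delay
    return arrival


# Hypothetical graph: a and b feed c, and c feeds d.
node_delay = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 4.0}
fanin = {"c": ["a", "b"], "d": ["c"]}
arrival = propagate(node_delay, fanin, add=lambda x, y: x + y, max_op=max)
print(arrival["d"])

# MIN can reuse MAX through MIN(X, Y) = -MAX(-X, -Y).
assert min(2.0, 3.0) == -max(-2.0, -3.0)
```

Passing `add` and `max_op` as parameters keeps the traversal independent of how delays are represented, so the same loop could in principle be reused with statistical ADD and MAX operators.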
To illustrate, in the ADD operation ADD(X,Y)=Z, if both input delay elements X and Y are Gaussian random variables, then the delay Z=X+Y will also be a Gaussian random variable whose mean and variance are:
$$\text{Mean:}\quad \mu_Z = \mu_X + \mu_Y \tag{1}$$
$$\text{Variance:}\quad \sigma_Z^2 = \sigma_X^2 + \sigma_Y^2 + 2\,\mathrm{cov}(X, Y) \tag{2}$$
where cov(X,Y)=E{(X−μX)(Y−μY)} is the covariance between X and Y.
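Equations (1) and (2) translate directly into code. The following Python sketch uses a hypothetical helper name, `add_gaussian`, and example values chosen only for illustration.

```python
import math


def add_gaussian(mu_x, var_x, mu_y, var_y, cov_xy=0.0):
    """Mean and variance of Z = X + Y for jointly Gaussian X and Y."""
    mu_z = mu_x + mu_y                     # Equation (1)
    var_z = var_x + var_y + 2.0 * cov_xy   # Equation (2)
    return mu_z, var_z


# Example values (assumed): X ~ N(10, 4), Y ~ N(5, 1), cov(X, Y) = 0.5.
mu_z, var_z = add_gaussian(10.0, 4.0, 5.0, 1.0, cov_xy=0.5)
print(mu_z, var_z, math.sqrt(var_z))
```

Setting `cov_xy` to zero recovers the familiar rule that the variances of independent delays simply add.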
In contrast, in the MAX operation Z=MAX(X,Y), MAX is a nonlinear operator: even if the input delays X and Y are Gaussian random variables, Z will not (usually) have a Gaussian distribution. However, as shown in C. Clark, “The greatest of a finite set of random variables,” Operations Research, pp. 145-162, March 1961, if X and Y are Gaussian and statistically independent, the first and second moments of the distribution of MAX(X,Y) are defined by:
$$\text{Mean:}\quad \mu_Z = \mu_X Q + \mu_Y (1 - Q) + \theta P \tag{3}$$
$$\text{Variance:}\quad \sigma_Z^2 = (\mu_X^2 + \sigma_X^2)\,Q + (\mu_Y^2 + \sigma_Y^2)(1 - Q) + (\mu_X + \mu_Y)\,\theta P - \mu_Z^2 \tag{4}$$
where θ=σ(X−Y), the standard deviation of X−Y. P and Q are the pdf and cdf of the standard Gaussian distribution evaluated at λ=μ(X−Y)/σ(X−Y), where μ(X−Y) is the mean of X−Y:
$$P(\lambda) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\lambda^2}{2}\right), \qquad Q(\lambda) = \int_{-\infty}^{\lambda} P(x)\,dx \tag{5}$$
It is then possible to define a Gaussian approximation for the non-Gaussian Z=MAX(X,Y). In C. Visweswariah, K. Ravindran, and K. Kalafala, “First-order parameterized block-based statistical timing analysis,” TAU'04, February 2004, Z=MAX(X,Y) is approximated by a Gaussian random variable Ẑ which is a linear combination of X, Y, and an additional independent Gaussian random variable Δ:
$$Z = \mathrm{MAX}(X, Y) \approx QX + (1 - Q)Y + \Delta = \hat{Z} \tag{6}$$
where Q is defined in the foregoing Equation (5), and is referred to as “tightness.” The purpose of the additional random variable Δ is to ensure that the first and second moments (the mean and the variance) of Ẑ match those of Z as specified in the foregoing Equations (3) and (4).
In the foregoing Clark reference, it was shown that if W is a Gaussian random variable, then the cross-covariance between W and Z=MAX(X,Y) can be found analytically as:
$$\mathrm{cov}(W, Z) = Q\,\mathrm{cov}(W, X) + (1 - Q)\,\mathrm{cov}(W, Y) \tag{7}$$
Substituting Equation (6):
$$\mathrm{cov}(W, \hat{Z}) = Q\,\mathrm{cov}(W, X) + (1 - Q)\,\mathrm{cov}(W, Y) = \mathrm{cov}(W, Z)$$
Hence, a convenient property of the approximator Ẑ is that the cross-covariance between Z and another timing variable W is preserved when the non-Gaussian Z=MAX(X,Y) is replaced by the Gaussian random variable Ẑ. Thus, the use of the Gaussian random variable Ẑ as an approximation to the non-Gaussian Z=MAX(X,Y) allows preservation of linearity.
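The moment formulas of Equations (3) through (5) and the moment-matching role of Δ in Equation (6) can be sketched in Python as follows. The function names are hypothetical. The sketch assumes X and Y are independent, so that θ is the square root of the sum of their variances, and the mean and variance assigned to Δ are one reading of the moment-matching requirement (Δ independent of X and Y), not a formula quoted from the cited references.

```python
import math


def std_normal_pdf(x):
    """P in Equation (5): pdf of the standard Gaussian."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Q in Equation (5): cdf of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu_x, var_x, mu_y, var_y):
    """Mean, variance, and tightness of Z = MAX(X, Y), X and Y independent."""
    theta = math.sqrt(var_x + var_y)   # std dev of X - Y under independence
    lam = (mu_x - mu_y) / theta
    p = std_normal_pdf(lam)
    q = std_normal_cdf(lam)            # the "tightness"
    mu_z = mu_x * q + mu_y * (1.0 - q) + theta * p              # Equation (3)
    second_moment = ((mu_x**2 + var_x) * q
                     + (mu_y**2 + var_y) * (1.0 - q)
                     + (mu_x + mu_y) * theta * p)
    var_z = second_moment - mu_z**2                             # Equation (4)
    return mu_z, var_z, q


def delta_moments(mu_x, var_x, mu_y, var_y):
    """Mean and variance of an independent Δ that matches Equations (3), (4)."""
    mu_z, var_z, q = clark_max(mu_x, var_x, mu_y, var_y)
    mu_delta = mu_z - (q * mu_x + (1.0 - q) * mu_y)
    var_delta = var_z - (q**2 * var_x + (1.0 - q)**2 * var_y)
    return mu_delta, var_delta


# Two independent standard-normal delays (an assumed example).
mu_z, var_z, q = clark_max(0.0, 1.0, 0.0, 1.0)
mu_delta, var_delta = delta_moments(0.0, 1.0, 0.0, 1.0)
print(mu_z, var_z, q, mu_delta, var_delta)
```

Because Δ is independent of every other timing variable W, it contributes nothing to cov(W, Ẑ), which is why the covariance of Equation (7) carries over unchanged.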
Unfortunately, one flaw of block-based STA is that its underlying assumption of a simple linear (additive) combination of sequential path delays is often incorrect. The delays of elements in a circuit can be correlated due to various phenomena, two common ones being known as global variations and path reconvergence. Global variations are effects that impact a number of elements simultaneously, such as inter- or intra-die spatial correlations, temperature or supply voltage fluctuations, etc. These generate global correlation between delay elements, wherein all globally correlated elements are simultaneously affected. An example of the effect of global variations is schematically depicted in FIG. 1(a), wherein node delays X, Y, and Z all depend on some influence g.
Path reconvergence occurs where elements share a common element or path along their past path histories owing to path intersections, and this leads to path correlation (local correlation of elements along some section of a path). An example of the effect of path correlation is schematically depicted in FIG. 1(b), wherein edge delays X and Y both depend on node delay p.
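A small numerical experiment shows how a shared node produces path correlation. The sketch below assumes a simple additive model in which X = p + a and Y = p + b, with p, a, and b independent Gaussian node delays; this model and all parameter values are illustrative assumptions rather than details taken from FIG. 1(b). Under that model the covariance of X and Y equals the variance of the shared delay p.

```python
import random


def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


rng = random.Random(1)
xs, ys = [], []
for _ in range(50_000):
    p = rng.gauss(5.0, 2.0)             # shared node delay p, variance 4 (assumed)
    xs.append(p + rng.gauss(3.0, 1.0))  # X = p + a
    ys.append(p + rng.gauss(4.0, 1.0))  # Y = p + b

cov_xy = sample_covariance(xs, ys)
print(f"sample cov(X, Y) = {cov_xy:.3f} (variance of p is 4.0)")
```

Treating X and Y as independent would set this covariance to zero and discard the contribution of p.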
The underlying problem of global and path correlation is that while the output of the MAX operator can be directly approximated by a Gaussian distribution having its first two moments matching those of Equations (3) and (4), this approach fails to retain any correlation information after the MAX operation is performed. In short, the MAX operator destroys correlation information which may be critical to accurate timing prediction. Several approaches have been proposed for dealing with global and path correlation, but the field of timing analysis is lacking in methods for accounting for both of these correlations in an accurate and computationally efficient manner.
One approach to compensating for global variations is to use a canonical timing model (C. Visweswariah, K. Ravindran, and K. Kalafala, “First-order parameterized block-based statistical timing analysis,” TAU'04, February 2004; A. Agarwal, D. Blaauw, and V. Zolotov, “Statistical timing analysis for intra-die process variations with spatial correlations,” Computer Aided Design, 2003 International Conference on. ICCAD-2003, pp. 900-907, November 2003; H. Chang and S. S. Sapatnekar, “Statistical timing analysis considering spatial correlations using a single pert-like traversal,” ICCAD'03, pp. 621-625, November 2003). In the canonical timing model, each of the node delays is represented as a summation of three terms:
$$n_i = \mu_i + \alpha_i R_i + \sum_{j=1} \beta_{i,j}\, G_j \tag{8}$$
where ni (i=1,2, . . . ) is the random variable corresponding to the ith node delay in the timing graph; μi is the expected value of ni; Ri (called the node variation or local variation) is a zero-mean, unit-variance Gaussian random variable representing the localized statistical uncertainties of ni; Gj represents the jth global variation, and is also modeled as a zero-mean, unit-variance Gaussian random variable; {Ri} and {Gj} are additionally assumed to be mutually independent; and the weight parameters αi (named node sensitivity or local sensitivity) and βi,j (named global sensitivity) are deterministic constants, explicitly expressing the amount of dependence of ni on each of the corresponding independent random variables.
With this canonical representation, the variance of a node delay ni and its correlation (covariance) with another node delay nk can be evaluated as:
$$\text{Variance:}\quad \sigma_{n_i}^2 = E\{(n_i - \mu_i)^2\} = \alpha_i^2 + \sum_j \beta_{i,j}^2 \tag{9}$$
$$\text{Covariance:}\quad \mathrm{cov}(n_i, n_k) = E\{(n_i - \mu_i)(n_k - \mu_k)\} = \sum_j \beta_{i,j}\,\beta_{k,j} \tag{10}$$
However, if Equation (8) is also used to represent edge delays, this approach will implicitly assume that edge delays will only experience global variations, and that no path reconvergence occurs in the timing graph. This approach is acceptable where no path reconvergence is present, or where global variation dominates the correlations in the timing graph, but it will have severe problems where path correlation is important, which is unfortunately a common situation. To illustrate, in FIG. 1(b), both edge delays X and Y share a common path history including node p. However, in the canonical representation of edge delays X and Y, the local variation Rp of node p is not present: the path correlation between X and Y due to Rp is (incorrectly) dropped.
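The canonical form of Equation (8) and the statistics of Equations (9) and (10) can be sketched in Python as follows. The class name `CanonicalDelay`, its field names, and the example sensitivities are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CanonicalDelay:
    """Canonical form of Equation (8): n = mu + alpha * R + sum_j beta[j] * G[j]."""
    mu: float                # expected value
    alpha: float             # local (node) sensitivity
    beta: Tuple[float, ...]  # global sensitivities, one per global variation

    def variance(self):
        """Equation (9): alpha^2 plus the sum of squared global sensitivities."""
        return self.alpha**2 + sum(b * b for b in self.beta)

    def covariance(self, other):
        """Equation (10): only the shared global variations contribute."""
        return sum(bi * bk for bi, bk in zip(self.beta, other.beta))


# Two node delays that depend on the same two global variations (assumed values).
n1 = CanonicalDelay(mu=10.0, alpha=0.5, beta=(1.0, 2.0))
n2 = CanonicalDelay(mu=8.0, alpha=1.0, beta=(0.5, -1.0))
var_n1 = n1.variance()
cov_n1_n2 = n1.covariance(n2)
print(var_n1, cov_n1_n2)
```

Note that `covariance` never looks at `alpha`: the local variations of distinct nodes are independent, which is exactly why a local variation shared along a reconvergent path is lost when edge delays are forced into this form.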
In the Visweswariah et al. reference, the aforementioned concept of tightness is used to retain global correlation information through the nonlinear MAX operation. A tightness-based linear combination is proposed to approximate the MAX operator while including an independent random variable Δ for the purpose of matching moments and covariance (Equation (6)). While the purpose of the inclusion of an independent Gaussian random variable Δ is to ensure the matching of the covariance of the approximator Ẑ of Equation (6) to that of the output of the MAX operator Z, this parsimonious random variable may not accurately propagate correlation information, and thus may inadvertently introduce additional modeling error in the output pdf.
In A. Devgan and C. Kashyap, “Block-based static timing analysis with uncertainty,” ICCAD'03, pp. 607-614, November 2003, a common node detection procedure is introduced to deal with path correlation (path reconvergence), but here global correlation is neglected. This method assumes that if two edge delays X and Y ever pass a common node whose output edge delay is W, then X=X′+W and Y=Y′+W. The operation MAX(X,Y) is then computed as W+MAX(X′,Y′). This approximation is imperfect since X and Y usually do not have a very strong dependence on W. A counterexample is illustrated in FIG. 2, where both X and Y are theoretically dependent on W, but practically speaking, X will be independent of W if U>>W, and similarly Y will be independent of W if V>>W.
Given that the trend in circuit fabrication is to ever-increasing speed and ever-decreasing size, there is clearly a pressing need for accurate methods of statistical timing analysis which compensate for both global and path correlation, and which are computationally efficient so that rapid design and testing is feasible.