This invention relates to analysis of timing variation for integrated circuit design.
Static Timing Analysis (STA) is a method for determining timing performance of a digital circuit without requiring the use of simulation patterns (circuit simulation with patterns of input signals). Based on generally the same principles as PERT or critical path analysis, a typical application of STA considers all possible paths through a digital circuit, and predicts whether or not the overall circuit can perform at a given clock speed relative to the underlying manufacturing processes which will be used to produce the chip.
STA is performed with respect to a timing path. A timing path can be considered to have three elements: a data path, a launch clock and a capture clock. The data path is a specific path through combinational elements (also referred to below as logic elements or logic cells) from the output of one register (e.g., the q or not q output) to the data input of another register. The path through each combinational element is considered with respect to an input signal (i.e., rising, falling), and a specific timing arc through that element. For instance, in a representative logic cell with two inputs, A and B, and one output, X, there is a timing arc from input A to output X and another arc from input B to output X. Furthermore, separate arcs are specified for when A or B is rising or falling, and there may be conditional arcs depending on the state of another input. For example, a unique arc from A to X might be specified as “from A to X when A is falling and B is low.” When timing data are generated for such a cell, each arc can be characterized uniquely using a deterministic numerical simulation, for instance, using SPICE (the industry standard circuit simulator, in which transistor models tell the simulator how each transistor behaves), for a particular choice of parameters (e.g., transistor model parameters) characterizing the cell.
The launch clock is the path that feeds the clock pin of the first register in the data path, and is measured with respect to the trigger edge or level of the clock. This effectively “launches” the data from the first register down the data path.
The capture clock feeds the clock pin of the second register in the path, and is measured with respect to the capture cycle of the clock, such that data arrives at the register and is properly read in.
The launch and capture clocks are typically derived from a common clock signal, for example, from a common point in a branching clock distribution tree.
A timing path (i.e., comprising a data path and launch and capture clocks) is evaluated relative to a timing constraint. A timing constraint is defined, for instance, in terms of an ability of a register (e.g., flop, latch, etc.) to properly capture the data value from the data path. Other examples of timing constraints relate to state transitions such as reset, clear, etc., but for simplification this discussion focuses on data related timing constraints.
A constraint is generally defined in terms of a timing window of the clock to capture data. Data must be stable at the input of the flop before the active edge of the clock (set up) and stay stable until the trailing edge (hold). If the data signal does not arrive in time (i.e., sufficiently prior to the clock edge), then it may not get latched in (setup). If the data signal changes during the active window, then it will be ‘metastable’ and the value cannot be trusted (hold). The amount of positive or negative time relative to the constraint is called timing slack. Positive slack is good. Negative slack, which reflects inadequate setup or inadequate hold, is bad, and is generally considered to be a design violation which should be avoided to produce high yields of functioning fabricated circuits.
When STA evaluates a timing path, it tries to find the worst case conditions in order to expose a timing violation. When STA is used to try to find a hold violation (i.e., data does not stay stable during the clock period), it tries to find the earliest arrival of the data signal at the capturing register and latest arrival of the clock edge at that register to capture the data. Conversely, when STA is used to try to find a set-up violation (i.e., data does not arrive in time before the capture clock), then it tries to find the latest the data arrives and the earliest the capture clock triggers the capture of the data.
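The setup and hold checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular STA tool: the function names, the assumption that the capture edge falls one clock period after the launch edge for setup, and all numeric values are hypothetical.

```python
def setup_slack(launch_clk_late, data_late, capture_clk_early, clock_period, t_setup):
    """Setup check: latest data arrival against the earliest capture edge.

    Assumes the capture edge is one clock period after the launch edge.
    """
    data_arrival = launch_clk_late + data_late
    required_time = clock_period + capture_clk_early - t_setup
    return required_time - data_arrival


def hold_slack(launch_clk_early, data_early, capture_clk_late, t_hold):
    """Hold check: earliest data arrival against the latest capture edge.

    Assumes launch and capture are measured from the same clock edge.
    """
    data_arrival = launch_clk_early + data_early
    required_time = capture_clk_late + t_hold
    return data_arrival - required_time


# Hypothetical delays in picoseconds.
setup = setup_slack(launch_clk_late=120, data_late=700, capture_clk_early=180,
                    clock_period=1000, t_setup=50)
hold = hold_slack(launch_clk_early=100, data_early=60, capture_clk_late=220,
                  t_hold=30)
print(setup, hold)  # 310 -90
```

With these illustrative numbers the setup slack is positive, while the negative hold slack would be reported as a hold violation.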
As introduced above, a model of a cell has a set of parameters, which determine the timing of the cell with respect to each potential arc through that cell, and each parameter has a range that is expected for that parameter based on fabrication and/or environmental (e.g., temperature) variation. Timing of a chip is generally evaluated relative to manufacturing “corners,” which are combinations of parameter values with each value at either a maximum or minimum of the expected range for that parameter. As is discussed further below, some parameters are “global” and assumed to take on the same value for all cells in a circuit, and some parameters are “local” and assumed to represent local variation that is independent from transistor to transistor in the circuit, generally modeled as zero-mean random variables. In the discussion below, a “corner” may be a “total corner”, which reflects an extreme value for all parameters (local and global), or may be a “global corner”, which reflects an extreme value for all global parameters and typical (i.e., zero) values for all the local parameters. Without qualification, in the discussion below, a “corner” generally refers to “total corner.”
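The distinction between total corners and global corners can be illustrated with a short sketch. The parameter names and ranges below are hypothetical, and each range is given as a (minimum, maximum) deviation from the typical value, so that zero represents the typical value of a local parameter.

```python
from itertools import product


def total_corners(param_ranges):
    """Return every combination with each parameter at its minimum or maximum."""
    names = list(param_ranges)
    return [dict(zip(names, combo))
            for combo in product(*(param_ranges[name] for name in names))]


def global_corners(global_ranges, local_names):
    """Extreme values for global parameters, typical (zero) for local ones."""
    corners = total_corners(global_ranges)
    for corner in corners:
        corner.update({name: 0.0 for name in local_names})
    return corners


# Hypothetical parameters, given as (min, max) deviations from typical.
global_ranges = {"g_vth_n": (-0.03, 0.03), "g_vth_p": (-0.03, 0.03)}
local_ranges = {"l_vth_n": (-0.01, 0.01)}

all_total = total_corners({**global_ranges, **local_ranges})
all_global = global_corners(global_ranges, list(local_ranges))
print(len(all_total), len(all_global))  # 8 4
```

The number of corners doubles with each added parameter, which is one reason why in practice only a few representative corners (such as a fast and a slow corner) are analyzed.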
Generally, one corner is determined (e.g., via deterministic numerical simulation) to be the “fast” corner in that delays through logic cells are expected to be lowest with the corresponding parameter values, and one corner is determined to be the “slow” corner in which delays are expected to be the greatest. When timing of a circuit is analyzed at different corners, different problems will generally be exposed. For example, hold violations typically occur at the fast corner, while set-up violations typically happen at the slow corner.
Process models are created by measuring semiconductor foundry manufacturing statistics at a given process node (e.g., 28 nanometer). These manufacturing data are in turn fitted to a SPICE transistor model such as BSIM 4. The transistor data are usually fit to corners as well as to statistical models, as discussed further below. These corners represent the fast and slow extremes of the manufacturing process. The naming convention for the corners is SS, FF and TT, where the two letters refer to the N transistor and P transistor respectively. SS means slow N and slow P, FF means fast N and fast P, and TT means typical N and typical P.
In order to enable timing analysis of a design in each of the process corners, timing data is determined for each cell in a library of cells (e.g., NAND gate, NOR gate, FLIP-FLOP) at each process corner. This procedure is known as library characterization. The libraries are characterized at each corner (usually SS, TT and FF), as well as with respect to the operating voltage and temperature ranges for the circuit. One approach to this characterization is by evaluating every timing arc in the cell using a deterministic numerical simulation (e.g., using SPICE). The results from this analysis are stored as a library file, usually in the Liberty format. Each arc in each cell is evaluated in SPICE under different load and slew conditions.
The default Liberty format represents a discrete set of load and slew conditions, typically arranged in a 7×7 table of different load×slew conditions. The side inputs (the other pins of the cell) are usually set to the worst case condition, for example, determined by considering all possible side input values from which the worst case is selected. Slew describes the shape of the input ramp of the driving signal. For instance, a 500 picosecond slew describes the transition period of the input signal from low to high or from high to low. This usually means from threshold to threshold, but it can also reference from 0 volts to saturation. The load refers, for example, to the capacitive load on the output of the cell. This will range from a very light or near 0 load to a high load reflecting multiple receivers. So the values in the library can be considered to be a mapping:
(corner, cell, arc, [side inputs,] slew, load, rising/falling)→delay.
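The mapping above can be sketched as a table lookup. Because the library stores delays only at discrete load and slew points, a delay at an intermediate operating point is commonly obtained by interpolating between the neighboring table entries. The sketch below assumes bilinear interpolation with clamping at the table edges; the table size, the key layout, the cell and arc names, and all values are hypothetical and are not taken from any actual library.

```python
from bisect import bisect_right


def interpolate_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a slew x load delay table, clamped at the edges."""

    def bracket(axis, x):
        # Lower index of the grid interval containing x, and the fraction within it.
        i = max(0, min(bisect_right(axis, x) - 1, len(axis) - 2))
        frac = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, max(0.0, min(frac, 1.0))

    i, fs = bracket(slews, slew)
    j, fl = bracket(loads, load)
    low = table[i][j] * (1 - fl) + table[i][j + 1] * fl
    high = table[i + 1][j] * (1 - fl) + table[i + 1][j + 1] * fl
    return low * (1 - fs) + high * fs


# Hypothetical 3x3 table (a real library table is typically larger, e.g. 7x7).
SLEWS_PS = [10.0, 50.0, 100.0]
LOADS_FF = [1.0, 5.0, 10.0]
LIBRARY = {
    # (corner, cell, arc, input edge) -> delay table in ps, rows = slew, cols = load
    ("SS", "NAND2", "A->X", "falling"): [
        [20.0, 40.0, 60.0],
        [30.0, 50.0, 70.0],
        [45.0, 65.0, 85.0],
    ],
}

table = LIBRARY[("SS", "NAND2", "A->X", "falling")]
delay = interpolate_delay(SLEWS_PS, LOADS_FF, table, slew=30.0, load=3.0)
print(delay)  # 35.0
```

Side inputs are omitted from the key here on the assumption that the stored table already reflects the worst case side input condition.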
In one approach to STA, the library for one corner of timing data is referenced at a time. That is, timing data for different corners are not used together in one static timing analysis of the circuit. As noted before, the SS corner tends to expose set-up violations (slow data), whereas the TT and FF tend to expose hold violations (fast data).
Referring to FIG. 1, in a circuit example, a data path from the Q output of a first register 114 to the D input of a second register 116 is evaluated relative to the launch clock path from a common clock signal point 132 to the clock input of the first register 114 and the capture clock path from the common signal point 132 to the clock input of the second register 116. Checking for a setup or hold violation considers the delays along the illustrated paths 151, 152 as compared to the delay along the clock path 153 at each of the representative total corners (e.g., SS, FF, and TT). Specifically, the delay of the clock edge through buffer 144, and the delay through flip flop 114 (from clock input to Q output), through NAND gate 124, and through XNOR gate 126 is compared to the delay of the clock edge through buffers 146 and 147. The load on each output of a cell is known based on the interconnection of the cells. For a particular timing path and directions of the transitions (rising vs. falling), the slew rates are determined based on known cell characteristics. Then, for those load and slew rates, the particular delays in the tables for each arc in the paths are added to determine the overall delay through the path. The capturing register is specified according to its required setup and hold times, and these specified times are compared to the calculated delays to see if a hold or setup violation is present on that path.
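The additive accumulation of delays for the FIG. 1 example can be sketched as follows. All delay values, the clock period, and the setup and hold times are hypothetical; the dictionary keys simply follow the reference numerals of FIG. 1, and the delays are assumed to have already been looked up for the known load and slew of each cell at a single corner.

```python
# Hypothetical arc delays (ps) at a single corner, already looked up for the
# known load and slew of each cell. Keys follow the reference numerals of FIG. 1.
launch_clock = {"buffer_144": 40}
data_path = {"flop_114_clk_to_q": 90, "nand_124": 35, "xnor_126": 55}
capture_clock = {"buffer_146": 40, "buffer_147": 45}

# Hypothetical requirements of capturing register 116 and clock period (ps).
T_SETUP, T_HOLD, CLOCK_PERIOD = 30, 20, 500


def path_delay(arcs):
    """Additively accumulate the arc delays along a path."""
    return sum(arcs.values())


data_arrival = path_delay(launch_clock) + path_delay(data_path)
capture_arrival = path_delay(capture_clock)

setup_slack = (CLOCK_PERIOD + capture_arrival - T_SETUP) - data_arrival
hold_slack = data_arrival - (capture_arrival + T_HOLD)
print(data_arrival, capture_arrival, setup_slack, hold_slack)  # 220 85 335 115
```

In a corners-based flow this calculation would be repeated with the delay values of each corner library in turn.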
One limitation of traditional corners-based analysis is an assumption that all gates of a circuit are at the SS or FF corner on any given chip. For a given parameter corner, each gate in a path is assumed to be at that corner value, and the delays are additively accumulated (i.e., added) along the path. In practice, each gate is not at exactly the same parameter corner, for example, due to the local parameters (on-die process variation within the circuit), and the additive accumulation of the delays is not accurate. Although part of the timing variation is related to chip-to-chip (or die-to-die) variation, part of the variation is local to one chip (on-die variation). Stated differently, the total variance is a combination of die-to-die variance (global) and on-die variance (local).
Using traditional corners and accumulating the delays at that corner can both overstate and understate relative speed of the launch clock, capture clock, and data path relative to each other. It can exaggerate set-up violations, which causes the designer to have to speed up a path, and in turn consumes too much power (the pessimistic case). It can also understate the speed of the data path or spread between launch and capture clock and miss hold violations (the optimistic case). Note that both of these cases have the undesired effect of reducing production yield—both excessive power and hold violations can cause a chip to fail with respect to its manufacturing specifications.
Another technique, referred to as Advanced on Chip Variation (AOCV), makes use of a table of adjustments to the characterized models, which correct for both the pessimistic and optimistic case. These adjustments are referred to as “derates” and can be expressed as multiplicative “corrections” to the timing value that is used. The idea behind AOCV is that as depth increases (number of gates or transistors in a row), local variation will “cancel out,” so that the overall delay approaches a value predicted by the die-to-die variance of the individual cells and the local variance has relatively less influence on the variance of the overall delay.
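The “cancel out” effect can be illustrated numerically under a simplifying assumption that is not stated in the discussion above: that each stage in a path has the same mean delay and an independent local variation with the same standard deviation. Under that assumption the standard deviation of the path delay grows as the square root of the depth while the mean grows linearly, so the relative variation falls as one over the square root of the depth. The function names and values below are hypothetical.

```python
import math


def path_sigma(stage_sigma, depth):
    """Standard deviation of the sum of `depth` independent stage delays."""
    return stage_sigma * math.sqrt(depth)


def relative_variation(stage_mean, stage_sigma, depth):
    """Path sigma divided by path mean; falls as 1/sqrt(depth)."""
    return path_sigma(stage_sigma, depth) / (stage_mean * depth)


# Hypothetical stage: 50 ps mean delay, 5 ps local sigma.
for depth in (1, 4, 16):
    print(depth, round(relative_variation(50.0, 5.0, depth), 4))
# Relative variation is 0.1, 0.05 and 0.025 at depths 1, 4 and 16.
```

This is consistent with a derate that moves toward 1.0 as depth increases, although actual derate tables are produced by characterization rather than by this formula.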
AOCV supports two kinds of indexes: depth and distance. Depth (how many logical cells are in a path) turns out to be a reasonable surrogate for modeling on chip device process variance. Distance on the chip is not generally used, since from a device process variation standpoint it is relatively insignificant. Note that the calculation of depth depends on the type of timing analysis (graph or block vs. path), and the tool (small differences may exist between STA tools). STA, in order to apply the AOCV derate, looks at the logic depth of the overall path, and uses that depth value to find the right AOCV derate in the table for each of the cells in the path. The delay value for that cell is multiplied by the derate.
An industry standard AOCV table provides eight derate values for a cell (not per arc, just per cell). There are four data values (when the cell is in a data path) and four clock values (when the same cell is in a clock path). These four values are (1) early/rise, (2) early/fall, (3) late/rise, (4) late/fall. For example, when determining the earliest a signal can arrive at a capture register, the early/rise or early/fall derate values are used, depending on the transition direction of the input signal to the cell in the path. The derates can be considered to be a mapping:
(cell, data/clock, depth, rise/fall, early/late)→derate.
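The derate mapping can be sketched as a lookup keyed by cell, path type, transition direction, and early/late mode, with one derate per depth. The cell name and derate values are hypothetical, and the choice to reuse the last entry for depths beyond the end of the table is an assumption of this sketch rather than a documented behavior of any tool.

```python
# Hypothetical derate rows indexed by depth 1, 2, 3, ...; values are illustrative.
AOCV_TABLE = {
    # (cell, data/clock, rise/fall, early/late) -> derates by depth
    ("BUF", "clock", "rise", "late"): [1.15, 1.11, 1.09, 1.08],
    ("BUF", "clock", "rise", "early"): [0.87, 0.90, 0.92, 0.93],
}


def lookup_derate(table, cell, path_type, depth, edge, mode):
    """Return the derate for a cell, clamping depth to the table's range."""
    row = table[(cell, path_type, edge, mode)]
    return row[min(max(depth, 1), len(row)) - 1]


def derated_delay(delay, derate):
    """Apply the derate multiplicatively to a characterized delay."""
    return delay * derate


late = lookup_derate(AOCV_TABLE, "BUF", "clock", depth=2, edge="rise", mode="late")
deep = lookup_derate(AOCV_TABLE, "BUF", "clock", depth=10, edge="rise", mode="late")
print(late, deep)  # 1.11 1.08
```

Late derates above 1.0 lengthen a delay and early derates below 1.0 shorten it, and both move toward 1.0 as depth increases.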
Referring again to FIG. 1, and considering the timing path with launch clock path 151, data path 152, and capture clock path 153, AOCV derate values are applied as follows when assessing a possible hold violation: an early clock derate is applied to buffer 144, and late clock derates are applied to each of buffers 146 and 147, while early data derates are applied to gates 124 and 126 and launching flip-flop 114. In some examples, the derates are not necessarily applied to each of the launch clock path, the data path, and the capture clock path. An example of depths applied to the gates is buffer 144 having depth 1, buffers 146 and 147 having depth 2, and gates 124 and 126 having depth 2. Note that different systems may have different definitions of depth without departing from the basic principle described herein. Note also that the depth of a gate depends on the path being considered. For example, gate 124 may have a depth of 2 in the path from register 114 to register 116, but have a depth of 3 on the path from register 112 to register 116.
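The hold check for the FIG. 1 path with derates applied can be sketched as follows. The delays, derate values, and hold time are hypothetical. The sketch also assumes that flip-flop 114 uses the same early data derate as gates 124 and 126, since the example depths above do not give a depth for the flip-flop.

```python
# Hypothetical arc delays (ps); keys follow the reference numerals of FIG. 1.
LAUNCH_CLOCK = {"buffer_144": 40.0}
DATA_PATH = {"flop_114": 90.0, "nand_124": 35.0, "xnor_126": 55.0}
CAPTURE_CLOCK = {"buffer_146": 40.0, "buffer_147": 45.0}

# Hypothetical derates for the example depths.
EARLY_CLOCK_DEPTH_1 = 0.90   # buffer 144
EARLY_DATA_DEPTH_2 = 0.92    # gates 124 and 126; assumed also for flip-flop 114
LATE_CLOCK_DEPTH_2 = 1.10    # buffers 146 and 147
T_HOLD = 20.0                # hypothetical hold time of register 116


def derated_sum(arcs, derate):
    """Sum the arc delays after multiplying each by the given derate."""
    return sum(delay * derate for delay in arcs.values())


earliest_data = (derated_sum(LAUNCH_CLOCK, EARLY_CLOCK_DEPTH_1)
                 + derated_sum(DATA_PATH, EARLY_DATA_DEPTH_2))
latest_capture = derated_sum(CAPTURE_CLOCK, LATE_CLOCK_DEPTH_2)
hold_slack = earliest_data - (latest_capture + T_HOLD)
print(round(hold_slack, 1))  # 88.1
```

Without derates the same numbers give a hold slack of 115 ps, so the derates make the hold check more conservative in this example.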
Although use of standard AOCV tables provides improved accuracy over STA (or use of timing and optimization tools) performed without such derate values, the computation of derate values for cells using conventional approaches yields pessimistic worst case characteristics for the paths in an integrated circuit. There is therefore a need for more accurate prediction of path characteristics based on characteristics of the individual cells. Furthermore, even with improved techniques that reduce the computation required for determining the derate factors, there is a further need to efficiently form data that characterizes cells in a way that can be used to predict path characteristics with lower pessimism than that used today.