Static Timing Analysis (STA) is a key step in the design of high speed Very Large Scale Integrated (VLSI) circuits. It is used to verify that a given VLSI circuit design performs correctly at a required frequency before it is released to manufacturing. STA is performed on a timing graph representation of the design; the points in the design where timing information is desired constitute the nodes or timing points of the graph, while electrical or logic connections between nodes are represented by timing arcs of the graph. STA typically consists of certain fundamental steps that include:
i) delay calculation, which involves modeling and calculating delays across the gates and interconnects (represented by timing arcs) included in the timing graph representation of the design;
ii) propagation and calculation of signal arrival times, required arrival times and slews across all timing points; and
iii) slack calculation across all timing points in the design.
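By way of illustration only, steps ii) and iii) above can be sketched on a small timing graph as follows. The function name `run_sta`, the node names and the delay values are hypothetical and are not taken from any figure; arc delays are assumed to be already computed (step i), only late-mode (setup-style) timing is shown, and slew propagation is omitted for brevity.

```python
from collections import defaultdict

def run_sta(arcs, input_arrivals, output_required):
    """Late-mode STA on an acyclic timing graph.

    arcs: list of (src, dst, delay) timing arcs.
    input_arrivals: arrival times asserted at the primary inputs.
    output_required: required arrival times asserted at the primary outputs.
    Returns (arrival, required, slack) dictionaries keyed by timing point.
    """
    succ = defaultdict(list)
    pred_count = defaultdict(int)
    nodes = set()
    for src, dst, delay in arcs:
        succ[src].append((dst, delay))
        pred_count[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm) so each node is visited after its fan-in.
    ready = [n for n in nodes if pred_count[n] == 0]
    remaining = dict(pred_count)
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for dst, _ in succ[n]:
            remaining[dst] -= 1
            if remaining[dst] == 0:
                ready.append(dst)

    # Forward pass: the latest arrival time at a node is the max over its fan-in.
    arrival = {n: float("-inf") for n in nodes}
    arrival.update(input_arrivals)
    for n in order:
        for dst, delay in succ[n]:
            arrival[dst] = max(arrival[dst], arrival[n] + delay)

    # Backward pass: the required time at a node is the min over its fan-out.
    required = {n: float("inf") for n in nodes}
    required.update(output_required)
    for n in reversed(order):
        for dst, delay in succ[n]:
            required[n] = min(required[n], required[dst] - delay)

    # Slack: positive means the timing point meets its requirement.
    slack = {n: required[n] - arrival[n] for n in nodes}
    return arrival, required, slack

# Hypothetical four-node graph with two paths from "in" to "b".
arcs = [("in", "a", 1.0), ("a", "b", 2.0), ("in", "b", 4.0), ("b", "out", 1.5)]
arrival, required, slack = run_sta(arcs, {"in": 0.0}, {"out": 6.0})
```

In this sketch the longer path through the direct arc from "in" to "b" determines the arrival time at "b", and therefore the slack at "in", "b" and "out".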
Referring to FIG. 1, there is shown an illustrative circuit 100 having a primary input port 102 and a primary output port 109. An interconnect 103 (also referred to as a net) connects the primary input to gate 105. The input pin capacitance of the gate 105 is labeled with numeral 104. Gate 105 feeds logic 106 that includes other gates and nets, and finally connects to gate 107. The net 108 connects gate 107 to the primary output that has an external capacitive load CL.
An interconnect network denotes a set of connected interconnect (or wire) segments that is driven by at least one source pin and feeds at least one sink pin. Typically, an interconnect network has only a single source pin, while multiple sink pins are common. A source pin is often connected to the output of a gate or a primary input, while a sink pin is typically connected to the input of a gate or a primary output. The electrical parasitics of this set of interconnect segments, excluding the pin capacitances of the source and sink gates or primary inputs/outputs, constitute the parasitics of the corresponding interconnect network. Referring to FIG. 1, the interconnect network connected to the primary input 102 is composed of the interconnect segment 103.
When performing STA on the illustrative circuit 100, voltage waveforms (which may be represented by delays and slews) must be computed across all gates and nets in the design. Given a voltage waveform 101 at the primary input 102, the voltage waveform at the sink of the interconnect 103 is a function of the voltage waveform 101, the interconnect parasitics 110 and the input pin capacitance 104 of gate 105. The voltage waveform at the sink of 103 is calculated by fitting a ramp or a piece-wise-linear waveform to the voltage waveform 101 at the primary input, and subsequently convolving it with the transfer function of the interconnect load. Various Model Order Reduction (MOR) techniques such as Asymptotic Waveform Evaluation (AWE) and Passive Reduced-order Interconnect Macromodeling Algorithm (PRIMA) have been proposed for accurate interconnect timing analysis. These techniques reduce a large-scale interconnect network to a smaller one while preserving, to the extent possible, its input-output behavior. In other words, the large-scale interconnect network is approximated by a smaller (reduced) network such that, when the same input signal is applied to both the original and the reduced network, their output responses closely match each other.
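As a minimal sketch of the ramp-based calculation described above, the following assumes that the interconnect load has already been reduced to a single-pole transfer function H(s) = 1/(1 + s·tau). That single-pole form is an illustrative assumption; AWE and PRIMA generally produce higher-order models. For such a pole, the response to a saturated ramp has a closed form, from which the sink delay and slew can be measured. The function names, the 10%/50%/90% thresholds and the numeric values are hypothetical.

```python
import math

def ramp_response(t, tau, t_ramp, vdd=1.0):
    """Sink voltage at time t for an input ramp rising from 0 to vdd over t_ramp,
    driving a single-pole network with time constant tau."""
    def unit_ramp_response(x):
        # Response of 1/(1 + s*tau) to a unit-slope ramp that starts at x = 0.
        if x <= 0.0:
            return 0.0
        return x - tau * (1.0 - math.exp(-x / tau))
    # A saturated ramp is a ramp minus the same ramp delayed by t_ramp.
    return (vdd / t_ramp) * (unit_ramp_response(t) - unit_ramp_response(t - t_ramp))

def crossing_time(level, tau, t_ramp, vdd=1.0):
    """Time at which the (monotonically rising) sink waveform crosses `level`."""
    lo, hi = 0.0, t_ramp + 50.0 * tau   # the waveform has settled by `hi`
    for _ in range(200):                 # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, tau, t_ramp, vdd) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sink_delay_and_slew(tau, t_ramp, vdd=1.0):
    """50%-to-50% delay and 10%-90% slew at the sink."""
    t10 = crossing_time(0.1 * vdd, tau, t_ramp, vdd)
    t50 = crossing_time(0.5 * vdd, tau, t_ramp, vdd)
    t90 = crossing_time(0.9 * vdd, tau, t_ramp, vdd)
    delay = t50 - 0.5 * t_ramp           # the input ramp crosses 50% at t_ramp / 2
    slew = t90 - t10
    return delay, slew

# Hypothetical values: a slow input ramp (100 time units) and a fast pole (tau = 1).
delay, slew = sink_delay_and_slew(tau=1.0, t_ramp=100.0)
```

When the input ramp is slow relative to tau, the delay in this model approaches tau; when the input approaches a step, it approaches tau·ln 2.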
Higher order reduced models offer increased accuracy at the cost of increased analysis complexity. By way of example, a first-order reduced model can be analyzed very rapidly, but it may introduce significant errors in the input-output behavior of the system. Consequently, the order of the reduced model is chosen as a trade-off between accuracy and speed. In the simplest case, the interconnect parasitics 110 and the input pin capacitance 104 of gate 105 may be approximated by a single capacitive load or lumped load, which may be obtained by summing the capacitances in 110 and the input pin capacitance 104. However, such a lumped load model can introduce significant errors in the delay and slew calculations while performing a static timing analysis.
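The limitation of the lumped load can be illustrated with a small sketch. It assumes, purely for illustration, that the interconnect parasitics form an RC chain from source to sink, and it uses the Elmore delay (the first moment of the impulse response) as a simple first-order wire delay estimate; neither the chain topology nor the Elmore metric is prescribed by the foregoing description. Function names and values are hypothetical, with resistance in ohms and capacitance in femtofarads so that the products are in femtoseconds.

```python
def lumped_load(segment_caps, pin_cap):
    """Lumped load: sum of all wire capacitances plus the sink pin capacitance."""
    return sum(segment_caps) + pin_cap

def elmore_delay_chain(segments, pin_cap):
    """Elmore delay at the sink of an RC chain.

    segments: list of (resistance, capacitance) pairs ordered from source to
    sink, with each capacitance attached at the far end of its resistance.
    """
    downstream = pin_cap
    delay = 0.0
    # Walk from sink to source, accumulating the capacitance each resistor charges.
    for r, c in reversed(segments):
        downstream += c
        delay += r * downstream
    return delay

# Two hypothetical wires with identical capacitance but different resistance.
resistive_wire = [(100.0, 2.0), (100.0, 2.0)]
low_r_wire = [(10.0, 2.0), (10.0, 2.0)]
pin_cap = 1.0

load_a = lumped_load([c for _, c in resistive_wire], pin_cap)   # 5.0 fF
load_b = lumped_load([c for _, c in low_r_wire], pin_cap)       # 5.0 fF
delay_a = elmore_delay_chain(resistive_wire, pin_cap)           # 800.0 fs
delay_b = elmore_delay_chain(low_r_wire, pin_cap)               # 80.0 fs
```

Both wires present the same lumped load, yet their estimated wire delays differ by a factor of ten, because the lumped model discards the wire resistance.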
With modern chip manufacturing technology scaling below 65 nanometers, VLSI designs are growing in both size and complexity. Performance-centric designs, especially microprocessor designs, include custom-designed circuit components called macros to achieve aggressive frequency targets. STA of these macros utilizes circuit simulators to compute device delays and slews.
It is known in the art that Application Specific Integrated Circuit (ASIC) designs include several million gates, while typical microprocessor designs may include upwards of one billion transistors. Circuit simulation, while highly accurate for transistor level designs, is run-time intensive. Accordingly, it is not practical to use in a timing flow where timing runs are made daily during the design cycle of the chip. In essence, a static timing analysis of such large circuits configured as a single flattened design is run-time prohibitive. This has led to the development of a hierarchical timing flow where custom parts of the design are timed using accurate timing models (e.g., transistor level timing tools with circuit simulation type accuracy in the case of microprocessor designs), followed by the generation of timing abstract models that reflect, in a simpler form, the actual timing characteristics of the custom logic. The latter step is termed timing abstraction, and may be employed for non-custom logic designs as well. For ease of notation in the present invention, the term macro will be used to denote any circuit being abstracted, irrespective of the true level of hierarchy of that circuit. A macro may represent a transistor level design, consisting of Field Effect Transistor (FET) devices and interconnects, or a gate level design, consisting of gates and interconnects. Inputs and outputs of the macro are denoted as macro primary inputs and macro primary outputs, respectively. At the next (upper) level of hierarchy (termed chip level for ease of notation), macros are generally represented by abstracts. Multiple levels of hierarchy are common in modern VLSI designs. The term chip level will be generically used to denote a level of hierarchy where an abstract (generated at a lower level of hierarchy) is being included for STA.
Referring to FIG. 2, a hierarchical microprocessor chip design 200 is shown at the chip level of the hierarchy and includes multiple macros 201. The macros internal to block 200 are not necessarily unique; multiple instances of the same macro often exist at upper levels of the hierarchy. Global wires and gates (termed glue-logic, e.g., 202) connect to one or more macros within the chip level, as well as to the primary inputs and primary outputs at the chip level. Each unique macro is separately timed and abstracted and, subsequently, the abstract model is used during static timing analysis as a timing model of the corresponding macro at the chip level.
The hierarchical timing approach enables fast timing analysis and improves productivity at the chip level, since abstract models are simpler (thereby facilitating fast delay and/or slew computation) and allow re-use. These benefits are most pronounced when multiple instances of a macro appear at the chip level, since the flow avoids an expensive separate static timing analysis for each instance by performing accurate STA and abstraction only once per unique macro.
A generated abstract captures the timing characteristics of the macro using slew and load dependent tables to model the timing behavior of the logic. The abstract model is required to be context-independent, that is, independent of the voltage waveforms (slews) at its primary inputs and loads at its primary outputs. Consequently, delays and output slews (or waveforms) of timing arcs near the primary inputs (PI) of the macro are characterized as functions of input slew, while delays and output slews of arcs closer to the macro primary outputs (PO) are characterized as functions of output load, and sometimes as functions of both. This allows the generated abstract models to be used in multiple boundary condition (PI and PO) settings. Timing abstraction employs techniques directed to reducing the size of the timing graph by performing pruning as well as arc compression. These techniques can significantly reduce the number of timing arcs to be timed at the next level of hierarchy. Model reductions of 75% are common. The abstract model essentially represents the macro as a complex gate, and may obfuscate the internal details of the circuit. This may be desirable for generating designs shared between vendors, and provides motivation for generation of abstracts as industry standard gate models.
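A slew and load dependent table of the kind described above can be evaluated as sketched below. The sketch assumes a two-dimensional table indexed by input slew and output load and uses bilinear interpolation between characterized points; the interpolation scheme, the function names and all table values are illustrative assumptions rather than a description of any particular abstraction tool or library format.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i of the table interval [axis[i], axis[i + 1]] used for x.
    Values outside the axis range reuse the first or last interval."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup(table, slews, loads, slew, load):
    """Bilinear interpolation of table[i][j], characterized at slews[i], loads[j].
    Queries outside the characterized range are linearly extrapolated."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j]
            + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1]
            + u * v * table[i + 1][j + 1])

# Hypothetical delay table: rows follow input slew, columns follow output load.
slews = [10.0, 50.0]
loads = [1.0, 5.0]
delay_table = [[20.0, 40.0],
               [30.0, 50.0]]

mid_delay = lookup(delay_table, slews, loads, slew=30.0, load=3.0)     # 35.0
corner_delay = lookup(delay_table, slews, loads, slew=10.0, load=1.0)  # 20.0
```

An arc characterized only as a function of input slew, or only as a function of output load, corresponds to a one-dimensional version of the same lookup.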
Referring to FIG. 3, abstraction of the circuit (or macro) in FIG. 1 is illustrated. Block 300 shows the input side of the macro consisting of macro primary input port 301 and an interconnect segment 302 connecting 301 to gate 304. The parasitics of the interconnect segment 302 are referenced by numeral 305, while the input pin capacitance of the gate 304 is labeled as 303. The timing abstract model of the macro is shown as block 306. The macro primary input port 301 is preserved in the abstract model as 307. However, the internal segments and components of the macro (for instance 302 and 304) are abstracted and may even be merged. Block 306 is considered as a complex gate that no longer includes interconnects. The timing arcs in the abstract model are characterized (some as functions of input slew, or output load or both), and the timing model is stored preferably in a standard industry format (e.g., Lib© format). To capture the load seen from the primary input, the total capacitive load of the interconnect segment 302 and the input pin capacitance 303 of gate 304 are summed and set as the lumped pin capacitance 308 on the input port 307 of the abstract model.
Lumped pin capacitances are stored only for the primary input pins of the abstract. This is because any glue logic feeding (or driving) the abstract model at the chip level requires the load seen from the primary input of the macro during timing analysis of the glue logic. Since all internal interconnect segments in the given macro (that is, those not directly connected to a primary input) are characterized and are not fed by any glue logic during hierarchical timing, it is not necessary for the abstract model to store their parasitic information.
The delay and output slew across interconnect segment 302 in FIG. 3 are characterized as functions of different voltage waveforms (a range of input slews) at the primary input during abstraction. The characterization takes into account the detailed parasitics 305 of the interconnect segment, and is thus accurate. Interconnect segments that are connected to the primary outputs of a macro, and the gate segments that feed these interconnect segments in the macro, are likewise characterized accurately, taking into account the detailed interconnect parasitics. Since no glue logic is expected to feed these segments during hierarchical timing, a lumped pin capacitance to capture the parasitics for these interconnect segments is typically not required.