In many high-performance very large scale integration (VLSI) chips, including, for example, microprocessor chips, a reference clock signal, which may be generated externally and supplied to the chip, is distributed globally throughout the chip using a wiring network. The wiring network, which may be a tree-based network, a grid-based network, or a combination of a tree-based and a grid-based network, is typically re-powered at a number of points in the network by buffers. Each buffer ideally generates a signal that is identical to the original reference clock signal. It is essential that the signals generated by the various buffers arrive at their respective destinations throughout the chip substantially simultaneously so as to minimize clock skew, or arrive with precisely known timing differences so as to facilitate appropriate compensation of clock delays.
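As a minimal illustration of the clock skew referred to above, the following sketch computes skew as the spread between the latest and earliest clock arrival times at a set of destinations. The destination names and the picosecond values are hypothetical and chosen only for illustration; they do not come from any particular design.

```python
def clock_skew(arrival_ps):
    """Return the spread between the latest and earliest clock arrival."""
    return max(arrival_ps.values()) - min(arrival_ps.values())


# Hypothetical insertion delays, in picoseconds, at four clock loads.
arrival_ps = {"ff_a": 412, "ff_b": 405, "ff_c": 431, "ff_d": 418}

skew_ps = clock_skew(arrival_ps)
print(f"clock skew: {skew_ps} ps")
```

A perfectly balanced network would yield a skew of zero; in practice the goal is to keep this spread small or, alternatively, precisely known.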
Balancing clock insertion delay across an entire chip to each destination clock load has been, and continues to be, a challenging and time-consuming problem involving many design netlist iterations and the commitment of significant computing and engineering resources. With the move to ever smaller geometries, the resulting increase in chip gate counts, and the integration of more and more intellectual property (IP) blocks and functions on a given chip, this problem has only grown in scope and complexity.
Most previous solutions for balancing clock insertion delay involve an elaborate, iterative process: placing and routing a current design netlist; extracting parasitics (e.g., capacitance, resistance, inductance, etc.) from a layout database for the netlist using an extraction tool; reading the design netlist into a static timing analysis tool; annotating the netlist with the extracted parasitics in the static timing analysis tool; evaluating the timing information from the clock source to each destination clock load using the static timing analysis tool; adjusting the clock distribution network by modifying one or more circuit characteristics, including wire routing in the chip, wire thickness, driver types, driver positions and/or the number of drivers; streaming out a new netlist; and repeating the process until a desired clock insertion delay target is obtained. This known approach is undesirable in that it is highly time-intensive, requiring many iterations and significant engineering resources, particularly for large designs where a single iteration can take several days or weeks to complete. Moreover, the resulting clock tree distribution is typically static and cannot be changed once the design is realized in silicon. Consequently, there is a risk that oversights in the initial design, as well as variations in process, voltage and/or temperature (PVT) characteristics of the chip, can cause the clock insertion delay to vary substantially from the desired target value, thereby requiring the purchase of additional mask sets and the performance of another chip fabrication cycle to correct the variation.
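The iterative character of this conventional flow can be sketched with a toy model. In the sketch below, each pass of the loop stands in for one complete place-and-route, extraction, and static timing analysis run. The function name, the `correction` factor (the fraction of the measured delay error that one round of adjustments is assumed to remove), and all numeric values are assumptions made for illustration only; no real tool interface is modeled.

```python
def balance_by_iteration(delays_ps, target_ps, tolerance_ps,
                         correction=0.5, max_iters=50):
    """Toy model of the iterate-until-balanced design flow.

    Returns the number of adjustment passes performed and the final
    insertion delay at each clock load.
    """
    delays = dict(delays_ps)
    for passes in range(max_iters):
        # "Timing analysis": find the worst deviation from the target.
        worst = max(abs(d - target_ps) for d in delays.values())
        if worst <= tolerance_ps:
            return passes, delays
        # "Adjustment": each pass removes only part of the error, which
        # is why several full iterations are typically needed.
        delays = {sink: d - correction * (d - target_ps)
                  for sink, d in delays.items()}
    raise RuntimeError("target not reached within max_iters")


# Hypothetical starting insertion delays in picoseconds.
initial = {"ff_a": 480.0, "ff_b": 400.0, "ff_c": 520.0}
passes, final = balance_by_iteration(initial, target_ps=400.0,
                                     tolerance_ps=5.0)
print(passes, final)
```

Under these assumed numbers the loop needs several passes to converge; if each pass corresponds to days or weeks of tool run time, the cost of the conventional approach for large designs becomes apparent.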
Accordingly, there exists a need for techniques for balancing clock insertion delay in a clock distribution network that do not suffer from one or more of the problems exhibited by conventional clock distribution architectures and methodologies.