1. Field of the Invention
The present invention generally relates to the fabrication and design of semiconductor chips and integrated circuits, and more particularly to a method of designing the physical layout (placement) of latches and other logic cells which receive clock signals from a clock distribution structure such as a local clock buffer.
2. Description of the Related Art
Integrated circuits are used for a wide variety of electronic applications, from simple devices such as wristwatches to the most complex computer systems. A microelectronic integrated circuit (IC) chip can generally be thought of as a collection of logic cells with electrical interconnections between the cells, formed on a semiconductor substrate (e.g., silicon). An IC may include a very large number of cells and require complicated connections between the cells. A cell is a group of one or more circuit elements such as transistors, capacitors, resistors, inductors, and other basic circuit elements grouped to perform a logic function. Cell types include, for example, core cells, scan cells and input/output (I/O) cells. Each of the cells of an IC may have one or more pins, each of which in turn may be connected to one or more other pins of the IC by wires. The wires connecting the pins of the IC are also formed on the surface of the chip. For more complex designs, there are typically at least four distinct layers of conducting media available for routing, such as a polysilicon layer and three metal layers (metal-1, metal-2, and metal-3). The polysilicon layer, metal-1, metal-2, and metal-3 are all used for vertical and/or horizontal routing.
An IC chip is fabricated by first conceiving the logical circuit description, and then converting that logical description into a physical description, or geometric layout. This process is usually carried out using a “netlist,” which is a record of all of the nets, or interconnections, between the cell pins. A layout typically consists of a set of planar geometric shapes in several layers. The layout is then checked to ensure that it meets all of the design requirements, particularly timing requirements. The result is a set of design files known as an intermediate form that describes the layout. The design files are then converted into pattern generator files that are used to produce patterns called masks by an optical or electron beam pattern generator. During fabrication, these masks are used to pattern a silicon wafer using a sequence of photolithographic steps. The process of converting the specifications of an electrical circuit into a layout is called the physical design.
Cell placement in semiconductor fabrication involves a determination of where particular cells should optimally (or near-optimally) be located on the surface of an integrated circuit device. Due to the large number of components and the details required by the fabrication process for very large scale integrated (VLSI) devices, physical design is not practical without the aid of computers. As a result, most phases of physical design extensively use computer-aided design (CAD) tools, and many phases have already been partially or fully automated. Automation of the physical design process has increased the level of integration, reduced turnaround time and enhanced chip performance. Several different programming languages have been created for electronic design automation (EDA) including Verilog, VHDL and TDML. A typical EDA system receives one or more high level behavioral descriptions of an IC device, and translates this high level design language description into netlists of various levels of abstraction.
Placement algorithms are typically based on a simulated annealing, top-down cut-based partitioning, or analytical paradigm (or some combination thereof). Recent years have seen the emergence of several new academic placement tools, especially in the top-down partitioning and analytical domains. The advent of multilevel partitioning as a fast and extremely effective algorithm for min-cut partitioning has helped spawn a new generation of top-down cut-based placers. A placer in this class partitions the cells into either two (bisection) or four (quadrisection) regions of the chip, then recursively partitions each region until a global (coarse) placement is achieved. Analytical placers may allow cells to temporarily overlap in a design. Legalization is achieved by removing overlaps, either via partitioning or by introducing additional forces and/or constraints to generate a new optimization problem. The classic analytical placers, PROUD and GORDIAN, both iteratively use bipartitioning techniques to remove overlaps. Eisenmann's force-based placer uses additional forces besides the well-known wire length dependent forces to reduce cell overlaps and to consider the placement area. Analytical placers optimally solve a relaxed placement formulation, such as minimizing total quadratic wire length. Quadratic placers generally use various numerical optimization techniques to solve a linear system. Two popular techniques are known as conjugate gradient (CG) and successive over-relaxation (SOR). The PROUD placer uses the SOR technique, while the GORDIAN placer employs the CG algorithm.
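By way of illustration only, the relaxed quadratic formulation can be sketched in one dimension. The sketch below assumes two-pin nets, integer indices for movable cells and string names for fixed pads; the function names build_system and conjugate_gradient are hypothetical and are not taken from PROUD, GORDIAN or any other placer. Minimizing the weighted sum of squared net lengths yields a linear system Ax=b, which is solved here with a plain conjugate gradient iteration.

```python
def build_system(num_movable, nets, fixed_pos):
    """Assemble A and b for 1-D quadratic placement (illustrative assumption).

    nets: list of (u, v, weight); an int endpoint is a movable cell index,
          a str endpoint is the name of a fixed pad.
    fixed_pos: dict mapping pad name -> coordinate.
    """
    A = [[0.0] * num_movable for _ in range(num_movable)]
    b = [0.0] * num_movable
    for u, v, w in nets:
        for p, q in ((u, v), (v, u)):
            if isinstance(p, int):            # only movable cells get a row
                A[p][p] += w
                if isinstance(q, int):
                    A[p][q] -= w              # coupling to another movable cell
                else:
                    b[p] += w * fixed_pos[q]  # pull toward a fixed pad
    return A, b


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for a symmetric positive definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual b - Ax with x = 0
    p = r[:]
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Two movable cells chained between a pad at 0 and a pad at 3.
nets = [("L", 0, 1.0), (0, 1, 1.0), (1, "R", 1.0)]
A, b = build_system(2, nets, {"L": 0.0, "R": 3.0})
x = conjugate_gradient(A, b)
```

In this small example the two cells settle at evenly spaced positions between the pads, which is the expected behavior of a quadratic objective with equal net weights. A production placer would use sparse matrices and would solve the x and y coordinates separately.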
While these techniques provide adequate placement of cells with regard to their data interconnections, there is an additional challenge for the designer in constructing a clock network for the cells, and this challenge is becoming more difficult with the latest technologies such as low-power, 65-nanometer integrated circuits. Low power circuits (e.g., around 20 watts or less for microprocessor chips) are becoming more prevalent due to power consumption problems. In particular, power dissipation has become a limiting factor for the yield of high-performance circuit designs (operating at frequencies around 1 gigahertz or more) with deep submicron technology. Clock nets can contribute up to 50% of the total active power in multi-GHz designs. Low power designs are also preferable since they exhibit less power supply noise and provide better tolerance with regard to manufacturing variations.
There are several techniques for minimizing power while still achieving timing objectives for high performance, low power systems. One method involves the use of local clock buffers (LCBs) to distribute the clock signals. A typical clock control system has a clock generation circuit (e.g., a phase-lock loop) that generates a master clock signal which is fed to a clock distribution network that renders synchronized global clock signals at the LCBs. Each LCB adjusts the global clock duty cycle and edges to meet the requirements of respective circuit elements, e.g., local logic circuits or latches (the term “latch” as used herein stands for any clocked element which is usually a sink of a clock distribution network). Since this clock network is one of the largest power consumers among all of the interconnects, it is further beneficial to control the capacitive load of the LCBs, each of which is driving a set of many clock sinks. One approach for reducing the capacitive load is latch clustering, i.e., clusters of latches placed near the respective LCB of their clock domain. Latch clustering combined with LCBs can significantly reduce the total clock wire capacitance which in turn reduces overall clock power consumption. Since most of the latches are placed close to an LCB, clock skew is also reduced which helps improve the timing of the circuit.
Conventional placement with LCBs and latch clustering is illustrated in the flow chart of FIG. 1. The process begins with an initial placement based on an input layout for the circuit (1). The input layout can be provided by an EDA tool, or can simply be a random layout for the circuit elements. The initial placement locates all circuit elements, including clock sinks, in a region of the integrated circuit using for example quadratic placement. Other placement techniques may be used but quadratic placement often produces better results than alternatives such as min-cut based placement. The quadratic placement portion of the process solves the linear system Ax=b where A is an optimization matrix, and x and b are vectors. During quadratic placement, cells are recursively partitioned into smaller bins until a target number of objects per bin is reached, such as five objects per bin. For the initial placement, all wires (edges) have the same net-weighting. The timing of the circuit is then analyzed and adjusted in early optimization (2). This optimization may include gate re-sizing and buffer insertion using a grid system such as a 50×50 grid in which buffers are assigned to grid cells having lower logic densities. A weighted placement (3) follows which is similar to step 1, but in the weighted placement the input layout is the output of the early optimization step 2 and different weights are applied to different edges based on the timing constraints. The partitioning may also be finer for the weighted placement, e.g., recursively partitioning until there are around two objects per bin. The weighted placement is then followed by late optimization which provides different logic optimizations such as buffer insertion but at a finer (or sometimes the same) level, e.g., in a 100×100 grid (4). 
Late optimization may be the same as early optimization, the conceptual difference being that early optimization works on a circuit that has not yet been processed by any layout-driven optimization step.
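The recursive partitioning into bins described for steps 1 and 3 can be pictured with the following minimal sketch. It assumes a median split that alternates between the x and y directions; that cut rule, the function name partition_into_bins and the (name, x, y) cell tuples are illustrative assumptions only, since the text above specifies just the stopping criterion (a target number of objects per bin).

```python
def partition_into_bins(cells, max_per_bin=5, axis=0):
    """Recursively bisect cells until each bin holds at most max_per_bin.

    cells: list of (name, x, y) tuples (hypothetical representation).
    axis:  0 to cut on x, 1 to cut on y; alternates at each level.
    """
    if max_per_bin < 1:
        raise ValueError("max_per_bin must be at least 1")
    if len(cells) <= max_per_bin:
        return [cells]
    ordered = sorted(cells, key=lambda c: c[1 + axis])
    mid = len(ordered) // 2                   # median split (assumption)
    nxt = 1 - axis
    return (partition_into_bins(ordered[:mid], max_per_bin, nxt)
            + partition_into_bins(ordered[mid:], max_per_bin, nxt))


# Twelve cells on a 4 x 3 grid of coordinates.
cells = [("c%d" % i, float(i % 4), float(i // 4)) for i in range(12)]
coarse = partition_into_bins(cells, max_per_bin=5)   # initial placement
fine = partition_into_bins(cells, max_per_bin=2)     # weighted placement
```

Lowering max_per_bin from five to two mirrors the finer partitioning mentioned for the weighted placement; the same routine simply recurses further.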
Steps 1 and 3 of FIG. 1 do not differentiate between latches and other (non-clocked) logic cells, so at first the latches are allowed to move freely according to placement tools driven by data path timing. In the following steps the process focuses on the latches only, i.e., latches that are part of one or more clock domains. Latches are grouped into a given cluster based on locality and clock domain (5). The LCB for a given clock domain is located at the centroid of the latch clusters, and the latches are pulled to the LCB (6). For this latch-LCB driven placement, the size of the LCBs is temporarily shrunk to the same width as a latch. A relatively high weighting (attraction) is applied to the interconnections between the latch and the LCB for this placement step, e.g., by a factor of 10 compared to the net weights of the most critical data paths. In this manner all latches will be placed next to the corresponding LCB, which is then readjusted back to its original size. The final step is detailed placement which refines the layout using for example min-cut placement or heuristic techniques (7).
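Steps 5 and 6 can be sketched as follows, purely as an illustration. The sketch assumes that locality is approximated by square tiles of a fixed size, that one LCB serves each resulting cluster, and that latches are given as (name, clock domain, x, y) tuples; the tile size and the names cluster_latches, lcb_location and latch_lcb_net_weights are hypothetical. Only the centroid location and the factor of 10 relative to the most critical data net weight come from the description above.

```python
from collections import defaultdict


def cluster_latches(latches, tile=100.0):
    """Group latches by clock domain and by the tile they fall in (assumption)."""
    clusters = defaultdict(list)
    for name, domain, x, y in latches:
        key = (domain, int(x // tile), int(y // tile))
        clusters[key].append((name, x, y))
    return dict(clusters)


def lcb_location(cluster):
    """Place the LCB at the centroid of the latches in one cluster."""
    n = len(cluster)
    cx = sum(x for _, x, _ in cluster) / n
    cy = sum(y for _, _, y in cluster) / n
    return (cx, cy)


def latch_lcb_net_weights(cluster, critical_data_weight, factor=10.0):
    """Weight each latch-to-LCB net at `factor` times the most critical data net."""
    return {name: factor * critical_data_weight for name, _, _ in cluster}


latches = [
    ("a", "clk1", 10.0, 10.0),
    ("b", "clk1", 30.0, 50.0),
    ("c", "clk2", 20.0, 20.0),
    ("d", "clk1", 250.0, 10.0),
]
clusters = cluster_latches(latches)
lcb_xy = lcb_location(clusters[("clk1", 0, 0)])
weights = latch_lcb_net_weights(clusters[("clk1", 0, 0)], critical_data_weight=2.0)
```

The weights returned here would be supplied to a subsequent placement run so that each latch is drawn toward its LCB; the temporary shrinking of the LCB to latch width is not modeled in this sketch.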
The resulting LCB-latch structure is very large relative to other circuit elements involved in the placement process and greatly impacts the timing of the circuit. The LCB itself occupies a particularly large area and the latches are constrained to be very close to the LCB. While this process has some advantages relative to the clock network, such restrictions seriously affect the flexibility of a placer and can often produce poor logic placement. It would, therefore, be desirable to devise an improved placement method which could reduce the disturbance to the placement process that is introduced by clustering latches around an LCB. It would be further advantageous if the method could balance logic placement and latch placement constraints to achieve higher quality timing.