1. Field of the Invention
This disclosure relates to a clock router for semiconductor chips and, more particularly, to a balanced clock router with variable width wires to minimize skew.
2. Description of the Related Art
As the performance of microprocessors increases, it is essential to improve the design of clocks and their distribution on a chip to minimize skew. Clock skew can be defined simply as the difference in arrival time of clock signals at different points on a chip. Clock wiring is becoming increasingly critical in high performance chip design. As technology advances, chips are becoming larger while line widths are becoming smaller and circuit delays are dropping. As a result, clock signal delay and skew are each becoming a larger fraction of machine cycle time.
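As a minimal illustration of the skew definition above, the following sketch computes skew as the spread between the latest and earliest clock arrival. The function name and the sample arrival times are hypothetical and are not taken from this disclosure.

```python
def clock_skew(arrival_times):
    """Return the spread between the latest and earliest clock arrival."""
    return max(arrival_times) - min(arrival_times)

# Hypothetical arrival times, in picoseconds, at four clock points.
arrivals_ps = [120, 135, 128, 150]
skew_ps = clock_skew(arrivals_ps)
```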
A possible clock network can be built as an "H-tree". The central assumptions underlying the H-tree are that clock points all present the same load and that they are uniformly distributed throughout the chip. While these assumptions are never met completely in real chips, the resulting tree demonstrates some of the properties of more realistic clock networks.
Referring to FIG. 1, the development of an H-tree is shown. Clock points (not shown) of uniform capacitance are assumed to be uniformly distributed throughout the area of a chip 10. On the basis of this assumed symmetry, it is appropriate to provide a drive point 14 for a clock system 16 at the center of the chip as indicated by a dot. This will balance delays along wires 20 to the right and left, since spanned distances are equal. Clock system 16 will still present a substantial amount of skew, however. Actual clock points (not shown) of chip 10 are wired by conventional means to the nearest end point 12 of clock system 16. Some clock points will be very close to their end points 12, while others will be a significant distance away. When wired by conventional means (which minimize the total amount of wiring used), clock points near end points 12 will see less delay than those far away. In the extreme case of a single end point 12 providing a clock signal to both the nearest clock point and the furthest clock point in a given quadrant, the difference in distance between extreme clock points is one quarter the perimeter of the chip.
The situation may be improved by subdividing each quadrant of FIG. 1 into quarter-quadrants, and replicating, at appropriately reduced size, the H-tree wiring pattern in each quarter-quadrant as illustrated in FIG. 2. The maximum range in distance from actual clock points (not shown) to end points 12 would be half of the corresponding distance in FIG. 1. A "two-stage" H-tree may still result in excessive skew. The process of quartering the smallest region served by each end point 12 of a given H-tree and providing an additional "H" within this smaller region may be continued to produce a "multi-stage" H-tree as illustrated in FIG. 3.
FIG. 3 is a 4-stage H-tree that divides the chip into 256 small regions, each with an edge that is 1/16 the original chip edge. Because wiring to actual clock points 12 in chip 10 is completed only from the end of the last stage of H's, the variability in wire length has been reduced by a factor of eight. Hence the variability in delay from the central feed point of the entire chip to any final clock point has been similarly reduced.
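The stage arithmetic above can be checked with a short sketch. It assumes the idealized case in which every stage splits each region into four equal quadrants; the function name is hypothetical.

```python
from fractions import Fraction

def h_tree_stage(n_stages):
    """Region count and relative region edge after n_stages of quartering.

    Assumes each stage splits every region into four equal quadrants,
    as in the idealized H-tree (an assumption, not a real-chip model).
    """
    regions = 4 ** n_stages
    edge = Fraction(1, 2 ** n_stages)  # fraction of the full chip edge
    return regions, edge

for n in range(1, 5):
    regions, edge = h_tree_stage(n)
    print(n, regions, edge)

# Ratio of region edge at one stage to region edge at four stages.
reduction = h_tree_stage(1)[1] / h_tree_stage(4)[1]
```

Under these assumptions, four stages give 256 regions with an edge of 1/16, and the region edge shrinks by a factor of eight relative to the single-stage tree of FIG. 1, which matches the reduction in wire length variability stated above.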
Note that the total length of wire in the H-tree increases by (w/2)+h each time a stage is added. If enough stages are added, the capacitance implied in the wires may become significant compared to the clock loads being driven. In this case, adding more stages will significantly slow down the waveform at the end of the tree. It is not sufficient to provide only balanced delays in a robust clock network. In addition, waveforms must remain reasonably fast in their switching. Practically, this implies that in an H-tree as is illustrated in FIGS. 1, 2 and 3, wires that are closer to the central drive point 14 of the entire network will have to be wider than those at the load end (clock point 12) of the tree. It may in fact be necessary to provide a level of buffer circuit at some point within the tree to provide waveforms that are adequately fast in their transition times to be useable as clocks at the load ends.
Note again that the fully symmetrical H-tree assumes that clock loads are uniformly distributed throughout the chip, and that it is possible to provide the tree at the exactly desired location. Neither of these assumptions is met in practice. Integrated circuit chips have clock loads that offer non-uniform capacitance, and that are clustered in some areas of the chip and totally absent from others. In addition, the need to provide signal wiring can lead to some regions of the chip being unusable by clock wiring, so that the symmetrical tree may not be possible even if a chip were to have uniformly distributed clock loads. Nevertheless, the general principles of providing balanced delay to small sections of a chip and wire widths that increase as one moves from the load points to the drive point apply in practical realistic clock networks.
A variety of algorithms exist for performing general wiring of integrated circuit chips. The problem to be solved by all of these is generally to connect two or more points using the wiring levels allowed by the semiconductor process without violating design rules, and without erroneously connecting wires that are intended to remain unconnected.
One useful algorithm for performing such routing is the Lee Maze Runner algorithm, whose basic operation is illustrated in FIGS. 4 and 5. In the Maze Runner algorithm, the area of interest is seen as a rectangular grid of cells 22, each of which can accommodate a wire (not shown). The physical rectangle corresponding to a cell 22 is set so that the cell can fit a legal wire and its space. Points to be connected are mapped into cells 22 by position. The next step is finding a path from a starting cell 24 containing a start of a wire to the cell (or cells) containing an ending point 26 (or points) to complete the connection thereto. Pre-existing wires block the cells they occupy, making those cells unusable for the current connection being wired.
A typical situation is that illustrated in FIG. 4, which shows the array of cells 22, with the starting cell 24 for a desired wire in the lower left corner, and the ending (or "target") cell 26 for the wire (not shown) in the upper right. Blocked cells 28 are disposed within the array of cells 22 either due to prior wiring, or to reserve space that is wire-free.
The Maze Runner stores the information associated with such a situation as (typically) an array of data structures, each of which contains blockage information, and also a "distance" measure, which reflects the distance of the corresponding cell from starting cell 24 for a wire. Initially, this distance is set to some normally illegal value (such as -1) to represent "undefined", and the distance of the starting cell is set to 0. Information specifying which cell 22 is start cell 24 of the wire and which cell 22 (or cells) is (or are) ending or target cell(s) 26 may be kept in the cell data structure or in a separate data structure.
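One possible layout for such an array of data structures is sketched below. The names `Cell`, `UNDEFINED` and `make_grid`, and the choice to pass the start cell as a separate argument, are assumptions made for illustration only.

```python
from dataclasses import dataclass

UNDEFINED = -1  # the "normally illegal" distance value standing for undefined

@dataclass
class Cell:
    blocked: bool = False      # blockage information
    distance: int = UNDEFINED  # distance from the starting cell

def make_grid(rows, cols, blocked=(), start=(0, 0)):
    """Build a rows x cols array of cells and set the start distance to 0."""
    grid = [[Cell() for _ in range(cols)] for _ in range(rows)]
    for r, c in blocked:
        grid[r][c].blocked = True
    start_row, start_col = start
    grid[start_row][start_col].distance = 0
    return grid

# Hypothetical 3 x 4 grid with one blocked cell and the start at lower left.
grid = make_grid(3, 4, blocked=[(1, 1)], start=(2, 0))
```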
Referring now to FIG. 5, the Maze Runner proceeds in two distinct phases. The first, propagation, determines all cells that can be legally reached from paths starting at the starting cell, and, in addition, determines the shortest distance (in cells) to each such reachable cell. This is shown in FIG. 5 by the numbers in each cell, which represent the distance from starting point 24. The second, back tracking, uses the distance measures determined in propagation to develop the actual path between start and target cells. The backtracked path is indicated by a desired wire path 30.
The propagation phase of the Maze Runner is very simple, and proceeds as follows:
A. Set a "change counter" to zero.
B. For each cell 22 in the array:
   a. If the cell is a blocked cell 28, skip to the next cell.
   b. If the cell has undefined distance, skip to the next cell.
   c. Compute the distance of the cell location from starting cell 24 location (dd=distance+1).
   d. Examine all cells N that are adjacent to cell 22:
      i. If distance (N) is undefined, set distance (N)=dd, and increment change counter.
      ii. If distance (N) is defined and exceeds dd, set distance (N)=dd, and increment change counter.
C. When all cells 22 have been processed, examine the change counter. If it exceeds zero, return to step A; if it is 0, begin back-trace.
Note that this is guaranteed to stop, and further, will determine all cells reachable from starting cell 24.
The back track phase of the Maze Runner is similarly simple, and proceeds as follows:
A. Begin at cell C corresponding to ending cell 26.
B. Mark cell C indicating that it is on the wire path desired.
C. If the cell C is starting cell 24, exit.
D. Look at all cells N that are immediately adjacent to cell C.
   a. If distance (N)<distance (C), set C to N, and proceed to step B. If more than one neighbor N meets the above condition, pick the one which continues cell-to-cell "movement" in the same direction as the prior "move"; if there is no prior "move", pick a direction arbitrarily.
Possible results of the back track phase are illustrated in FIG. 5 as desired wire 30.
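The two phases described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation. It assumes a two dimensional grid in which each cell has up to four neighbors, and it assumes that blocked neighbor cells are never assigned a distance, which the text does not state explicitly. All function and variable names are hypothetical.

```python
UNDEFINED = -1
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-connected grid

def propagate(blocked, start):
    """Sweep the grid repeatedly until a sweep changes no distance."""
    rows, cols = len(blocked), len(blocked[0])
    dist = [[UNDEFINED] * cols for _ in range(rows)]
    dist[start[0]][start[1]] = 0
    changes = 1
    while changes:
        changes = 0                                  # reset change counter
        for r in range(rows):                        # visit each cell
            for c in range(cols):
                if blocked[r][c] or dist[r][c] == UNDEFINED:
                    continue                         # skip blocked/undefined
                dd = dist[r][c] + 1                  # candidate distance
                for dr, dc in NEIGHBORS:             # examine neighbors N
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if blocked[nr][nc]:
                        continue  # assumption: blocked cells get no distance
                    if dist[nr][nc] == UNDEFINED or dist[nr][nc] > dd:
                        dist[nr][nc] = dd
                        changes += 1
    return dist

def back_trace(dist, target):
    """Walk from target to start along strictly decreasing distances."""
    if dist[target[0]][target[1]] == UNDEFINED:
        return None                                  # target not reachable
    rows, cols = len(dist), len(dist[0])
    path, cur, prev_move = [target], target, None
    while dist[cur[0]][cur[1]] != 0:
        options = []
        for dr, dc in NEIGHBORS:
            nr, nc = cur[0] + dr, cur[1] + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and dist[nr][nc] != UNDEFINED
                    and dist[nr][nc] < dist[cur[0]][cur[1]]):
                options.append((dr, dc))
        # Prefer continuing in the prior direction; otherwise pick the first.
        move = prev_move if prev_move in options else options[0]
        cur = (cur[0] + move[0], cur[1] + move[1])
        path.append(cur)
        prev_move = move
    return path

# Hypothetical 3 x 4 grid with two blocked cells in the middle row.
blocked = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, False, False, False],
]
start, target = (2, 0), (0, 3)
dist = propagate(blocked, start)
path = back_trace(dist, target)  # listed from target back to start
```

Because distances only ever decrease or change from undefined to defined, each sweep makes progress and the loop terminates. The path is returned in back trace order, from the target cell to the starting cell.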
The Maze Runner algorithm above has been illustrated in a two dimensional context. The typical integrated circuit application of the algorithm is three dimensional, as several "layers" of wiring are typically provided by contemporary technologies. The same basic notions apply, however. In addition, in the three dimensional case, it is possible to achieve certain special results: one might, for example, restrict different layers of wiring to a single X or Y direction only, or, by modifying the concept of distance, might make a given layer mostly X or Y direction, while allowing small "jogs" in the non-preferred direction. Such changes might require minor modification to the back trace phase of the algorithm.
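One way to modify the concept of distance is to replace the fixed increment of 1 with a per-step cost that depends on the layer and the direction of the step. The sketch below is only an illustration of that idea. The layer table, the jog penalty and the layer change cost are all assumed values and are not taken from this disclosure.

```python
# Hypothetical layer table: preferred routing axis for each wiring layer.
PREFERRED_AXIS = {0: "x", 1: "y"}
JOG_PENALTY = 5  # assumed extra cost for a step against the preferred axis
VIA_COST = 3     # assumed cost for a step between layers

def step_cost(layer, move):
    """Cost of one step; move is (dx, dy, dz) with exactly one nonzero entry."""
    dx, dy, dz = move
    if dz != 0:
        return VIA_COST
    axis = "x" if dx != 0 else "y"
    if axis == PREFERRED_AXIS[layer]:
        return 1
    return 1 + JOG_PENALTY
```

In the propagation phase, the candidate distance would then be computed as the current distance plus `step_cost(layer, move)` rather than the current distance plus 1, so that jogs remain possible but are chosen only when they shorten the weighted path.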
The basic Maze Runner is not directly applicable to clock wiring, as it always attempts to minimize wire length and has no notion of timing. It also typically deals with wires of one constant width. Consequently, a need exists for efficient clock routing tools in which wiring distance, power dissipation, delay and skew are all allowed for and optimized. For reasonable delays, such tools further need to be able to accommodate wires whose width may be 20-50 (or more) times the width of minimum width wires, while also including some minimum width wires. Finally, such tools must be able to set wire widths over a very broad range to accommodate fine tuning of optimized parameters.