1. Field of the Invention
The present invention generally relates to the art of microelectronic integrated circuit layout, and more specifically to the art of placement and routing of cells on integrated circuit chips.
2. Description of Related Art
a. Introduction
Microelectronic integrated circuits consist of a large number of electronic components which are fabricated by layering several different materials on a silicon base or wafer. The design of an integrated circuit transforms a circuit description into a geometric description which is known as a layout. A layout consists of a set of planar geometric shapes in the various layers of the silicon chip.
The process of converting the specifications of an electrical circuit into a layout is called the physical design. Physical design requires arranging elements, wires, and predefined cells on a fixed area, and the process can be tedious, time consuming, and prone to many errors due to tight tolerance requirements and the minuteness of the individual components.
Currently, the minimum geometric feature size of a component is on the order of 0.5 microns. Feature size may be reduced to 0.1 micron within several years. This small feature size allows fabrication of as many as 10 million transistors or approximately 1 million gates of logic on a 25 millimeter by 25 millimeter chip. This feature size decrease/transistor increase trend is expected to continue, with even smaller feature geometries and more circuit elements on an integrated circuit. Larger chip sizes will allow far greater numbers of circuit elements.
Due to the large number of components and the exacting details required by the fabrication process, physical design is not practical without the aid of computers. As a result, most phases of physical design extensively use Computer Aided Design (CAD) tools, and many phases have already been partially or fully automated. Automation of the physical design process has increased the level of integration, reduced turn around time and enhanced chip performance.
The object of physical chip design is to determine an optimal arrangement of devices in a plane and to find an efficient interconnection or routing scheme between the devices to obtain the desired functionality. Since space on the chip surface is at a premium, algorithms must use the space very efficiently to lower costs and improve yield. The arrangement of individual cells in an integrated circuit chip is known as a cell placement.
Each microelectronic circuit device or cell includes a plurality of pins or terminals, each of which is connected to pins of other cells by a respective electrical interconnect wire network or net. A goal of the optimization process is to determine a cell placement such that all of the required interconnects can be made, and the total wirelength and interconnect congestion are minimized.
Prior art methods for achieving this goal comprise generating one or more initial placements, modifying the placements using optimization methodologies including genetic algorithms such as simulated evolution, force directed placement or simulated annealing, described hereinbelow, and comparing the resulting placements using a cost criterion.
Depending on the input, placement algorithms are classified into two major groups, constructive placement and iterative improvement methods. The input to the constructive placement algorithms consists of a set of blocks along with the netlist. The algorithm provides locations for the blocks. Iterative improvement algorithms start with an initial placement. These algorithms modify the initial placement in search of a better placement. The algorithms are applied in a recursive or an iterative manner until no further improvement is possible, or the solution is considered to be satisfactory based on a predetermined criterion.
Iterative algorithms can be divided into three general classifications: simulated annealing, simulated evolution and force directed placement. The simulated annealing algorithm simulates the annealing process that is used to temper metals. Simulated evolution simulates the biological process of evolution, while the force directed placement simulates a system of bodies attached by springs.
Assuming that a number N of cells are to be optimally arranged and routed on an integrated circuit chip, the number of different ways that the cells can be arranged on the chip, or the number of permutations, is equal to N! (N factorial). In the following description, each arrangement of cells will be referred to as a placement. In a practical integrated circuit chip, the number of cells can be hundreds of thousands or millions. Thus, the number of possible placements is extremely large.
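To give a sense of how quickly N! grows, the following sketch estimates the number of decimal digits in N! without building the full integer. The function name log10_factorial is an illustrative assumption and does not come from the description above.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, since lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)

# Number of decimal digits in N! for a few cell counts.
for n in (10, 1_000, 100_000):
    digits = math.floor(log10_factorial(n)) + 1
    print(f"N = {n}: N! has about {digits} decimal digits")
```

Even for modest cell counts, the digit count itself runs into the thousands, which is why exhaustive enumeration of placements is not practical.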
Iterative algorithms function by generating large numbers of possible placements and comparing them in accordance with some criterion which is generally referred to as fitness. The fitness of a placement can be measured in a number of different ways, for example, overall chip size. A small size is associated with a high fitness and vice versa. Another measure of fitness is the total wire length of the integrated circuit. A high total wire length indicates low fitness and vice versa.
The relative desirability of various placement configurations can alternatively be expressed in terms of cost, which can be considered as the inverse of fitness, with high cost corresponding to low fitness and vice versa.
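As a minimal sketch of a wire-length cost, the example below estimates the total wire length of a placement by summing the half-perimeter of each net's bounding box. The half-perimeter estimate, the 1/(1 + cost) fitness mapping, and all names here are assumptions chosen for illustration; the description above does not prescribe a particular formula.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[float, float]]  # cell name -> (x, y) location
Netlist = List[List[str]]                   # each net lists the cells it connects

def total_wirelength(placement: Placement, nets: Netlist) -> float:
    """Sum, over all nets, the half-perimeter of the net's bounding box."""
    total = 0.0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def fitness(placement: Placement, nets: Netlist) -> float:
    """Assumed mapping: high cost gives low fitness and vice versa."""
    return 1.0 / (1.0 + total_wirelength(placement, nets))

placement = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (3.0, 4.0)}
nets = [["a", "b"], ["a", "b", "c"]]
print(total_wirelength(placement, nets))  # 3 for the first net plus 7 for the second
```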
b. Simulated Annealing
Basic simulated annealing per se is well known in the art and has been successfully used in many phases of VLSI physical design such as circuit partitioning. Simulated annealing is used in placement as an iterative improvement algorithm. Given a placement configuration, a change to that configuration is made by moving a component or interchanging locations of two components. Such interchange can be alternatively expressed as transposition or swapping.
In the case of a simple pairwise interchange algorithm, it is possible that a configuration achieved has a cost higher than that of the optimum, but no single interchange can cause further cost reduction. In such a situation, the algorithm is trapped at a local optimum and cannot proceed further. This happens quite often when the algorithm is used in practical applications. Simulated annealing helps to avoid becoming trapped at a local optimum by occasionally accepting moves that result in a cost increase.
In simulated annealing, all moves that result in a decrease in cost are accepted. Moves that result in an increase in cost are accepted with a probability that decreases over time as the iterations proceed. The analogy to the actual annealing process is heightened with the use of a parameter called temperature T. This parameter controls the probability of accepting moves that result in increased cost.
More of such moves are accepted at higher values of temperature than at lower values. The algorithm starts with a very high value of temperature that gradually decreases so that moves that increase cost have a progressively lower probability of being accepted. Finally, the temperature reduces to a very low value which requires that only moves that reduce costs are to be accepted. In this way, the algorithm converges to an optimal or near optimal configuration.
In each stage, the placement is shuffled randomly to get a new placement. This random shuffling could be achieved by transposing a cell to a random location, a transposition of two cells, or any other move that can change the wire length or other cost criteria. After the shuffle, the change in cost is evaluated. If there is a decrease in cost, the configuration is accepted. Otherwise, the new configuration is accepted with a probability that depends on the temperature.
The temperature is then lowered using some function which, for example, could be exponential in nature. The process is stopped when the temperature has dropped to a certain level. A number of variations and improvements on the basic simulated annealing algorithm have been developed. An example is described in an article entitled "Timberwolf 3.2 A New Standard Cell Placement and Global Routing Package" by Carl Sechen, et al., IEEE 23rd Design Automation Conference paper 26.1, pages 432 to 439.
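The basic loop described above can be sketched as follows for a toy one-row placement. Several details are assumptions made for illustration only: the acceptance probability exp(-delta/T) (the commonly used Metropolis rule, which the description does not name), the geometric cooling factor alpha, the parameter values, and the extra bookkeeping that remembers the best placement seen.

```python
import math
import random

def wirelength(order, nets):
    """Cost of a one-row placement: per net, distance between its leftmost and rightmost cell."""
    slot = {cell: i for i, cell in enumerate(order)}
    return sum(max(slot[c] for c in net) - min(slot[c] for c in net) for net in nets)

def anneal(order, nets, t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=50, seed=0):
    rng = random.Random(seed)
    current = list(order)
    cost = wirelength(current, nets)
    best, best_cost = list(current), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Shuffle step: transpose two randomly chosen cells.
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            delta = wirelength(current, nets) - cost
            # Always accept a decrease; accept an increase with a temperature-dependent probability.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i], current[j] = current[j], current[i]  # reject: undo the swap
        t *= alpha  # exponential (geometric) cooling
    return best, best_cost

nets = [["a", "b"], ["b", "c"], ["c", "d"]]
start = ["c", "a", "d", "b"]
best, best_cost = anneal(start, nets)
print(wirelength(start, nets), "->", best_cost, best)
```

At high temperature nearly every swap is accepted, so the search wanders freely; as the temperature falls, the loop behaves more and more like a plain pairwise interchange that only accepts improvements.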
c. Simulated Evolution
Simulated evolution, which is also known as the genetic algorithm, is analogous to the natural process of mutation of species as they evolve to better adapt to their environment. The algorithm starts with an initial set of placement configurations which is called the population. The initial placement can be generated randomly. The individuals in the population represent a feasible placement to the optimization problem and are actually represented by a string of symbols.
The symbols used in the solution string are called genes. A solution string made up of genes is called a chromosome. A schema is a set of genes that make up a partial solution. The simulated evolution or genetic algorithm is iterated, and each iteration is called a generation. During each iteration, the individual placements of the population are evaluated on the basis of fitness or cost. Two individual placements among the population are selected as parents, with probabilities based on their fitness. A better fitness for an individual placement increases the probability that the placement will be chosen.
The genetic operators, called crossover, mutation and inversion, which are analogous to their counterparts in the evolution process, are applied to the parents to combine genes from each parent to generate a new individual called the offspring or child. The offspring are evaluated, and a new generation is formed by including some of the parents and the offspring on the basis of their fitness in a manner such that the size of the population remains the same. As the tendency is to select high fitness individuals to generate offspring, and the weak individuals are deleted, the next generation tends to have individuals that have good fitness.
The fitness of the entire population improves with successive generations. Consequently, overall placement quality improves over iterations. At the same time, some low fitness individual cell placements are reproduced from previous generations to maintain diversity even though the probability of doing so is quite low. In this way, it is assured that the algorithm does not lock into a local optimum.
The first main operator of the genetic algorithm is crossover, which generates offspring by combining schemata of two individuals at a time. Combining schemata entails choosing a random cut point and generating the offspring by combining the left segment of one parent with the right segment of the other. However, after doing so, some cells may be duplicated while other cells are deleted. This problem will be described in detail below.
The amount of crossover is controlled by the crossover rate, which is defined as the ratio of the number of offspring produced by crossing in each generation to the population size. Crossover attempts to create offspring with fitness higher than either parent by combining the best genes from each.
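The cut-point crossover described above, and the duplication problem it causes, can be illustrated with a small sketch. The chromosome is assumed to be a list of cell names in slot order, and the cut point is fixed rather than random so that the result is reproducible; both choices, and the function names, are illustrative assumptions.

```python
from collections import Counter

def one_point_crossover(parent_a, parent_b, cut):
    """Combine the left segment of parent_a with the right segment of parent_b."""
    return parent_a[:cut] + parent_b[cut:]

def duplicated_and_missing(child, all_cells):
    """Report cells that appear more than once and cells that were lost."""
    counts = Counter(child)
    duplicated = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(c for c in all_cells if c not in counts)
    return duplicated, missing

parent_a = ["A", "B", "C", "D", "E"]
parent_b = ["C", "E", "A", "B", "D"]
child = one_point_crossover(parent_a, parent_b, cut=2)
print(child)                                    # left of parent_a, right of parent_b
print(duplicated_and_missing(child, parent_a))  # cells placed twice, cells lost
```

Here the child places cells A and B twice and loses cells C and E, so it is not a legal placement until it is repaired.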
Mutation creates incremental random changes. The most commonly used mutation is pairwise interchange or transposition. This is the process by which new genes that did not exist in the original generation, or have been lost, can be generated.
The mutation rate is defined as the ratio of the number of offspring produced by mutation in each generation to the population size. It must be carefully chosen because while it can introduce more useful genes, most mutations are harmful and reduce fitness. The primary application of mutation is to pull the algorithm out of local optima.
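A pairwise interchange mutation can be sketched as below. Unlike the cut-point crossover, a swap of two positions always yields a legal placement, because no cell is duplicated or lost. The function name and the list-of-cells representation are assumptions for illustration.

```python
import random

def mutate(chromosome, rng):
    """Return a copy of the chromosome with two randomly chosen positions interchanged."""
    i, j = rng.sample(range(len(chromosome)), 2)
    child = list(chromosome)
    child[i], child[j] = child[j], child[i]
    return child

parent = ["A", "B", "C", "D", "E"]
mutant = mutate(parent, random.Random(0))
print(parent, "->", mutant)
```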
Inversion is an operator that changes the representation of a placement without actually changing the placement itself so that an offspring is more likely to inherit certain schemata from one parent.
After the offspring are generated, individual placements for the next generation are chosen based on some criteria. Numerous selection criteria are available, such as total chip size and wire length as described above. In competitive selection, all the parents and offspring compete with each other, and the fittest placements are selected so that the population remains constant. In random selection, the placements for the next generation are randomly selected so that the population remains constant.
The latter criterion is often advantageous considering the fact that by selecting the fittest individuals, the population converges to individuals that share the same genes and the search may not converge to an optimum. However, if the individuals are chosen randomly there is no way to gain improvement from an older generation to a new generation. By combining both methods, stochastic selection chooses probabilities based on the fitness of each individual.
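One common reading of stochastic selection is fitness-proportional sampling, in which each individual is chosen with a probability proportional to its fitness. The sketch below assumes that reading, and it samples with replacement; neither detail is specified in the description above, and the names are hypothetical.

```python
import random
from collections import Counter

def stochastic_select(population, fitnesses, k, rng):
    """Pick k individuals, each draw weighted by fitness (sampling with replacement)."""
    return rng.choices(population, weights=fitnesses, k=k)

population = ["placement_1", "placement_2", "placement_3"]
fitnesses = [0.6, 0.3, 0.1]
picks = stochastic_select(population, fitnesses, k=1000, rng=random.Random(0))
print(Counter(picks))  # fitter placements are drawn more often, weak ones only occasionally
```

Fit individuals dominate the next generation, yet low fitness individuals still have a nonzero chance of surviving, which preserves some diversity.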
d. Force Directed Placement
Force directed placement exploits the similarity between the placement problem and the classical mechanics problem of a system of bodies attached to springs. In this method, the blocks connected to each other by nets are supposed to exert attractive forces on each other. The magnitude of this force is directly proportional to the distance between the blocks. Additional proportionality is achieved by connecting more "springs" between blocks that "talk" to each other more (volume, frequency, etc.) and fewer "springs" where less extensive communication occurs between each block.
According to Hooke's Law, the force exerted due to the stretching of the springs is proportional to the distance between the bodies connected to the spring. If the bodies are allowed to move freely, they would move in the direction of the force until the system achieved equilibrium. The same idea is used for placing the cells. The final configuration of the placement of cells is the one in which the system achieves a solution that is closest to actual equilibrium.
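Under the spring model, the net force on a cell is the weighted sum of the displacement vectors to its connected cells, and the force vanishes at the weighted centroid of those cells. The sketch below computes both quantities for a single cell; the function names and the use of a numeric weight to stand for the number of "springs" are illustrative assumptions.

```python
def net_force(pos, neighbors):
    """Hooke's-law force on a cell at pos; neighbors is a list of ((x, y), weight)."""
    fx = sum(w * (p[0] - pos[0]) for p, w in neighbors)
    fy = sum(w * (p[1] - pos[1]) for p, w in neighbors)
    return fx, fy

def equilibrium_position(neighbors):
    """Zero-force location: the weighted centroid of the connected cells."""
    total = sum(w for _, w in neighbors)
    x = sum(w * p[0] for p, w in neighbors) / total
    y = sum(w * p[1] for p, w in neighbors) / total
    return x, y

# A cell connected by one spring to each of two cells and by two springs to a third.
neighbors = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0), ((4.0, 6.0), 2.0)]
target = equilibrium_position(neighbors)
print(target, net_force(target, neighbors))
```

The doubly connected cell pulls the equilibrium point toward itself, which is how stronger connectivity translates into closer placement.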
e. Parallel Processing Technique 1
Because of the large number of possible placements, computerized implementation of the placement algorithms discussed above can take many days. In addition, the placement algorithm may need to be repeated with different parameters or different initial arrangements to improve the results.
To reduce the time required to optimally place the cells, multiple processors have been used to speed up the process. In such implementations, multiple processors operate simultaneously to optimally place the cells on the integrated chip. However, such prior efforts to reduce the placement time by parallel processing of the placement methods have been impeded by three obstacles.
First, multiple processors may conflict with each other. This occurs where an area on the chip, which is being processed by one processor, is affected by movements of one or more cells into the area by another processor. When this occurs, one of the two conflicting processors must wait for the other to finish or postpone its own move for later. The area-conflict problem not only lessens the advantage of multiprocessing, but also increases the processing overhead encountered. This is because, before moving a cell, each of the processors must check for area-conflicts with all other processors. As the number of processors increases, the area-conflicts increase rapidly to negate the advantage of multiprocessing, such that the time required to place the cells is increased.
Second, the optimization process can become trapped in a local optimum. To eliminate the area-conflict problem, some systems have assigned particular core areas to each of the processors with the restriction that each of the processors only operate within its assigned area. After processing cells of the assigned areas, the processors are then assigned to different areas, and so on. Although this method eliminates area-conflicts, it limits the movements of the cells to the area assigned to the processor. The limitation on the movement of the cells increases the likelihood of the placement becoming stuck at a local optimum. In the case of a pairwise interchange algorithm, it is possible that a configuration achieved is at a local optimum such that any further exchange within the limited area will not result in a further reduction in cost. In such a situation, the algorithm is trapped at the local optimum and does not proceed further. This happens frequently when the algorithm is used in practical applications, and the extent of the local optimum problem increases as additional processors are added because the increase in the number of processors operating simultaneously reduces the area assigned to each of the processors. Decreases in the area assigned to each of the processors lead to corresponding decreases of the distances the cells of the areas may be moved to improve the optimization.
Third, if multiple processors are used simultaneously to place the cells of an integrated chip, it is possible for the processors to deadlock. This occurs where each of the processors has halted its operation while waiting for another processor to complete its operations. In this situation, all processing is stopped and the system halts. An example of deadlock is where processor P1 is waiting for processor P2 to complete its operation, P2 is waiting for processor P3 to complete its operation, and P3 is waiting for P1 to complete its operation. In that case, none of P1, P2, or P3 will proceed.
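The three-processor example corresponds to a cycle in a "waits-for" relation. As a minimal sketch, assuming each processor waits on at most one other processor, such a cycle can be found by following the chain of waits from each processor. The dictionary representation and the function name are hypothetical.

```python
def find_deadlock(waits_for):
    """Return the processors forming a wait cycle, or None if there is no cycle.

    waits_for maps each processor to the one processor it waits on, or to None.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waits until it ends or revisits a processor.
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            return seen[seen.index(node):]
    return None

print(find_deadlock({"P1": "P2", "P2": "P3", "P3": "P1"}))  # circular wait
print(find_deadlock({"P1": "P2", "P2": "P3", "P3": None}))  # P3 can finish, so no deadlock
```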
In short, because of the ever-increasing number of cells on integrated chips (currently at millions of cells on a chip), and the resulting increase in the number of possible placements of the cells on the chip, a computer is used to find an optimal layout of the cells on the chip. Even with the aid of computers, existing methods can take several days to place a large number of cells, and these methods may need to be repeated with different parameters or different initial arrangements. To decrease the time required to place the chip, multiple processors have been used to perform the placement of the cells. However, the use of multiple processors has led to area-conflicts, local optimum problems, and potential deadlock situations, negating the advantages of using the multiple processors.
f. Parallel Processing Technique 2
As an alternative to Parallel Processing Technique 1 discussed above, another technique for implementing parallel processing of cell placement algorithms is described below.
The problems associated with the prior art parallelization technique of assigning regions to multiple processors are illustrated using FIG. 43. The figure illustrates a grossly simplified integrated circuit chip (IC) with four nets 1107, 1109, 1111, and 1113 and four regions 1108a, 1108b, 1108c, and 1108d, each of which has been assigned to a processor.
The first problem is the crossover net problem. If the regions are divided such that crossover nets are created, then the effectiveness of the parallel processing technique is reduced. This is because none of the processors which share the crossover nets can accurately calculate the position of the net (which is always the basis for the decision about the cell move) because the other processor may move its cell during the calculation. Naturally, as the number of processors increases, the number of crossover nets increases, aggravating the problem. A large number of crossover nets can be fatal for the convergence of cell placement algorithms. For example, in FIG. 43, nets 1109, 1111 and 1113 are the crossover nets. Some cells of net 1109 are processed by the processor assigned to region 1108a while others are processed by the processor assigned to region 1108c. Likewise, the cells of nets 1111 and 1113 are placed by processors assigned to regions 1108a and 1108b, and 1108b and 1108d, respectively.
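Identifying crossover nets amounts to checking whether the cells of a net fall in more than one region. The sketch below mirrors the net and region numerals of FIG. 43, but the individual cell names, and the choice of region 1108d for net 1107, are hypothetical placeholders, since the text does not identify them.

```python
def crossover_nets(nets, region_of):
    """Return the names of nets whose cells lie in more than one region."""
    return sorted(
        name for name, cells in nets.items()
        if len({region_of[cell] for cell in cells}) > 1
    )

# Hypothetical cells c1..c8; only the net and region numerals come from FIG. 43.
region_of = {
    "c1": "1108a", "c2": "1108c",  # cells of net 1109
    "c3": "1108a", "c4": "1108b",  # cells of net 1111
    "c5": "1108b", "c6": "1108d",  # cells of net 1113
    "c7": "1108d", "c8": "1108d",  # cells of net 1107 (region assumed)
}
nets = {
    "1107": ["c7", "c8"],
    "1109": ["c1", "c2"],
    "1111": ["c3", "c4"],
    "1113": ["c5", "c6"],
}
print(crossover_nets(nets, region_of))
```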
Second, cell movements from one region (or processor) to another create communications overhead which may negate the advantages of the multiple processor cell placement technique. Each time a cell is moved from one region to another, the processor moving the cell from its assigned region must communicate with the processor receiving the cell into its assigned region. The communication requirement complicates the implementation of cell placement algorithms and slows down both of the communicating processors. As the number of processors, the number of cells, or the number of required cell moves increases, the communication overhead increases. In particular, the performance of the parallel processing technique is especially poor if the spring density levelization method is used as the cell placement algorithm because the algorithm tends to make global cell moves.
Third, to minimize crossover nets and communications overheads, the prior art parallelization techniques typically require a "good" preplacement of the cells on the chip. That is, in order to operate effectively, the prior art methods require the nets to be within a single region and the cells of the nets to be "close" to each other. The best way to achieve this is to increase the region size and decrease the number of processors running in parallel. However, the increase in the region size and the decrease in the number of parallel processors defeat the purpose of parallelizing the cell placement algorithm. Moreover, even with such preplacement of cells, there are generally still many crossover nets.
In order to avoid the problems associated with crossover nets, regions have to be made larger. Use of large regions has the disadvantage that it limits the number of processors that can be used. In fact, if the entire integrated chip is defined as one region, and only one processor is assigned to place the cells of the chip, then there would be no crossover net problems or communications overhead; but there also would be no parallel processing, and the cell placement becomes a sequential process. Finally, the prior art technique of assigning regions of the IC to each of the multiple processors leads to the problem of unbalanced work load. Because each of the regions may contain a varying number of nets, cells, or cells requiring further movements, it is difficult to assign regions to the processors so as to assign an equal amount of work to each of the processors. Consequently, some processors finish the placement of the cells of their assigned regions more quickly than other processors, reducing the effectiveness of parallelization of the placement algorithm.
In short, multiple processors have been used to implement cell placement algorithms by assigning regions of the IC to each of the processors. However, this technique has led to crossover net conflicts, interprocessor communication problems, cell preplacement requirements, and uneven distribution of work problems, negating the advantages of using the multiple processors.
g. Floor Plan Optimization
The cost or the desirability of various placement configurations can be measured using other methods such as capacity distribution and utilization ratio. Capacity distribution and utilization ratios measure the placement of the cells for each of the functional blocks of the integrated circuit. An integrated circuit is designed with various functional blocks, or functions, which, operating together, achieve the desired operation. Each of the functions of the circuit is implemented by a plurality of cells and is assigned a portion of the core space upon which the cells are placed. For example, an integrated circuit design may require the use of a central processor unit (CPU) function, memory function, and some type of input/output (I/O) function.
In this Subsection, Subsection 1c-b, Section 3B and in the corresponding claims of this document, the terms and phrases "core," "core space," "core area," "floor," "floor space," and "integrated circuit," will be used interchangeably to refer to the area of the integrated circuit upon which cells are placed to implement various functions of the integrated circuit.
The capacity is the maximum amount of cells which can be placed on the core space or any portion of the core space and is usually measured in cell height units. Provided that the entire core space has sufficient capacity, it is often desirable to place the cells on the core space with a certain capacity distribution. For instance, it may be desirable that the cells of the integrated circuit be distributed evenly throughout the chip to avoid a high concentration of the cells in a small location with a low concentration of the cells for the rest of the core space. On the other hand, it may be desirable to implement certain functions of the chip on a small portion of the core space with a high concentration of the cells. In sum, a predetermined capacity distribution of the core space or for any function assigned to a portion of the core space may be one of the requirements of the cell placement.
A closely related concept is the utilization of the space. The utilization is the ratio of the amount of the core space actually used within a predefined portion of the core space to the capacity of that predefined portion of the core space. For example, if a portion of the core space assigned to a function has a capacity of 100,000 cell height units, and the cells implementing the function use 50,000 cell height units, then the utilization of the portion of the core space is 50 percent.
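The utilization ratio from the example above can be written as a small helper. The function name and the dictionary of per-function figures are illustrative assumptions; only the 100,000 and 50,000 cell height unit values come from the text.

```python
def utilization(used_units: float, capacity_units: float) -> float:
    """Ratio of core space actually used to the capacity of the same portion."""
    if capacity_units <= 0:
        raise ValueError("capacity must be positive")
    return used_units / capacity_units

# Figures from the example in the text, in cell height units.
print(f"{utilization(50_000, 100_000):.0%}")

# Hypothetical per-function figures: function -> (used, capacity).
functions = {"CPU": (50_000, 100_000), "memory": (90_000, 120_000)}
per_function = {name: utilization(u, c) for name, (u, c) in functions.items()}
print(per_function)
```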
The capacity distribution or the utilization ratio for each of the functions of the integrated circuit or for the entire core space may be predetermined as an engineering parameter based on such factors as heat dissipation, power management, manufacturing constraints, etc.
The current methods of optimally placing the cells on the integrated circuit involve (1) assigning functions to be implemented to portions of the integrated circuit; (2) placing the cells of each of the functions onto the assigned portion of the integrated circuit using a placement algorithm; (3) calculating the capacity distribution of the integrated circuit and the utilization rate of each portion of the integrated circuit used to implement its function; and (4) iterating the first three steps to obtain a better placement in terms of capacity distribution or utilization.
The disadvantages of the current process involve time and accuracy. Because the placement process requires manual iteration between floor planning tools (to calculate and evaluate capacity and utilization) and placement tools (to newly place the cells onto the core), the optimal placement process takes a long time. Also, it is difficult to manually optimize many different parameters simultaneously because, at each iteration, the operator has to simultaneously consider many parameters: overall capacity, capacity distribution, overall utilization, utilization of each function, utilization distribution, overlap size among functions, aspect ratio of functions, etc. Even for highly experienced professionals, the simultaneous consideration of all of the parameters for an optimal cell placement is an extremely difficult process. Further, the complexity of the cell placement process is continually increasing as the number of functions and the number of cells on integrated chips increase, rendering manual analysis techniques nearly impossible to perform.
In short, because of the ever-increasing complexity of integrated circuit chips and the number of cells required to implement the functions of the complex designs, the manual placement optimization methods are fast becoming obsolete. The manual floor planning and cell placement optimization process requires an inordinate amount of time because the process requires manual iteration between running floor plan tools and placement tools. In addition, it is extremely difficult, at best, for human beings to simultaneously optimize several parameters (function utilization, overlap size among functions, aspect ratios of functions, etc.).
h. Net Routing
Each microelectronic circuit device or cell includes a plurality of pins or terminals, each of which is connected to pins of other cells by a respective electrical interconnection wire network, or net. A purpose of the optimization process used in the physical design stage is to determine a cell placement such that all of the required interconnections can be made, but total wirelength and interconnection congestion are minimized. The process of determining the interconnections of already placed cells of an integrated circuit is called routing.
Assuming that a number N of cells are to be optimally arranged and routed on an integrated circuit chip, the number of different ways that the cells can be arranged on the chip, or the number of permutations, is equal to N! (N factorial). In addition, each of the cells may require multiple connection points (or pins), each of which, in turn, may require connections to multiple pins of multiple cells. The possible routing permutations are even larger than the possible cell placements by many orders of magnitude.
Because of the large number of possible placements and routing permutations, even computerized implementation of the placement algorithms discussed above can take many days. In addition, the placement and routing algorithms may need to be repeated with different parameters or different initial arrangements to improve the results.
To reduce the time required to optimally route the nets, multiple processors have been used to speed up the process. In such implementations, multiple processors are assigned to different areas of the chip to simultaneously route the nets in their assigned areas. However, it has been difficult to evenly distribute the amount of routing required from each of the multiple processors. In fact, due to the nonlinear algorithm complexity, the obvious and always assumed parallelization, which is to split the nets among the processors, does not work because routing of the single highest fanout net can take much longer than routing of all other nets of the integrated circuit. Such unbalanced parallelization of the routing function has been the norm in the art, leading to ineffective use of parallel processing power.
In short, because of the ever-increasing number of cells on integrated chips (currently at millions of cells on a chip), and the resulting increase in the number of possible routings of the cells and the nets on the chips, multiple processors are used to simultaneously route the nets of an integrated chip. However, even with the aid of computers, existing methods can take several days, and the addition of processors may not decrease the required time because of the difficulties of balancing the amount of work between the processors.
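The imbalance caused by a single high fanout net can be illustrated with a toy model. The quadratic cost in fanout and the greedy "largest net first, least loaded processor next" assignment are assumptions chosen only to make the effect visible; the text states that the complexity is nonlinear but gives no specific cost function or scheduling rule.

```python
import heapq

def route_cost(fanout, exponent=2.0):
    """Assumed nonlinear routing cost of one net as a function of its fanout."""
    return fanout ** exponent

def split_nets(fanouts, num_processors):
    """Greedily give each net, largest first, to the currently least loaded processor."""
    loads = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(loads)
    for fanout in sorted(fanouts, reverse=True):
        load, p = heapq.heappop(loads)
        heapq.heappush(loads, (load + route_cost(fanout), p))
    return sorted(load for load, _ in loads)

# One net with fanout 1000 and a thousand small nets with fanout 3.
fanouts = [1000] + [3] * 1000
loads = split_nets(fanouts, num_processors=4)
print(loads)
print("speedup over one processor:", sum(loads) / max(loads))
```

In this model the processor holding the large net does far more work than the other three combined, so four processors finish hardly sooner than one.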
i. Other Considerations
The problem of cell placement is compounded by external requirements specific to each individual integrated circuit chip. In conventional chip design, the positions of certain "unmovable" cells (external interconnect terminals or pads, large "megacells" etc.) are fixed a priori by the designer. Given those fixed positions, the rest of the cells are then placed on the chip. Since the unmovable cells and pads are located or placed before the placement for the rest of the cells of chip has been decided on, it is unlikely that the chosen positions will be optimal.
In this manner, a number of regions, which may have different sizes and shapes, are defined on the chip for placement of the rest of the cells.
It is desirable to assign individual microelectronic devices or cells to the regions, or "partition" the placement such that the total interconnect wirelength is minimized. However, methodologies for accomplishing this goal efficiently have not been proposed heretofore.
The general partitioning methodology is to hierarchically partition a large circuit into a group of smaller sub-circuits until each sub-circuit is small enough to be designed efficiently. Because the quality of the design may suffer due to the partitioning, the partitioning of a circuit requires care and precision.
One of the most common objectives of partitioning is to minimize the cutsize, which is defined as the number of nets crossing a cut. Also the number of partitions often appears as a constraint with upper and lower bounds. At chip level, the number of partitions is determined, in part, by the capability of the placement algorithm.
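The cutsize of a two-way partition can be counted directly from the netlist, as in the following sketch. The cell names, the netlist, and the 0/1 side labels are hypothetical.

```python
def cutsize(nets, side_of):
    """Count the nets that have cells on both sides of the cut."""
    return sum(1 for cells in nets if len({side_of[c] for c in cells}) > 1)

nets = [["a", "b"], ["b", "c"], ["c", "d"], ["a", "d"]]
good = {"a": 0, "b": 0, "c": 1, "d": 1}
poor = {"a": 0, "c": 0, "b": 1, "d": 1}
print(cutsize(nets, good), cutsize(nets, poor))
```

Grouping connected cells on the same side cuts two nets here, while separating them cuts all four, which is the difference a partitioner tries to exploit.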
The prior art accomplishes partitioning by means of a series of "bipartitioning" problems, in which a decision is made to assign a component to one of two regions. Each component is hierarchically bipartitioned until the desired number of components is achieved.
Numerous alternate methodologies for cell placement and assignment are known in the art. These include quadratic optimization as disclosed in an article entitled "GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization", by J. Kleinhans et al, IEEE Trans. on CAD, 1991, pp. 356-365, and simulated annealing as described in an article entitled "A Loosely Coupled Parallel Algorithm for Standard Cell Placement", by W. Sun and C. Sechen, Proceedings of IEEE/ACM IC-CAD Conference, 1994, pp. 137-144.
These prior art methods cannot simultaneously solve the partitioning problem and the problem of placing partitions on the chip, and thus the applicability of such methods to physical design automation systems for integrated circuit chip design is limited.
More specifically, prior art methods do not provide any metric for specifying distances between cells based on netlist connections. An initial placement must be performed to establish physical locations for cells and thereby distances therebetween.
Also, prior art methods fix cells in clusters at the beginning of optimization, and do not provide any means for allowing cells to move between clusters as optimization proceeds. This can create areas of high routing congestion, which cannot be readily eliminated because cell movements between clusters which could relieve the congestion are not allowed.
In summary, the problem inherent in these prior cell placement methods is that repeated iterations generally do not tend to converge to a satisfactory relatively uniform overall cell placement for large numbers of cells. The aforementioned methods can take several days to place a large number of cells, and repeating these methods with different parameters or different initial arrangements may not necessarily provide improvements to cell placement. Typical methods for using these designs involve using a chosen method until a particular parameter, for example wire length, satisfies a certain criterion or the method fails to satisfy this criterion for a predetermined number of runs. The results are inherently non-optimal for other placement fitness measurements, the method having been optimized based only on a single parameter. Further, results of these placement techniques frequently cannot be wired properly, or alternatively, the design does not meet timing requirements. For example, with respect to simulated annealing, setting the temperature to different values may, under certain circumstances, improve placement, but efficient and uniform placement of the cells is not guaranteed.