1. Field of the Invention
The present invention is generally directed to a method for partitioning and placing components of a circuit design into a programmable integrated circuit device which can be configured to implement the design.
The invention is more specifically directed to a modified placement by partitioning method used for initial or "rough" placement of a circuit design into a field-programmable gate array (FPGA).
2. Description of the Related Art
VLSI Design
Very Large Scale Integration (VLSI) design comprises the steps of circuit design, in which a schematic design resembling a desired circuit is created; and layout, in which an actual VLSI device is planned and produced to perform the function described in the schematic design. The VLSI device may be a custom circuit which is produced on a silicon substrate by wafer fabrication processes, or the VLSI device may be a circuit design which is incorporated into a programmable integrated circuit device (PICD) such as a field programmable gate array (FPGA).
The goal of the layout process is to efficiently construct a device which minimizes layout area and signal propagation delays between associated logic elements. The layout process is generally divided into two separate procedures: placement and routing.
Placement is the assignment of elements of a circuit design to specified areas of a VLSI circuit. The total required layout area and the signal propagation delays between connected elements are considered in the selection of locations for each element.
Routing is the formation of an interconnection network connecting associated elements of the circuit design.
In a simplified (small scale) device layout process, placement and routing processes are relatively simple and can be done manually by a skilled practitioner. However, VLSI design is typically far too complicated for a skilled practitioner to perform un-aided placement and routing efficiently. For this reason, computer-aided design tools have been developed.
Placement by Partitioning
Various software algorithms which place logic into a VLSI device are discussed in "VLSI Cell Placement Techniques", K. Shahookar and P. Mazumder, ACM Computing Surveys, Vol. 23, No. 2, June, 1991 (pages 143-220). The five algorithms identified in this article are placement by partitioning, simulated annealing, force-directed placement, numerical optimization techniques and placement by genetic algorithm. Although two or more algorithms may be used during the layout process of VLSI design, the present invention is concerned with placement by partitioning.
The presently used placement by partitioning algorithms find their root in U.S. Pat. No. 3,617,714, entitled "Method of Minimizing the Interconnection Cost of Linked Objects", issued to Kernighan and Lin on Nov. 2, 1971. Also see B. W. Kernighan and S. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs", Bell System Technical Journal, Vol. 49, February, 1970, pp. 291-308.
The Kernighan and Lin placement by partitioning algorithm, also referred to as "min-cut" placement, is a numeric algorithm wherein a circuit design is repeatedly partitioned into smaller and smaller groups of constituent elements while the number of nets interconnecting one group to another group is kept to a minimum. In minimizing the number of interconnecting nets, the min-cut algorithm attempts to create an efficient physical layout of the elements for implementation on a VLSI chip.
Partitioning a circuit design may be done from the bottom up or from the top down, or both. Bottom-up partitioning begins with grouping individual elements of a circuit design into larger units. Copending application Ser. No. 07/456,010 (attorney docket M-904) describes such a method. Top-down partitioning begins with dividing the entire circuit design into two sections, then four, and so forth until a stop condition is satisfied. The algorithm presented in this application incorporates the latter of these two methods.
The top-down min-cut algorithm first identifies each element of a circuit design, and each element's interconnection with every other element of the circuit design. For instance, an AND gate may be designated as element 1. Element 1 may have two inputs from elements 2 and 3, and have one output to element 4. Each of the interconnections between element 1 and elements 2, 3 and 4 is given a value of one.
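By way of illustration only, the bookkeeping described above may be sketched as a symmetric connection table. The following Python sketch is an assumption made for clarity and is not taken from the cited references; the names `connections` and `add_connection` are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: connections[u][v] holds the value assigned to the
# interconnection between elements u and v.
connections = defaultdict(dict)

def add_connection(u, v, value=1):
    """Record an interconnection between elements u and v in both directions."""
    connections[u][v] = value
    connections[v][u] = value

# Element 1 (an AND gate) has inputs from elements 2 and 3 and an output to element 4.
add_connection(2, 1)
add_connection(3, 1)
add_connection(1, 4)

print(sorted(connections[1].items()))  # [(2, 1), (3, 1), (4, 1)]
```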
The min-cut algorithm begins by arbitrarily partitioning (dividing) the total number of elements of the circuit design into two groups. For instance, as shown in FIG. 5a, if a circuit design has 100 elements, the algorithm would divide the elements into subcircuit groups 1 (elements a.sub.1 to a.sub.50) and 2 (elements b.sub.1 to b.sub.50). A partition "line" PL is defined as an imaginary line disposed between the two groups. Some elements of subcircuit group 1 are typically connected to elements in group 2. For example, element a.sub.3 is connected only to elements a.sub.2 and a.sub.5, while element a.sub.4 is connected to element a.sub.6 in group 1 and also to elements b.sub.3 and b.sub.5 in group 2. Ideally, if all subcircuit group 1 elements were only connected to other group 1 elements, then efficiency would be maximized because no nets would be cut by partition line PL. However, it is not usually possible to divide the elements of a circuit design without having at least one net which crosses partition line PL to interconnect resulting subcircuit groups. Nets which connect elements of different subcircuit groups, and therefore cross partition lines, are commonly referred to as being "cut" by the partition line. The aim of the min-cut algorithm is to minimize the number of interconnecting nets cut by the partition lines.
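The notion of a "cut" net may be illustrated with a short Python sketch. The function name `count_cut_nets` and the five-net fragment below are hypothetical; the fragment reproduces only the connections of elements a.sub.3 and a.sub.4 named above, not the full circuit of FIG. 5a.

```python
def count_cut_nets(nets, group_of):
    """Count the nets whose elements lie in more than one subcircuit group."""
    return sum(1 for net in nets if len({group_of[e] for e in net}) > 1)

# Connections of a3 and a4 as described in the text (two-pin nets assumed).
nets = [("a3", "a2"), ("a3", "a5"), ("a4", "a6"), ("a4", "b3"), ("a4", "b5")]

# "a" elements start in subcircuit group 1, "b" elements in subcircuit group 2.
group_of = {e: (1 if e.startswith("a") else 2) for net in nets for e in net}

print(count_cut_nets(nets, group_of))  # 2: the nets a4-b3 and a4-b5 cross PL
```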
After the elements have been divided into two subcircuit groups, an initial count is made of the number of cut nets. For instance, four nets are shown to be cut by partition line PL in FIG. 5a. The algorithm then systematically exchanges each of the elements of the two subcircuit groups, and the number of cut nets resulting from each exchange is counted and stored. After the storage of each cut net count, the elements are returned to their original subcircuit groups and a next pair of elements is exchanged. FIG. 5b illustrates an exchange between elements a.sub.4 and b.sub.3. As shown, the calculated cut net count is seven, which is an increase of three cut nets above the initial cut net count of four shown in FIG. 5a. "Gain" is calculated by subtracting the initial cut net count from the calculated cut net count. Therefore, the exchange of elements a.sub.4 and b.sub.3 resulted in a "gain" of +3, which indicates a degradation caused by the exchange. Similarly, FIG. 5c shows an exchange of elements a.sub.1 and b.sub.5. As shown, the resulting calculated cut net count is three, yielding a gain of -1. After every combination of elements has been exchanged, the gains from each exchanged pair of elements are compared and the best gain (lowest calculated cut net count) is identified and stored. The elements which were exchanged to obtain the best gain are then "swapped" between the subcircuit groups and then ignored by the algorithm in the next exchange cycle. For example, if the swap shown in FIG. 5c between elements a.sub.1 and b.sub.5 yielding a gain of -1 is determined to be the best gain, elements a.sub.1 and b.sub.5 would be ignored by the algorithm, leaving 49 elements to be partitioned in each of the two subcircuit groups. The swapping process is then repeated for the remaining 49 "a" and 49 "b" elements in each of the two subcircuit groups.
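The trial exchange and the gain calculation may be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `exchange_gain` and `best_exchange` are hypothetical, the four-element circuit is invented for the example, and gain follows the sign convention given above (calculated count minus initial count, so a negative gain is an improvement).

```python
from itertools import product

def count_cut_nets(nets, group_of):
    """Count the nets whose elements lie in more than one subcircuit group."""
    return sum(1 for net in nets if len({group_of[e] for e in net}) > 1)

def exchange_gain(nets, group_of, a, b):
    """Gain of exchanging a and b: cut count after the exchange minus the count before."""
    before = count_cut_nets(nets, group_of)
    trial = dict(group_of)  # work on a copy so the exchange is not kept
    trial[a], trial[b] = group_of[b], group_of[a]
    return count_cut_nets(nets, trial) - before

def best_exchange(nets, group_of, group_1, group_2):
    """Try every pair and return (gain, a, b) for the lowest gain found."""
    return min((exchange_gain(nets, group_of, a, b), a, b)
               for a, b in product(group_1, group_2))

# Hypothetical four-element circuit, not the circuit of FIGS. 5a-5c.
nets = [("a1", "b1"), ("a2", "b2"), ("a1", "a2")]
group_of = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}

print(exchange_gain(nets, group_of, "a1", "b1"))                 # 1 (a degradation)
print(best_exchange(nets, group_of, ["a1", "a2"], ["b1", "b2"]))  # (-1, 'a1', 'b2')
```

Ties between equally good exchanges are broken here by element name, which is an arbitrary choice made only so that the sketch is deterministic.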
After each exchange cycle, the two exchanged elements yielding the best gain are swapped and then ignored and the best gain is stored. Ultimately, every element in each group is swapped, and a value representing the best gain for each swap is stored.
It should be noted that the swap of elements resulting in a "best gain" may represent a larger number of cut nets than before the swap of elements. For example, the gain resulting from the exchange shown in FIG. 5b may represent a best gain of +3. In this situation, the best gain may be thought of as a "least degradation" value. In any event, the best gain or "least degradation" number is stored as a best gain value. This practice recognizes that some swaps may yield short term increases in the number of cut nets, but subsequent swaps may result in an eventual decrease in the number of cut nets.
The algorithm then compares all 50 of the best gain values from the swapping sequences, and determines which of the 50 swaps resulted in the lowest best gain value. The algorithm then "keeps" all of the swaps occurring before the lowest best gain swap and undoes all swaps occurring after it. At this point subcircuit group 1 contains several "b" elements and subcircuit group 2 contains several "a" elements. All original and newly acquired subcircuit group 1 elements are then renumbered as "a" elements and subcircuit group 2 elements are renumbered as "b" elements. At this point the algorithm repeats the exchanging and swapping sequences for all 50 newly designated "a" elements and 50 newly designated "b" elements.
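One exchange and swapping sequence may be sketched as below. This Python sketch rests on an interpretation: it keeps the run of swaps that leaves the fewest cut nets and discards the swaps made after that point, which is one common reading of the Kernighan and Lin procedure. The name `swap_pass` and the six-element circuit are hypothetical.

```python
def count_cut_nets(nets, group_of):
    """Count the nets whose elements lie in more than one subcircuit group."""
    return sum(1 for net in nets if len({group_of[e] for e in net}) > 1)

def swap_pass(nets, group_of):
    """Run one exchange and swapping sequence; return (assignment, cut count)."""
    current = dict(group_of)
    free_1 = sorted(e for e, g in current.items() if g == 1)
    free_2 = sorted(e for e, g in current.items() if g == 2)
    best_cut, best_assignment = count_cut_nets(nets, current), dict(current)
    while free_1 and free_2:
        # Trial-exchange every pair of elements that has not yet been swapped.
        trials = []
        for a in free_1:
            for b in free_2:
                trial = dict(current)
                trial[a], trial[b] = 2, 1
                trials.append((count_cut_nets(nets, trial), a, b))
        cut, a, b = min(trials)
        # Swap the best pair even if the cut count rises, then ignore both elements.
        current[a], current[b] = 2, 1
        free_1.remove(a)
        free_2.remove(b)
        if cut < best_cut:
            best_cut, best_assignment = cut, dict(current)
    # Returning the stored snapshot discards every swap made after the best point.
    return best_assignment, best_cut

# Hypothetical circuit: two triangles (p, q, r) and (x, y, z) joined by net r-x.
nets = [("p", "q"), ("q", "r"), ("p", "r"),
        ("x", "y"), ("y", "z"), ("x", "z"), ("r", "x")]
start = {"p": 1, "q": 1, "x": 1, "r": 2, "y": 2, "z": 2}

result, cut = swap_pass(nets, start)
print(cut)  # 1: only net r-x remains cut
```

In this example the initial partition cuts five nets. The first swap (x with r) reduces the count to one, the two later swaps each raise it back to five, and only the first swap is kept.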
An exchange and swapping sequence which terminates with a lowest best gain value which is zero or positive indicates no swap of elements between subcircuit groups 1 and 2 resulted in fewer cut nets than the number of cut nets prior to the sequence. At this point, the algorithm terminates the task of partitioning the elements of subcircuit groups 1 and 2. The algorithm then arbitrarily partitions each of subcircuit groups 1 and 2, sequentially, into two pairs of subcircuit groups, each having 25 elements, and repeats the exchange and swapping sequences described above for each of the pairs of groups. This process continues until an end condition is satisfied, such as when each subcircuit group contains a predetermined number of elements or each group is connected by a predetermined number of nets. At this point the Kernighan and Lin min-cut algorithm ends.
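The top-down recursion with a stop condition on group size may be sketched as follows. The name `recursive_bisect` and the `refine` hook are hypothetical; `refine` stands in for the exchange and swapping sequences and by default leaves the arbitrary split unchanged.

```python
def recursive_bisect(elements, max_size, refine=lambda g1, g2: (g1, g2)):
    """Split elements in two repeatedly until each group has at most max_size members."""
    if len(elements) <= max_size:
        return [elements]  # stop condition satisfied for this group
    half = len(elements) // 2
    # Arbitrary initial partition, followed by a (hypothetical) refinement step.
    g1, g2 = refine(elements[:half], elements[half:])
    return recursive_bisect(g1, max_size, refine) + recursive_bisect(g2, max_size, refine)

groups = recursive_bisect(list(range(100)), max_size=25)
print([len(g) for g in groups])  # [25, 25, 25, 25]
```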
The original min-cut algorithm is limited in various ways, and numerous modifications have been proposed. One limitation is that the two groups created by a partition are required to contain an equal number of elements. An improved min-cut algorithm developed by C. M. Fiduccia and R. M. Mattheyses modifies the original min-cut algorithm by allowing a selectable imbalance between two subcircuit groups. The Fiduccia/Mattheyses modified algorithm does not swap pairs of elements across a partitioning line but rather picks a single element in one group and moves it to the other group. The algorithm then checks for a decrease in the number of interconnecting nets cut by the partitioning line. The algorithm also checks the imbalance which is created by such a move. If the move creates an imbalance above a predetermined threshold, then it is undone.
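The single-element move with an imbalance check may be sketched as below. This Python sketch shows only the move-and-undo idea described above; it does not attempt the data structures of the published Fiduccia/Mattheyses algorithm. The name `try_move` and the parameter `max_imbalance` are hypothetical.

```python
def count_cut_nets(nets, group_of):
    """Count the nets whose elements lie in more than one subcircuit group."""
    return sum(1 for net in nets if len({group_of[e] for e in net}) > 1)

def try_move(nets, group_of, element, max_imbalance):
    """Move one element to the other group.

    Returns the change in cut count, or None if the move was undone because
    the group sizes would differ by more than max_imbalance.
    """
    before = count_cut_nets(nets, group_of)
    original = group_of[element]
    group_of[element] = 2 if original == 1 else 1
    sizes = [sum(1 for g in group_of.values() if g == k) for k in (1, 2)]
    if abs(sizes[0] - sizes[1]) > max_imbalance:
        group_of[element] = original  # undo the move
        return None
    return count_cut_nets(nets, group_of) - before

# Hypothetical circuit: element c connects to a, b and d.
nets = [("a", "c"), ("b", "c"), ("c", "d")]

loose = {"a": 1, "b": 1, "c": 2, "d": 2}
print(try_move(nets, loose, "c", max_imbalance=2))   # -1: one fewer cut net

strict = {"a": 1, "b": 1, "c": 2, "d": 2}
print(try_move(nets, strict, "c", max_imbalance=1))  # None: move undone
```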
Another limitation is that the original min-cut algorithm treats all cut nets as having an equal "cost". That is, every cut net is given a "cost" of one, and the total number of cut nets is simply their sum. However, it is recognized that some nets are more "important" than others. A high fan-out signal such as a clock line might be given low priority while a multiplexer output which is part of a critical path or a carry line between arithmetic digits might have high priority. A modified min-cut algorithm developed by C. Sechen and Dahe Chen assigns a weighted cost to each net. Nets which are determined to be important are given a high cost, for instance, two or five. Nets which are unimportant are given a low cost such as 0.5 or 0.0. The result is that the Sechen/Chen min-cut algorithm recognizes gains which may not be recognized using the original min-cut algorithm.
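The effect of weighting may be illustrated with a small Python sketch. The net names, the cost values and the function `weighted_cut_cost` are hypothetical, chosen only to show that a partition cutting more nets can nevertheless have a lower weighted cost; the sketch does not reproduce the actual Sechen/Chen cost function.

```python
def weighted_cut_cost(nets, net_cost, group_of):
    """Sum the costs of the cut nets instead of counting each cut net as one."""
    return sum(net_cost[name] for name, elems in nets.items()
               if len({group_of[e] for e in elems}) > 1)

nets = {"clk": ("a", "b", "c", "d"), "carry": ("a", "c"),
        "n1": ("a", "b"), "n2": ("c", "d")}
net_cost = {"clk": 0.0, "carry": 5, "n1": 0.5, "n2": 0.5}

split_carry = {"a": 1, "b": 1, "c": 2, "d": 2}  # cuts clk and carry (2 nets)
keep_carry = {"a": 1, "c": 1, "b": 2, "d": 2}   # cuts clk, n1 and n2 (3 nets)

print(weighted_cut_cost(nets, net_cost, split_carry))  # 5.0
print(weighted_cut_cost(nets, net_cost, keep_carry))   # 1.0
```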
Sechen and Chen also generate a cost for cut nets that is lower when the pins on the net are unbalanced on the two sides of a cut. This improved cost function leads the min-cut optimizations to move whole nets to one side of the cut line.
Early min-cut algorithms are also limited in that they do not include means for identifying orthogonal (two-dimensional) coordinates for the subcircuit groups created by partitioning. As mentioned above, the layout process of VLSI design involves placement of elements on a two-dimensional silicon substrate or into FPGAs which have a fixed matrix of CLBs. Therefore, simply dividing elements into groups does not identify their location on an X-Y plane.
An improved min-cut algorithm developed by M. A. Breuer assigns X and Y coordinates to the subcircuit groups as they are partitioned. Each sequential partition line dividing a subcircuit group into two or more smaller groups is alternately designated as "vertical" or "horizontal". In addition, each element is assigned associated X-range values (X-lo and X-hi), and Y-range values (Y-lo and Y-hi). For example, prior to any partitioning, all elements may receive X-range values of X-lo=0.0 and X-hi=1.0, and Y-range values of Y-lo=0.0 and Y-hi=1.0. Each time a group is partitioned, the partitioning lines designated as "horizontal" divide each group into two subgroups, each subgroup having new Y-range values. Similarly, partition lines designated as "vertical" divide each group into two subgroups, each subgroup having new X-range values.
For instance, assume all elements initially have X-range values of X-lo=0.0 and X-hi=1.0 and Y-range values of Y-lo=0.0 and Y-hi=1.0 prior to the first partitioning cut. If the initial cut is designated "vertical" and divides the design logic into two groups, then the X-range values assigned to the elements of one group are changed to, for example, X-lo=0.0 and X-hi=0.5, and the X-range values assigned to the second group are changed to X-lo=0.5 and X-hi=1.0. Likewise, when each of these two groups is subsequently partitioned, the cut is designated "horizontal" and the two groups are divided into four subgroups with two of the subgroups having Y-range values of, for example, Y-lo=0.0 and Y-hi=0.4, and two subgroups having Y-range values of Y-lo=0.4 and Y-hi=1.0. The subgroups are partitioned independently and their range values may be different. The subgroups are similarly divided until a stop condition is satisfied. When the stop condition is satisfied, the orthogonal coordinates describing the location of each group on the substrate or FPGA are determined by the X- and Y-range within which the group falls.
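The range bookkeeping in this example may be sketched as follows. The function `split_range`, the tuple layout (X-lo, X-hi, Y-lo, Y-hi) and the `fraction` parameter are assumptions made for illustration; how the cited algorithm chooses the position of each cut is not specified here.

```python
def split_range(region, direction, fraction=0.5):
    """Split a region (x_lo, x_hi, y_lo, y_hi) with a vertical or horizontal cut."""
    x_lo, x_hi, y_lo, y_hi = region
    if direction == "vertical":
        # A vertical partition line gives each subgroup new X-range values.
        x_mid = x_lo + fraction * (x_hi - x_lo)
        return (x_lo, x_mid, y_lo, y_hi), (x_mid, x_hi, y_lo, y_hi)
    # A horizontal partition line gives each subgroup new Y-range values.
    y_mid = y_lo + fraction * (y_hi - y_lo)
    return (x_lo, x_hi, y_lo, y_mid), (x_lo, x_hi, y_mid, y_hi)

whole = (0.0, 1.0, 0.0, 1.0)
left, right = split_range(whole, "vertical")
lower, upper = split_range(left, "horizontal", fraction=0.4)

print(left, right)   # (0.0, 0.5, 0.0, 1.0) (0.5, 1.0, 0.0, 1.0)
print(lower, upper)  # (0.0, 0.5, 0.0, 0.4) (0.0, 0.5, 0.4, 1.0)
```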
A problem arising from assigning X- and Y- range values to groups of elements is addressed by a modified min-cut algorithm developed by A. E. Dunlop and B. W. Kernighan, which is commonly referred to as "terminal propagation". The problem is illustrated in FIGS. 6a-6c. As shown in FIG. 6a, initial partitioning of a group of elements results in at least one net n.sub.1 connecting two elements a.sub.1 and b.sub.1 crossing partition line P.sub.1. The problem occurs when subsequent partitioning divides each of these groups into two or more subgroups. Because each exchange and swapping sequence is concerned only with the partition line dividing the two subcircuit groups being considered, the min-cut algorithm fails to account for elements of the two groups which are connected to elements in groups other than the two groups being partitioned. For instance, subsequent partitions may result in the elements a.sub.1 and b.sub.1 being moved to orthogonally remote X and Y positions, as shown in FIG. 6b. Dunlop and Kernighan developed a modified algorithm which addresses this problem by assigning a "dummy" element a.sub.1 ' (shown in FIG. 6c) to a location adjacent the partition line separating elements a.sub.1 and b.sub.1. The dummy element a.sub.1 ' is "connected" by nets n.sub.1 ' and n to elements a.sub.1 and b.sub.1, respectively. The dummy elements represent external pins and cannot be moved because the pins are not considered to be part of the groups being partitioned. As subsequent partitioning occurs, the net n.sub.1 ' prevents the movement of the element a.sub.1 to an X-Y position which is remote from the element b.sub.1, unless sufficient gain results from the movement.
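The biasing effect of a dummy element may be illustrated with a short Python sketch. It is a simplified reading of terminal propagation, not the published Dunlop and Kernighan procedure: the dummy element is modeled as a pin fixed to one side of the next partition line, and the names `cut_with_terminals`, "upper" and "lower" are hypothetical.

```python
def cut_with_terminals(nets, side_of, fixed_terminals):
    """Count cut nets, treating each dummy element as a pin that cannot be moved."""
    sides = dict(side_of)
    sides.update(fixed_terminals)
    # Pins with no known side (no dummy element supplied) are left out of the count.
    return sum(1 for net in nets
               if len({sides[e] for e in net if e in sides}) > 1)

# The group holding a1 and a2 is being divided by a horizontal partition line.
# Dummy element a1' stands for the connection to b1, which lies outside this
# group, and is assumed here to sit on the "lower" side.
nets = [("a1", "a1'"), ("a1", "a2")]
dummy = {"a1'": "lower"}

away = {"a1": "upper", "a2": "lower"}
near = {"a1": "lower", "a2": "upper"}

print(cut_with_terminals(nets, away, dummy))  # 2: a1 is pulled away from b1
print(cut_with_terminals(nets, near, dummy))  # 1: a1 stays on the side of b1
print(cut_with_terminals(nets, away, {}))     # 1: without the dummy, no preference
```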
Since the introduction of the min-cut algorithm, a number of improvements and/or variations to its approach have been reported. Some of these improvements are mentioned above. For other improvements, see for example, "Analysis of Placement Procedures for VLSI Standard Cell Layout", Mark Hartoog, 23rd Design Automation Conference, IEEE, 1986, pp. 314-319. See further: "A Class of Min-Cut Placement Algorithms", Melvin Breuer, University of Southern California, 16th Design Automation Conference 1977, pp. 284-290; "Circuit Layout", Jiri Soukup, Bell Labs, Proc. IEEE, vol. 69, October 1981, pp. 1281-1304; and "Optimization by Simulated Annealing", S. Kirkpatrick et al., IBM, Science vol. 220, May 13, 1983, pp. 671-680.