This invention relates to optimization of microelectronic circuit designs, and more particularly to timing optimization in highly complex ASIC or microprocessor designs having a very large number of gates and a correspondingly very large number of circuit paths.
In today's ASIC or microprocessor designs it is very common to have chips that vary in size from several hundred thousand gates to several million gates. After placement these same chips have hundreds of thousands of paths where the calculated arrival time for a signal is greater than the required arrival time, a condition known in the art as negative slack. The optimization of these paths is a time-consuming operation, on the order of several hours to tens or hundreds of hours. Therefore, finding ways to reduce the optimization time without degradation of results will significantly shorten the development cycle of a design and reduce the time required to bring a new product to market.
It is desirable to perform partitioning of the chip design pattern, so that separate processors may work in parallel to analyze and optimize the timing of the circuit paths. For example, the circuit of FIG. 1A has a group of logic gates 1, and a timing path between timing points 2 and 3. A given circuit in general has multiple paths, with two given paths having either an endpoint in common or different endpoints. For example, in FIG. 1B three paths 4a, 4b, 4c have timing point 5 as an endpoint, and two paths 4a, 4b have timing point 6 as an endpoint. The two paths 7a and 7b, ending at point 8, are not physically connected to paths 4a–4c; the timing of paths 7a and 7b may thus be optimized in parallel with paths 4a–4c. 
Conventional approaches to geometric partitioning and timing optimization will be described briefly as follows.
Geometric Partitioning
Given a graph G(V,E), where V is the set of weighted vertices and E is the set of weighted edges, the traditional partitioning problem is to divide the set V into k subsets, such that the number of edges straddling two partitions (the edge cut) is minimized while the sum of vertex weights in each partition is balanced.
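The two quantities in this definition, the edge cut and the per-partition vertex weight, can be made concrete with a minimal sketch. The four-vertex graph, its weights, and the function names below are hypothetical and chosen only for illustration.

```python
def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def partition_weights(vertex_weight, part, k):
    """Return the total vertex weight assigned to each of the k partitions."""
    totals = [0] * k
    for v, w in vertex_weight.items():
        totals[part[v]] += w
    return totals


# Hypothetical graph: four unit-weight vertices, five unit-weight edges.
vertex_weight = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1), ("a", "c", 1)]

# A candidate 2-way partition: {a, b} versus {c, d}.
part = {"a": 0, "b": 0, "c": 1, "d": 1}

cut = edge_cut(edges, part)                          # edges b-c, d-a, a-c cross
weights = partition_weights(vertex_weight, part, 2)  # one total per partition
```

A partitioner searches over assignments such as `part` for one that makes `cut` small while keeping the entries of `weights` close to each other.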
This problem is NP-complete; heuristic techniques have therefore been developed. The iterative-move-based Kernighan-Lin (KL) and Fiduccia-Mattheyses (FM) algorithms were introduced in the 1970s and 1980s. These algorithms work in iterations after an initial rough partition. In each iteration, vertices are moved from one partition to another or swapped between partitions, if the move or swap reduces the edge cut. As is known to those skilled in the art, these iterative heuristics are greedy algorithms, so they are heavily dependent on the initial partitioning and are likely to become trapped in a local optimum. The initial partitioning is typically done by arbitrary vertex selection or by using breadth-first search methods.
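The move-based idea can be illustrated with a simplified greedy refinement of a two-way partition. This is only a sketch in the spirit of the KL and FM heuristics, not either algorithm as published: it omits vertex locking, gain buckets, and tentative negative-gain moves. The graph, the balance tolerance `max_imbalance`, and all function names are assumptions made for the example.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def gain(v, adj, part):
    """Reduction in edge cut if v moves to the other side of a bipartition."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal


def greedy_refine(adj, part, max_imbalance=2):
    """Repeatedly move the best positive-gain vertex until none remains."""
    part = dict(part)
    while True:
        sizes = [sum(1 for p in part.values() if p == side) for side in (0, 1)]
        best_v, best_gain = None, 0
        for v in sorted(adj):
            src = part[v]
            # Skip moves that would unbalance the partition sizes too much.
            if abs((sizes[src] - 1) - (sizes[1 - src] + 1)) > max_imbalance:
                continue
            g = gain(v, adj, part)
            if g > best_gain:
                best_v, best_gain = v, g
        if best_v is None:
            return part  # no improving move is left: a local optimum
        part[best_v] = 1 - part[best_v]


# Hypothetical graph: two triangles {a, b, c} and {d, e, f} joined by edge c-d.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)]
adj = build_adjacency(edges)

# A poor initial partition in which c and d sit on the wrong sides.
initial = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1, "f": 1}
refined = greedy_refine(adj, initial)
```

Because every accepted move strictly lowers the edge cut, the loop terminates, but the partition it stops at depends on where it started, which is the sensitivity to the initial partitioning noted above.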
Spectral partitioning algorithms such as those described by P. K. Chan et al. and K. M. Hall, on the other hand, attempt to take a global view of the graph. These algorithms first compute the eigenvectors of a matrix representation of a given graph and then use them to derive the partition. The cost of computing the eigenvectors increases quickly as the number of vertices increases. Therefore, this method is not applied directly to large graphs.
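As a rough illustration of the spectral idea, the sketch below bisects a small graph by the sign of an approximate eigenvector. It assumes that the matrix representation is the graph Laplacian L = D - A and that the eigenvector of its second-smallest eigenvalue (often called the Fiedler vector) is used, which is a common choice but is not stated in the text above. The eigenvector is approximated by shifted power iteration in plain Python; the function name, iteration count, and example graph are hypothetical.

```python
def fiedler_bisect(n, edges, iterations=200):
    """Split vertices 0..n-1 by the sign of an approximate Fiedler vector."""
    degree = [0.0] * n
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        degree[u] += w
        degree[v] += w

    # 2 * max degree bounds the largest Laplacian eigenvalue, so the
    # matrix (shift * I - L) has only non-negative eigenvalues.
    shift = 2.0 * max(degree)

    # Deterministic start vector, orthogonal to the all-ones vector.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # y = (shift * I - L) x, with L = D - A applied row by row.
        y = [(shift - degree[i]) * x[i] + sum(w * x[j] for j, w in adj[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]          # project out the all-ones vector
        norm = sum(val * val for val in y) ** 0.5
        x = [val / norm for val in y]
    return [0 if val < 0 else 1 for val in x]


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
labels = fiedler_bisect(6, edges)
```

Each iteration here touches every edge once, but a production eigen-solver on a graph with millions of vertices is far more expensive, which is the scaling concern raised above.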
Current partitioning schemes focus on multi-level techniques such as those described by G. Karypis et al. In multi-level partitioning, the initial set of vertices is grouped into sub-sets, and each sub-set is represented by a single vertex of a new, smaller graph. The grouping process is repeated for the new set of vertices. This coarsening process finishes when the graph becomes small enough to be handled easily. After a good solution is found for the small graph, the graph is expanded iteratively back into the original graph. At each step of this uncoarsening process, the partition boundary is refined using modified FM algorithms, and the refinement step mainly determines the quality of the solution. Metis and hMetis from the University of Minnesota are widely used public multi-level partitioning programs.
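A single grouping level can be sketched as follows. The sketch pairs each vertex with its heaviest not-yet-grouped neighbor, a matching rule assumed here for illustration; the text above does not say how sub-sets are chosen, and this is not the Metis implementation. The function name and example graph are hypothetical.

```python
def coarsen(vertex_weight, edges):
    """One coarsening level: pair each vertex with its heaviest free neighbor."""
    adj = {v: {} for v in vertex_weight}
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w

    group = {}  # original vertex -> representative (coarse) vertex
    for v in sorted(vertex_weight):
        if v in group:
            continue
        candidates = [(w, u) for u, w in adj[v].items() if u not in group]
        group[v] = v
        if candidates:
            _, mate = max(candidates)  # heaviest edge to an ungrouped neighbor
            group[mate] = v

    # A coarse vertex carries the summed weight of the vertices it represents.
    coarse_weight = {}
    for v, g in group.items():
        coarse_weight[g] = coarse_weight.get(g, 0) + vertex_weight[v]

    # Edges inside a group vanish; parallel edges between groups are merged.
    coarse_edges = {}
    for u, v, w in edges:
        gu, gv = group[u], group[v]
        if gu != gv:
            key = (min(gu, gv), max(gu, gv))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return group, coarse_weight, coarse_edges


# Hypothetical path graph a-b-c-d with heavy outer edges and a light middle edge.
vertex_weight = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 5), ("b", "c", 1), ("c", "d", 5)]
group, coarse_weight, coarse_edges = coarsen(vertex_weight, edges)
```

Applying `coarsen` repeatedly to its own output yields the sequence of progressively smaller graphs; the `group` maps recorded at each level are what allow a partition of the smallest graph to be projected back during uncoarsening.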
If more than one weight is associated with a vertex, then the problem is called a multi-constraint partitioning problem, and the objective is to divide each weight evenly among the partitions. For example, the weights could represent circuit element properties such as area and power, and one may seek a partitioning result where each partition has approximately the same amount of area and power. This is known as a 2-constraint partitioning problem. The Metis family of programs supports m-constraint problems.
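The area-and-power example can be sketched as a balance check over weight vectors. The gate names, weight values, and the imbalance measure (heaviest partition divided by the ideal even share) are assumptions made for illustration, not values from any real design or the metric used by any particular tool.

```python
def constraint_totals(weights, part, k):
    """Sum each weight component (e.g. area, power) per partition."""
    m = len(next(iter(weights.values())))
    totals = [[0] * m for _ in range(k)]
    for v, ws in weights.items():
        for j, w in enumerate(ws):
            totals[part[v]][j] += w
    return totals


def imbalance(totals):
    """Per constraint: heaviest partition divided by the ideal even share."""
    k = len(totals)
    m = len(totals[0])
    ratios = []
    for j in range(m):
        column = [totals[p][j] for p in range(k)]
        ratios.append(max(column) / (sum(column) / k))
    return ratios


# Hypothetical gates with (area, power) weights.
weights = {"g1": (4, 1), "g2": (1, 4), "g3": (3, 2), "g4": (2, 3)}

balanced = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
skewed = {"g1": 0, "g3": 0, "g2": 1, "g4": 1}

balanced_ratios = imbalance(constraint_totals(weights, balanced, 2))
skewed_ratios = imbalance(constraint_totals(weights, skewed, 2))
```

Both assignments place two gates in each partition, yet only the first balances area and power at the same time, which is why the constraints must be evaluated jointly rather than one at a time.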
All of the above algorithms are generally applied to the netlist representation of a circuit before placement, so the graph vertices carry no geometric information. Hence, if applied to a placed netlist, each partition can be geometrically disconnected. Geometric partitioning applies to graphs whose vertices have geometric locations, and generates partitions that are geometrically connected.
Geometric partitioning on a mesh graph has usually focused more on balancing than on net-cut minimization. Two-constraint graph partitioning has been suggested using the so-called Ham-Sandwich theorem, as discussed by J. M. Kleinhans et al. and A. Poe et al. The suggested partitioning separator is a line with arbitrary slope, and the worst-case complexity can be O(n^2). Two-constraint geometric graph partitioning with an L-shaped separator has been studied by one of the inventors, where the algorithm is applied to a standard-cell placed circuit with O(n ln n) complexity.
More recently, C. Ababei et al. have described a timing-driven partitioning algorithm for a pre-placement design, in which a subset of the most critical paths is identified and optimized. However, this approach does not address the problem of post-placement optimization, or of optimization by processors running in parallel.
Timing Optimization
After initial placement, a timing analysis is run on the placed circuit and a list of timing paths is obtained, for example by using a timing analysis tool such as EinsTimer™. A timing path is an ordered sequence of timing points between two Significant Timing Points. A Significant Timing Point is a point where a timing goal is defined; in other words, it is any point in the design where timing information is asserted and therefore does not change with timing analysis. Examples of Significant Timing Points are the input, output and clock pins of latches/registers, the primary inputs and outputs of the design, etc. Timing paths are classified in terms of slack, which is an indication of how far each point in the path is from its goal. The slack of a timing point tpi is defined as S_tpi = RAT_tpi − AT_tpi, where RAT_tpi is the Required Arrival Time and AT_tpi is the Arrival Time at that point. If the slack is zero the point has reached its goal, if it is positive the point is beyond the goal, and if it is negative the point falls short of the goal. The list of paths is ordered by slack, the paths with the most negative slack being the first ones in the list. The negative slack paths indicate the areas in the design that require optimization to make the slack positive. These paths will be referred to as critical paths or timing critical paths.
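The slack definition and the ordering of the path list can be sketched as follows. The sketch assumes, for simplicity, that each path is characterized by the required and actual arrival times at its endpoint; the path names, time values, and the `TimingPath` class are hypothetical and do not reflect the output format of any timing tool.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    name: str
    required_arrival: float  # RAT at the path endpoint
    arrival: float           # AT at the path endpoint

    @property
    def slack(self) -> float:
        """Slack = Required Arrival Time - Arrival Time."""
        return self.required_arrival - self.arrival


def critical_paths(paths, target_slack=0.0):
    """Return paths below the target slack, most negative slack first."""
    failing = [p for p in paths if p.slack < target_slack]
    return sorted(failing, key=lambda p: p.slack)


# Hypothetical paths with times in arbitrary units.
paths = [
    TimingPath("p1", required_arrival=10.0, arrival=12.5),  # slack -2.5
    TimingPath("p2", required_arrival=10.0, arrival=9.0),   # slack +1.0
    TimingPath("p3", required_arrival=8.0, arrival=8.5),    # slack -0.5
]
worst_first = [p.name for p in critical_paths(paths)]
```

Here `p2` meets its goal and is excluded, while `p1` precedes `p3` because its slack is more negative.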
Timing optimization is a step in the chip design process where timing critical paths are optimized such that no path is below a given target slack, usually zero. This step generally involves applying optimization techniques to the gates and interconnects (nets) in a timing path such that the overall delay is reduced. Examples of these optimization techniques are changing the size of gates; inserting buffers/inverters on nets; swapping pins between equivalent nets; gate cloning; decomposing gates into logic equivalents; etc. Timing optimization engines, such as PDS_refine™, use these techniques in several ways to achieve the desired results.
In order to reduce the long running times for circuit design optimization procedures, it is desirable to perform optimization in parallel to the greatest extent possible. However, conflicts may arise when attempting to optimize placed circuits, due to the lack of timing independence or physical independence of many circuit paths. As may already be seen from the highly simplified example of FIG. 1B, some timing paths occupy the same space, while other timing paths are physically independent. This means that optimization of some paths should not be performed in parallel, but instead should be assigned to the same processor.
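One simple way to picture this constraint is to group paths that share circuit elements, so that each group can be handed to a single processor. The sketch below is an illustration under stated assumptions, not the procedure of this disclosure: dependence is modeled only as sharing at least one gate (directly or through a chain of other paths), the path labels loosely follow FIG. 1B, and the gate names and path-to-gate assignments are invented.

```python
def group_dependent_paths(path_gates):
    """Group paths that share at least one gate, directly or transitively."""
    parent = {p: p for p in path_gates}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    owner = {}  # gate -> first path seen that uses it
    for path, gates in path_gates.items():
        for gate in gates:
            if gate in owner:
                # Shared gate: merge this path's group with the owner's group.
                parent[find(path)] = find(owner[gate])
            else:
                owner[gate] = path

    groups = {}
    for path in path_gates:
        groups.setdefault(find(path), []).append(path)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical gate membership for each path.
path_gates = {
    "4a": {"g1", "g2"},
    "4b": {"g2", "g3"},
    "4c": {"g3", "g4"},
    "7a": {"g5", "g6"},
    "7b": {"g6"},
}
groups = group_dependent_paths(path_gates)
```

In this example paths 4a and 4c share no gate directly but are linked through 4b, so all three land in one group, while 7a and 7b form a second group that could be optimized concurrently on another processor.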
There remains a need for a post-placement timing optimization procedure in which optimization may be performed by parallel processors.