The problem of timing closure of Very Large Scale Integrated (VLSI) chips or integrated circuits involves the combination of logic synthesis algorithms with placement and routing algorithms in order to meet timing, area, and other design objectives for the chip. Logic synthesis algorithms change the type and connectivity of circuits used to implement the functionality of the chip. Placement algorithms alter the physical locations of the circuits on the chip. Routing algorithms modify the wire type and path of the connections between the circuits. As the size of VLSI chips grows, the difficulty of timing closure grows at a geometric rate. A hierarchical chip optimization process limits the run time required to achieve timing closure. Partitioning the problem along hierarchical boundaries reduces the individual problem size and allows the smaller problems to be solved in parallel.
This optimization process operates in a distributed manner by taking advantage of data parallelism. Because timing closure is a global problem, however, it presents unique difficulties for partitioning. One of the goals of timing closure is to ensure that the chip operates at the desired frequency. The frequency of a chip is limited by the transmission delay through the longest path of circuits on the chip. Partitioning the problem along hierarchical boundaries usually produces circuit paths that traverse multiple partitions. Such a path presents, by definition, a global challenge. Similarly, each circuit in the entire chip hierarchy may be placed at any point on the chip image. While the problem size may be reduced by partitioning along hierarchical boundaries, physical and timing resources must still be managed for the entire chip.
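By way of illustration only, the relationship between the longest circuit path and the achievable frequency can be sketched as follows. The sketch assumes that the timing graph is a directed acyclic graph whose edges carry a delay in nanoseconds, and it ignores setup time, clock skew, and similar effects. The function names and the pin names, such as "P_A/a", are hypothetical and do not appear in any cited reference.

```python
from collections import defaultdict, deque


def longest_path_delay(edges):
    """Return the largest accumulated delay (ns) over any path in a DAG.

    edges: iterable of (source_pin, sink_pin, delay_ns) tuples.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        successors[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    # Propagate worst-case arrival times in topological order (Kahn's algorithm).
    arrival = {node: 0.0 for node in nodes}
    ready = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt, delay in successors[node]:
            arrival[nxt] = max(arrival[nxt], arrival[node] + delay)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(nodes):
        raise ValueError("timing graph contains a cycle")
    return max(arrival.values(), default=0.0)


def max_frequency_mhz(edges):
    """Simplified bound: frequency is the reciprocal of the longest path delay."""
    delay_ns = longest_path_delay(edges)
    if delay_ns <= 0:
        raise ValueError("no positive-delay path in the timing graph")
    return 1000.0 / delay_ns


# Hypothetical path that crosses three partitions: P_A -> P_TOP -> P_B.
EDGES = [
    ("P_A/a", "P_TOP/t", 1.0),
    ("P_TOP/t", "P_B/b", 1.5),
    ("P_A/a", "P_A/x", 0.5),
]
```

In this example the longest path runs through all three partitions and accumulates 2.5 ns, which bounds the frequency at 400 MHz. No single partition can observe that bound from its own circuits alone, which is why such a path is a global concern.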
The chip is usually partitioned using either the logical hierarchy or a physical hierarchy. The logical hierarchy is typically expressed in the original Hardware Description Language (HDL) used to describe the functionality of the chip. The physical hierarchy is created by flattening the chip and performing an initial placement of the circuits. From this placement, the chip is carved up in such a way that the partitions are regular interlocking shapes on the chip image. Each method has certain advantages and disadvantages. Retaining the logical hierarchy makes it easier for the chip designer to understand the current state of the chip. It also allows the designer to make changes in the original HDL and re-optimize only the partition in the hierarchy that contains the change. While a physical-centric partitioning strategy requires a complete re-optimization of the entire chip in the event of an HDL change, it does present appreciable benefits. For example, it avoids a drawback of partitioning along the original logical hierarchy (also called floor planning), which may limit the quality of the results achieved by the placement optimization algorithms.
FIG. 1 illustrates a circuit path that traverses several hierarchical boundaries. The floor planning shown presents a long path from A to C regardless of where the circuits are placed within their respective partitions.
Once the chip is partitioned into a given hierarchy, its boundaries are referred to as either soft or hard, depending upon the management of the physical and timing resources. Hard hierarchical boundaries imply that once the resources are distributed among the partitions, they remain unchanged throughout the optimization procedure. Soft hierarchical boundaries start with an initial distribution and allow the distribution to be updated to facilitate timing closure. The main benefit of hard hierarchical boundaries is that the entire closure problem is broken down into completely encapsulated sub-problems. This reduces the complexity of the sub-problems since the physical and timing constraints remain constant, obviating the need for any communication among the partitions. The distribution of resources takes the form of disjoint sets. For example, circuits of separately optimized partitions occupy disjoint physical regions of the chip. However, an optimal initial distribution of resources into disjoint sets along partition boundaries cannot be determined exactly up front. Employing soft hierarchical boundaries allows the optimization procedure to redistribute physical and timing resources as necessary given changes to the state of the design. Allowing flexibility in the allocation of resources improves the quality of the design.
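The distinction between hard and soft boundaries can be sketched, under simplifying assumptions, as a budget of a single physical resource. The class AreaBudget, its methods, and the use of abstract "area units" are hypothetical and serve only to illustrate the idea. With hard boundaries a request beyond the initial allocation is refused, whereas with soft boundaries unused area is transferred from other partitions while the allocations remain disjoint and their total remains constant.

```python
class AreaBudget:
    """Tracks disjoint area allocations for a set of partitions."""

    def __init__(self, allocation, soft):
        self.allocation = dict(allocation)        # partition -> area units granted
        self.used = {name: 0 for name in allocation}
        self.soft = soft                          # True: boundaries may move

    def free(self, partition):
        return self.allocation[partition] - self.used[partition]

    def request(self, partition, amount):
        """Try to claim `amount` area units; return True on success."""
        shortfall = amount - self.free(partition)
        if shortfall > 0:
            if not self.soft:
                return False                      # hard boundary: budget is fixed
            donors = sorted(
                (name for name in self.allocation if name != partition),
                key=self.free,
                reverse=True,
            )
            if sum(self.free(name) for name in donors) < shortfall:
                return False                      # the chip as a whole is full
            for donor in donors:
                give = min(self.free(donor), shortfall)
                self.allocation[donor] -= give    # shrink the donor's region
                self.allocation[partition] += give
                shortfall -= give
                if shortfall == 0:
                    break
        self.used[partition] += amount
        return True
```

For example, with an initial allocation of 100 units each to two partitions, a request for 120 units by one partition fails under hard boundaries but succeeds under soft boundaries by moving 20 unused units from the other partition.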
U.S. Pat. No. 5,877,965 to Hieter et al., “Parallel hierarchical timing correction” (PHTC), describes a distributed method of timing closure wherein each execution process operating in parallel receives a copy of all the partitions in the chip hierarchy. The problem of timing closure is distributed by virtue of the fact that each of the parallel processes optimizes a different partition in the hierarchy. Therefore, even though each parallel process receives the entire hierarchy containing all partitions, no two processes work on the same partition. While each process begins with a replica of the initial state of the entire chip hierarchy, over time, the partitions for which the process is not responsible (those that are read-only) become stale. Each process operates on only one partition, leaving the remaining partitions in the hierarchy unchanged. Since timing closure is a global problem, decisions made in one partition of the hierarchy usually affect other partitions. Therefore, the initial state of the entire hierarchy that was given to each process at its inception may no longer be trustworthy as the work of timing closure is pursued.
To overcome this difficulty, an individual process periodically exports to a database a copy of the partition in the hierarchy for which it has write access. After the export, the process searches the database for updated partitions from other parallel processes. If the process finds a partition in the database that is more recent than the one in its current replica of the chip hierarchy, it imports this partition into its replica. If exporting and importing occur frequently enough, each process has a reasonably accurate view of the current state of the entire chip hierarchy. Thus, this method allows each parallel process to have a global view of the timing graph for the entire chip hierarchy, which is necessary for timing closure.
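The export and import cycle described above can be sketched as follows. This is an illustrative model only and is not taken from the cited patent: the class names are hypothetical, the database is modeled as an in-memory dictionary, and "more recent" is modeled by an integer version counter, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionSnapshot:
    version: int        # assumed recency marker; higher means more recent
    contents: str       # placeholder for the partition's design data


class SharedDatabase:
    """Holds the most recent exported snapshot of each partition."""

    def __init__(self):
        self._latest = {}

    def export(self, name, snapshot):
        current = self._latest.get(name)
        if current is None or snapshot.version > current.version:
            self._latest[name] = snapshot

    def snapshots(self):
        return dict(self._latest)


class OptimizerProcess:
    """One parallel process: writes one partition, reads all others."""

    def __init__(self, owned, initial_hierarchy):
        self.owned = owned
        self.replica = dict(initial_hierarchy)    # full copy of the hierarchy

    def optimize(self, new_contents):
        old = self.replica[self.owned]
        self.replica[self.owned] = PartitionSnapshot(old.version + 1, new_contents)

    def synchronize(self, database):
        """Export the owned partition, then import any newer foreign ones."""
        database.export(self.owned, self.replica[self.owned])
        imported = []
        for name, snapshot in database.snapshots().items():
            if name != self.owned and snapshot.version > self.replica[name].version:
                self.replica[name] = snapshot
                imported.append(name)
        return imported
```

Note that every OptimizerProcess in this sketch holds the complete hierarchy in its replica and exports whole partitions, which mirrors the memory and communication cost of the approach.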
However, the aforementioned prior art suffers from significant drawbacks. While the prior art methodology distributes the workload along hierarchical boundaries to parallel processes, each process must be executed on a computing resource sufficiently powerful to load the entire chip hierarchy, despite the fact that only one portion of the hierarchy is being modified. As the size of VLSI chips continues to grow, this method will be limited to very expensive, powerful servers. Clearly, it would be preferable to load only one partition and abstract the impact of the remainder of the hierarchy in some fashion.
Additionally, the prior art method for transmitting optimization changes to a partition is very coarse. Regardless of the extent of the changes to a particular partition, the process periodically writes out the entire partition. Since partitions are exported only to express the updated timing graph, it would be more efficient to communicate only the changes to the timing graph at the boundaries of the partition.
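As a minimal sketch of the finer-grained communication suggested here, the following function reports only those boundary pins whose timing value has changed since the last report. The function name, the dictionary representation, and the tolerance parameter are assumptions made for illustration. Pins that disappear from the boundary are not handled.

```python
def boundary_delta(previous, current, tolerance_ns=0.0):
    """Return only the boundary pins whose timing value changed.

    previous, current: dicts mapping boundary pin name -> timing value (ns).
    A pin is reported if it is new or differs by more than tolerance_ns.
    """
    return {
        pin: value
        for pin, value in current.items()
        if pin not in previous or abs(value - previous[pin]) > tolerance_ns
    }
```

A partition with thousands of internal changes but a single altered boundary arrival time would then transmit one entry rather than the entire partition.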
Finally, the prior art does not teach the use of shared physical resources. For the purposes of physical resources, the prior art employs a hard hierarchical boundary paradigm. While PHTC may include algorithms that modify placement data, each partition is limited to the physical resources it was initially given.
U.S. Pat. No. 6,202,192 to Donath et al., “Distributed Static Timing Analysis”, and U.S. Pat. No. 5,602,754 to Beatty et al., “Parallel execution of a complex task partitioned into a plurality of entities”, describe methods to distribute the procedure of static timing analysis on a hierarchical chip. Each partition in the hierarchy is analyzed in a separate process. In order to build a complete timing graph of the entire hierarchy, the processes communicate timing information to each other regarding signals that cross hierarchical boundaries.
For example, referring to FIG. 2, there are shown three partitions, P_TOP, P_A, and P_B, each analyzed in a separate process. Partitions P_TOP and P_A intersect at a point on the hierarchical boundary, A_IN. At this point, P_TOP communicates information, such as the signal arrival time, to P_A. In a similar fashion, P_A communicates information, such as the capacitive load of wire W, to P_TOP.
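The exchange at boundary point A_IN can be sketched as follows. The sketch assumes a simple linear driver model in which delay equals an intrinsic delay plus drive resistance multiplied by load capacitance. This model, the class names, and all numeric values are assumptions and are not taken from the cited patents. Units are chosen so that kilohms multiplied by picofarads yield nanoseconds.

```python
from dataclasses import dataclass


@dataclass
class TopPartition:
    """Models P_TOP, which drives the boundary pin A_IN."""
    launch_time_ns: float
    intrinsic_delay_ns: float
    drive_resistance_kohm: float

    def arrival_at_boundary(self, load_pf):
        # Linear delay model (assumption): kohm * pF = ns.
        return (self.launch_time_ns
                + self.intrinsic_delay_ns
                + self.drive_resistance_kohm * load_pf)


@dataclass
class ChildPartition:
    """Models P_A, which contains wire W behind the boundary pin A_IN."""
    wire_cap_pf: float
    sink_pin_cap_pf: float
    internal_delay_ns: float

    def load_at_boundary(self):
        # Sent upward to P_TOP: total capacitance seen at A_IN.
        return self.wire_cap_pf + self.sink_pin_cap_pf

    def arrival_at_sink(self, boundary_arrival_ns):
        # Uses the arrival time received from P_TOP.
        return boundary_arrival_ns + self.internal_delay_ns


p_top = TopPartition(launch_time_ns=0.0, intrinsic_delay_ns=0.25,
                     drive_resistance_kohm=2.0)
p_a = ChildPartition(wire_cap_pf=0.5, sink_pin_cap_pf=0.25,
                     internal_delay_ns=0.5)

load = p_a.load_at_boundary()                  # P_A -> P_TOP
arrival = p_top.arrival_at_boundary(load)      # P_TOP -> P_A
sink_arrival = p_a.arrival_at_sink(arrival)
```

Neither partition needs the other's internal netlist. Each needs only the value communicated at A_IN, so a change to wire W inside P_A alters the arrival time computed by P_TOP through the reported load alone.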
With regard to timing resources, the prior art describes a method of maintaining soft hierarchical boundaries. However, it does not cover the management of physical resources during distributed optimization. While this prior art is an essential component of a system that performs timing closure on a hierarchical design, it does not teach the reallocation of physical resources and therefore requires that the physical resources remain static throughout optimization. This severely limits the optimality of the final result.