1. Field of the Invention
The present invention relates to the field of electronic circuit design. More specifically, a method and apparatus are disclosed for element placement in standard or custom cells, field programmable gate arrays (FPGAs), programmable systems on chip (PSoC), or multiprocessors.
2. Description of the Prior Art
The most time-consuming operation in the design automation flow from a hardware description language representation of a digital circuit to an FPGA programming bitstream is the placement step. Large designs can have placement runtimes of hours or even days for modern multimillion user-gate devices. Software algorithms and workstation capabilities are not improving fast enough to keep up with the exponentially increasing number of resources available on FPGAs.
Placement is an NP-complete problem. A widely used approach is simulated annealing, as disclosed, for example, in S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, vol. 220(4598), pp. 671–680, 1983. Another well-known approach is the force-directed algorithm disclosed in S. Goto, “An efficient Algorithm for the Two-Dimensional Placement Problem in Electrical Circuit Design,” IEEE Transactions on Circuits and Systems, vol. CAS-28, pp. 12–18, 1981. Force-directed algorithms can give acceptable results, but they often terminate trapped in local minima.
Most placers are designed to execute on sequential uniprocessors. Within the domain of fast placers, there are three different approaches to the problem. Most commonly, traditional sequential software is optimized for substantial speed increases. Less commonly, placement software is parallelized over a small number (fewer than a dozen) of microprocessors. Rarely, approaches to the placement problem involve a very large number of processing elements.
Parallel approaches are disclosed, for example, in U.S. Pat. No. 5,144,563 to Date et al. and U.S. Pat. No. 5,796,625 to Scepanovic et al.
Prior art schemes that attempt to use a very large number of processing elements include those developed by Banerjee; by Horvath, Shankar, and Pandya; and by Chyan and Breuer.
To accelerate force-directed placement, a scheme that assigns one processor element to each cell of an ASIC design is described in P. Banerjee, “Parallel Algorithms for VLSI Computer-Aided Design,” Chapter 3, Englewood Cliffs, N.J.: PTR Prentice Hall, 1994, and in E. I. Horvath, R. Shankar, and A. S. Pandya, “A Parallel Force Directed Standard Cell Placement Algorithm,” Technical Report, Dept. Computer Science, Florida Atlantic University, Boca Raton, Fla., 1992. Unfortunately, this scheme largely depends on a large-scale supercomputer. D. J. Chyan and M. A. Breuer, in “A Placement Algorithm for Array Processors,” presented at the ACM/IEEE Design Automation Conference, Miami Beach, Fla., 1983, envision a force-directed, systolically interconnected placement engine with one processing element per module. However, the Chyan-Breuer algorithm also becomes trapped in local minima.
Prior art schemes are not able to achieve both high quality and large speedups. Attempts to obtain large speedups with large numbers of processors fall short in quality and are highly sequentialized by the schemes used to communicate updates among processors. Attempts to achieve high quality with simulated annealing have either limited quality or limited speedup. None of the prior art schemes teaches how to employ large numbers of processors profitably to achieve large speedups and high quality while avoiding performance bottlenecks in communications.