Simulated annealing ("SA") is a method of optimizing properties of large, complex systems. Essentially, SA is optimization via random search. New states in the system are chosen at random, then compared to the current state and either accepted in place of the current state, or rejected. This random search continues until the system has converged to a final state that is acceptable based on a problem-specific cost function.
SA derives its name from an analogy with the annealing of solids, a process in which a substance is melted and then carefully cooled until a highly regular crystalline structure forms. The analogy carries over to the optimization of large, complex systems (such as the placement and routing of circuit elements in an FPGA). The cost function that describes the quality of a particular state of the system (implementation efficiency, or the quality of the place-and-route solution) corresponds to the energy of the substance being annealed. The variables in the system correspond to the atoms of that substance. The minimum of the cost function discovered during the optimization process corresponds to the highly regular crystalline structure that results from annealing.
The key innovation in SA was the introduction of effective temperature, T, into complex system optimization via random search. As with annealing of solids, the simulated annealing process begins with a high effective temperature and slowly reduces it. When the effective temperature has decreased such that the system being optimized has settled into a minimum in the cost function, C, the optimization process is complete.
Given an initial problem state, $X_0$, and an initial temperature, $T_0$, a pseudo-code description of an available SA algorithm is shown in FIG. 1.
At the core of the algorithm shown in FIG. 1 are the two functions generate and accept. The function generate (on line 4 of FIG. 1) encodes how the annealer generates a new problem state from the current state. The function accept (on line 5) determines whether the newly generated problem state replaces the current problem state. In SA, new states are accepted not only when they decrease (i.e., improve) the cost but also, with a probability that depends on the temperature, when they increase the cost. The probability of accepting states that increase the cost falls as the temperature decreases. This behavior is codified in the accept function for SA, which accepts the new state $x + \Delta x$ when the inequality of Equation 1 holds:
$$\mathrm{random} < \exp\!\left(\frac{C(x) - C(x + \Delta x)}{T}\right) \qquad (1)$$
In Equation 1, random is a function that returns a uniform random number on the interval [0,1]. The behavior of Equation 1 is shown conceptually in FIGS. 2A and 2B. Downhill moves are moves to a new state with a lower cost than the present state, while uphill moves increase the cost. For a downhill move, $C(x) - C(x + \Delta x)$ is positive, so the exponent in Equation 1 is positive and the function on the right of the inequality yields a value larger than one. Since the range of random is [0,1], downhill moves are always accepted. The probability of accepting an uphill move depends on the ratio of the change in the cost function to the present temperature. At very hot temperatures, this ratio is small (the denominator in the exponent is far greater than the numerator), and almost all uphill moves are accepted. When the temperature has cooled such that it is only warm (FIG. 2A), the probability of accepting a large uphill move is substantially reduced, but many smaller uphill moves are still accepted. When the temperature is reduced further, to cool, the probability of accepting uphill moves decreases further, as illustrated in FIG. 2B. This probability continues to drop until almost no uphill moves are accepted. At this point, the problem settles into the bottom of a valley in the cost surface and is considered to be frozen. Hot, warm, and cool temperatures are problem-specific; i.e., different systems will accept or reject states at different temperatures.
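To make the temperature dependence concrete, the following Python sketch computes the acceptance probability implied by Equation 1, that is min(1, exp((C(x) - C(x + Δx)) / T)), for an uphill move of +1 and one of +10 at three temperatures. The function name and the sample costs and temperatures are illustrative assumptions.

```python
import math


def acceptance_probability(cost_current: float, cost_new: float, temperature: float) -> float:
    """Probability that the Equation 1 test accepts a move from cost_current to cost_new."""
    exponent = (cost_current - cost_new) / temperature
    # Downhill or neutral move: exp(exponent) >= 1, so random in [0, 1] never exceeds it.
    if exponent >= 0:
        return 1.0
    # Uphill move: accepted with probability exp(-(cost increase) / T).
    return math.exp(exponent)


for t in (100.0, 10.0, 1.0):
    small = acceptance_probability(50.0, 51.0, t)  # uphill by 1
    large = acceptance_probability(50.0, 60.0, t)  # uphill by 10
    print(f"T={t:>5}: small uphill {small:.3f}, large uphill {large:.3f}")
```

At T = 10, the +1 move is accepted with probability about 0.905 and the +10 move with about 0.368; at T = 1, the same moves drop to about 0.368 and about 0.00005. This mirrors the hot, warm, and cool behavior described above.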
Returning again to FIG. 1, the functions frozen, update_temp, and done_at_temperature determine when annealing is finished, how the temperature is updated, and how many moves should be performed at each temperature, respectively. The initial temperature, $T_0$, and these critical control functions are collectively referred to as the cooling schedule. Optimal cooling schedules have been the subject of much research, and selection of an appropriate schedule is crucial to an efficient implementation. However, discussion of cooling schedules is beyond the scope of this background discussion and will be understood by one skilled in the art to which the present invention pertains without additional discussion.
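The following Python sketch shows a conventional SA loop of the kind FIG. 1 describes. The function names follow the text (generate, accept, frozen, update_temp, done_at_temperature), but their signatures, the geometric cooling schedule, the one-dimensional toy problem, and the tracking of the best state seen are assumptions for illustration, not the contents of FIG. 1.

```python
import math
import random


def accept(c_current, c_new, t, rng):
    """Equation 1: accept if random < exp((C(x) - C(x + dx)) / T)."""
    if c_new <= c_current:
        return True  # downhill or neutral move: right-hand side is >= 1
    return rng.random() < math.exp((c_current - c_new) / t)


def anneal(x0, t0, cost, generate, frozen, update_temp, done_at_temperature, seed=0):
    """Conventional SA loop; returns the best state seen and its cost."""
    rng = random.Random(seed)
    x, t = x0, t0
    c = cost(x)
    best_x, best_c = x, c
    while not frozen(t):
        moves = 0
        while not done_at_temperature(moves):
            x_new = generate(x, rng)  # propose a perturbed state x + dx
            c_new = cost(x_new)
            if accept(c, c_new, t, rng):
                x, c = x_new, c_new
                if c < best_c:
                    best_x, best_c = x, c
            moves += 1
        t = update_temp(t)  # cooling step
    return best_x, best_c


# Hypothetical toy problem and cooling schedule (assumptions, not from FIG. 1):
# minimize (x - 3)^2 over the integers using +/-1 moves and geometric cooling.
best_x, best_c = anneal(
    x0=-20,
    t0=10.0,
    cost=lambda x: (x - 3) ** 2,
    generate=lambda x, rng: x + rng.choice((-1, 1)),
    frozen=lambda t: t < 0.01,
    update_temp=lambda t: 0.95 * t,
    done_at_temperature=lambda moves: moves >= 50,
)
print(best_x, best_c)
```

Here the initial temperature, frozen, update_temp, and done_at_temperature together play the role of the cooling schedule; swapping in a different schedule changes only those arguments, not the loop.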
Any SA solution to a specific problem can be described in terms of the four principal components of the algorithm:
x, the problem representation, which maps the problem being solved into the current state within the annealer;
generate, which implements the annealer's move-set (the set of possible perturbations, referred to in Equation 1 as $\Delta x$), used by the annealer to manipulate the current state;
C(x), the cost function, which determines how the cost is calculated and what its components are; and
the cooling schedule, which controls T, directing the overall cooling process.
The most important decision to be made when designing an annealing implementation is how to handle the constraints inherent in the problem being solved. For example, when trying to optimally place transistors on an integrated circuit (IC), the final solution is constrained such that none of the transistors can overlap. One way to handle such constraints within an annealing formulation is to penalize problem states that violate them by including penalty terms in the cost function. These penalty terms can be weighed against the other design objectives by using scalar weights, resulting in a weighted penalty function. However, not all constraints need be handled with a weighted penalty function; many constraints can be implemented either in the cost function or in the move-set. For example, simple bounds on the decision variables can easily be implemented in the move-set by not generating moves beyond the variable bounds. For more complex constraints, however, implementation is not so straightforward. In the case of optimal transistor placement on ICs, the constraint that transistors cannot overlap can be implemented either by designing the move-set so that transistors never overlap or by designing a cost function that penalizes illegal overlaps. Although the move-set approach is the most obvious solution, the cost function solution is easier to implement and is generally accepted as superior. The designer of any SA solution to a realistic problem will face similar design trade-offs.
Annealing for FPGA Placement
The placement problem in FPGAs is often solved with SA. An FPGA is a regular, two-dimensional grid of logic blocks ("LBs"). Each LB can be programmed to perform a single, small function. Programmable routing connects the LBs.
The initial states of the user's design and of the FPGA are shown conceptually in FIG. 3. The user's design is on the left and is a collection of interconnected gates. The grid on the right shows the empty FPGA, where each square (defined by a unique coordinate pair) is an LB. The routing structure is not shown. A user's complete digital design is typically implemented on the FPGA in three steps: mapping, placement, and routing.
The mapping step comprises collecting gates into interconnected pieces called components. The results of the mapping step are shown in FIG. 4. Each component is small enough to fit within a single LB.
Placement determines which LB in the FPGA to use for each component. The results of the placement step are shown in FIG. 5. The placement problem starts with a set of unplaced components. When complete, placement produces an (x,y) location for each component. These (x,y) locations are the grid co-ordinates of the LB that the component will occupy on the FPGA.
The final step, routing (not shown), determines how to program the programmable interconnect structure of the device such that the connections on the FPGA are compatible with those in the user's design.
In SA terms of problem representation, move-set, cost function, and cooling schedule, the FPGA placement problem is typically formulated as follows:
Problem representation: The problem is represented as the set of (x,y) co-ordinates that comprise the LB locations for all components in the user's design. Initial locations are assigned randomly for all the components, subject only to the requirement that the locations be legal (e.g., that each location is unique). Thus, throughout the SA run, every component always has a valid LB location.
Move-set: The move-set typically contains only random swaps. A random swap first selects two LB locations at random, then swaps the components at those locations. (One of these locations can be empty, with no ill effect.)
Cost function: The typical goal for a good implementation of a user's design on an FPGA is to minimize the interconnect delay (a function of the distance and complexity of the interconnect path) between components. The actual interconnect delays cannot be known until the FPGA is routed, which is not performed until after placement. As a result, the placer's cost function simply estimates delay.
Cooling schedule: Cooling schedules can be formulated that are independent of the particular problem domain. Assuming that such a schedule is used, further discussion of cooling schedules is not relevant to this background discussion.
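A minimal Python sketch of this placement formulation follows: a random legal initial placement, a random-swap move, and a delay estimate. The dictionary representation, the function names, and the use of total Manhattan distance between connected components as the delay estimate are assumptions for illustration; the text says only that the placer's cost function estimates delay.

```python
import random


def initial_placement(components, width, height, rng):
    """Assign each component a unique random (x, y) LB location."""
    sites = [(x, y) for x in range(width) for y in range(height)]
    chosen = rng.sample(sites, len(components))  # sampling without replacement keeps locations unique
    return dict(zip(components, chosen))


def random_swap(placement, width, height, rng):
    """Select two distinct LB locations at random and swap whatever occupies them.

    Either location may be empty; if both happen to be empty, the move is a no-op.
    """
    new = dict(placement)
    occupant = {loc: comp for comp, loc in placement.items()}
    sites = [(x, y) for x in range(width) for y in range(height)]
    a, b = rng.sample(sites, 2)
    comp_a, comp_b = occupant.get(a), occupant.get(b)
    if comp_a is not None:
        new[comp_a] = b
    if comp_b is not None:
        new[comp_b] = a
    return new


def estimated_delay(placement, connections):
    """Hypothetical delay estimate: total Manhattan distance over connected pairs."""
    total = 0
    for src, dst in connections:
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total
```

Because every swap exchanges the contents of two locations, the placement stays legal after every move, so no penalty term for overlapping components is needed in the cost function.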
Architecture-Specific Feature Utilization
One approach to avoiding complicated routing structures in FPGAs is the inclusion of special structures, called direct connects, between logic elements. For example, in the XC3000 device from Xilinx, Inc., assignee of the present invention, each LB has direct connects to the four LBs to its north, south, east, and west, as illustrated in FIG. 6. The X output may be connected directly (i.e., through only one programmable interconnect point (PIP)) to the B input of the LB immediately to its right and to the C input of the LB immediately to its left. Similarly, the Y output may be connected directly (through only one PIP) to the D input of the LB immediately above and to the A input of the LB immediately below. Likewise, the Xilinx XC4000EX includes four direct connects per LB: two vertical and two horizontal. A simplified view of this structure is illustrated in FIG. 7. Horizontal direct connects 4 connect subject LB 2 to the adjacent LB 6 on its right, and vertical direct connects 8 connect subject LB 2 to the adjacent LB 10 below it. A new direct connect structure is also disclosed by Ochotta and Wieland in U.S. Pat. No. 6,069,490 entitled ROUTING ARCHITECTURE USING A DIRECT CONNECT ROUTING MESH, which is incorporated herein by reference.
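The XC3000 direct connects described above can be captured as a small lookup table of (output pin, relative offset, input pin) triples. This Python sketch assumes that x grows to the right and y grows upward; the table, the function name, and the coordinate convention are illustrative assumptions, not a Xilinx data format.

```python
from typing import Tuple

Loc = Tuple[int, int]

# (output pin, (dx, dy) from driver LB to load LB, input pin), per the XC3000 description.
# Assumption: x grows to the right and y grows upward.
XC3000_DIRECT = {
    ("X", (1, 0), "B"),   # X output -> B input of the LB immediately to the right
    ("X", (-1, 0), "C"),  # X output -> C input of the LB immediately to the left
    ("Y", (0, 1), "D"),   # Y output -> D input of the LB immediately above
    ("Y", (0, -1), "A"),  # Y output -> A input of the LB immediately below
}


def can_use_direct_connect(src: Loc, out_pin: str, dst: Loc, in_pin: str) -> bool:
    """True if driver pin out_pin at src can reach in_pin at dst through a direct connect."""
    offset = (dst[0] - src[0], dst[1] - src[1])
    return (out_pin, offset, in_pin) in XC3000_DIRECT
```

For any particular driver pin and load pin, at most one relative offset appears in the table, so a connection can use a direct connect only when its two components occupy one specific pair of neighboring LBs.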
A number of approaches to FPGA placement have used simulated annealing. In attempts to improve the utilization of special architectural features such as direct connects, three kinds of techniques have been used.
A first approach is to modify the cost function to encourage the use of direct connects and other architecture-specific features. In this manner, placements that use these architecture-specific features are assigned a low cost, so that the normal annealing algorithm will tend to create placements that use these features. However, with a completely random move-set, the probability that a significant number of the desired architecture-specific features will be used remains quite small.
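A minimal Python sketch of this first approach: connections whose endpoints sit in a direct-connect relative position receive a low cost, and all others pay a distance-based estimate. The offsets (loosely following the simplified XC4000EX view of a neighbor on the right or below, with y growing upward), the DIRECT_COST value, and the Manhattan-distance fallback are all hypothetical choices for illustration.

```python
# Hypothetical relative offsets (dst - src) served by a direct connect:
# the neighbor on the right, or the neighbor below (assumes y grows upward).
DIRECT_OFFSETS = {(1, 0), (0, -1)}
DIRECT_COST = 0.1  # hypothetical low cost for a connection routable by a direct connect


def connection_cost(src, dst):
    """Low cost if a direct connect applies, else a Manhattan-distance delay stand-in."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    if (dx, dy) in DIRECT_OFFSETS:
        return DIRECT_COST
    return abs(dx) + abs(dy)


def placement_cost(placement, connections):
    """Sum of connection costs over all (driver, load) pairs."""
    return sum(connection_cost(placement[s], placement[d]) for s, d in connections)
```

The bonus only rewards a placement after a random move has already produced the right relative position; it does nothing to make such moves more likely, which is the weakness noted above.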
A second approach is to recognize patterns in the user's design and place some components in locally optimal locations before SA is executed. The pre-placed components are not moved during annealing. One problem with this technique is that the pre-placed components may be optimally placed with respect to each other, but may be non-optimal in the context of the overall design.
A third available approach is to perform annealing as usual and then attempt to optimize the final placement to use more architecture-specific features. The problem with this technique is that the locations needed to use the architecture-specific features may be occupied by other, uninvolved components. Finding new locations for the displaced components is very difficult, and the new locations for those components may displace yet more components. In fact, in the general case, using this technique is tantamount to starting the entire placement process over again.
Thus, the first key to effective utilization of architecture-specific features such as direct connects is awareness of their existence and location. The second, and just as important, is proper and advantageous use of the direct connects, since an implementation of a design is only as fast as its slowest path. For example, if the critical path is actually an 8-bit bus, all 8 bits in the bus must take similar advantage of direct connects. No advantage is gained if one of the bits uses direct connects to be faster than the other seven, since all bits on the bus must be received before processing can continue.
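The bus argument above reduces to a maximum: a bus is usable only when its slowest bit has arrived. The following Python sketch uses hypothetical per-bit delays (invented for illustration, not measured data) to show that speeding up a single bit leaves the bus delay unchanged.

```python
def bus_arrival_time(bit_delays_ns):
    """A bus is usable only when its slowest bit has arrived."""
    return max(bit_delays_ns)


# Hypothetical per-bit delays (ns) for an 8-bit bus, for illustration only.
baseline = [5.0] * 8
one_fast_bit = [2.0] + [5.0] * 7  # only bit 0 uses direct connects
all_fast = [2.0] * 8              # every bit uses direct connects

for name, delays in (("baseline", baseline), ("one fast bit", one_fast_bit), ("all fast", all_fast)):
    print(f"{name}: {bus_arrival_time(delays)} ns")
```

Only the case in which every bit benefits reduces the bus delay, which is why direct connects must be applied consistently across a bus.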
Direct connect utilization is most effectively addressed during placement, since it is assumed that if logic blocks are placed so that they can be routed with direct connects, the router will find the direct connection without difficulty. Unfortunately, when optimization techniques such as available simulated annealing algorithms are used, components are fairly unlikely to be placed so that they can take advantage of direct connects. This is particularly true when only two relative placements of a pair of blocks allow a direct connect to be used. Because available SA algorithms move blocks randomly, searching for one of only two ideal relative placements through a large number of random perturbations makes SA a highly inefficient mechanism for utilizing these valuable device resources.
We can therefore see that architecture-specific features such as direct connects are used only under very specific circumstances and SA by its nature is not effective for finding special cases. For example, a direct connect can be used only when a component and its signal driving mate are placed into directly connected LBs. Given the number of possible valid placements of components in LBs on a real FPGA, random placements would rarely utilize direct connects. Given that the standard SA algorithm is driven by random moves, direct connect utilization in placement done by available SA algorithms is likely to be poor.
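The rarity argument can be made concrete by counting. In a uniformly random legal placement, any two particular components occupy a uniformly random ordered pair of distinct LB locations, so the chance that they land in a direct-connect relative position is a simple ratio. This Python sketch enumerates it exactly; the grid size and the choice of two horizontal offsets are hypothetical.

```python
from fractions import Fraction
from itertools import product


def direct_connect_probability(width, height, offsets):
    """Exact probability that two components at distinct uniformly random LB
    locations land in one of the given relative positions (dst - src).

    Offsets are assumed to be distinct and non-zero.
    """
    sites = list(product(range(width), range(height)))
    site_set = set(sites)
    # Count ordered (src, dst) pairs whose offset is one of the allowed ones.
    hits = sum(
        1
        for (x, y) in sites
        for (dx, dy) in offsets
        if (x + dx, y + dy) in site_set
    )
    n = len(sites)
    return Fraction(hits, n * (n - 1))


# Hypothetical 20 x 20 grid with two usable relative placements (left or right neighbor).
p = direct_connect_probability(20, 20, [(1, 0), (-1, 0)])
print(p, float(p))
```

On this hypothetical 20 by 20 grid the probability is exactly 1/210, roughly one chance in two hundred for any particular pair, and it shrinks further as the device grows.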
Thus, efficient and effective utilization of direct connect resources in FPGAs requires improvements in the available art.