Field Programmable Gate Arrays (FPGAs) are configurable logic devices used to implement a variety of logic functions. A basic FPGA consists of Look Up Tables (LUTs) and routing paths, whereas a more complex one also includes dedicated RAM, logic and I/O blocks, and other macros such as adders and multipliers. An FPGA is configured to produce a desired output by interconnecting these blocks through the routing paths. This process of implementing the logic of a design on an FPGA by interconnecting configurable logic devices is known as FPGA mapping. Since they were first introduced by Xilinx in 1985, FPGAs have become increasingly popular devices for use in electronic systems. Their use continues to grow because they offer relatively short design cycles, large cost reductions through logic consolidation, and flexibility in terms of configurability.
Several FPGA placement systems have been developed to implement desired designs optimally on an FPGA. These systems try to map a design onto a grid of configurable logic blocks on an FPGA device of a particular size. The netlist for the design and the Input/Output pad (I/O pad) information of the chip are given as inputs to the placement system, and the system outputs the coordinates of each mapped logic element on the FPGA together with the interconnection information for the logic elements. Over the last few decades, the FPGA placement problem has been addressed by various techniques, such as analytic placement, partition-based placement and Simulated Annealing. Two parameters of the design cycle, namely the delay time and the required silicon area, are optimized to get the best possible results. These systems also consider the mapping time and the routability of the placed design while mapping the design onto an FPGA.
Of the techniques mentioned here, most FPGA placement systems are based on a variant of Simulated Annealing (SA), as SA usually yields the best solution in terms of silicon area and delay time. SA is a Monte Carlo approach for minimizing multivariate functions. In the case of placement, the quantity being minimized is the cost of the placement, expressed in terms of wire length or some other factor. The temperature T is a parameter that controls the likelihood of accepting moves that make the placement worse.
Initially T is very high, so almost all moves are accepted. It is then gradually decreased as the placement is refined, so that eventually the probability of accepting moves that make the placement worse is very low. To apply SA, the system is initialized with a random placement. A new placement is constructed by random displacement of the configurable logic devices. If the cost of this new placement is lower than that of the previous one, the change is accepted unconditionally and the placement is updated. If the cost is greater, the new placement is accepted probabilistically. This is the Metropolis step, the fundamental procedure of SA. This procedure allows the system to move consistently towards lower cost, yet still "jump" out of local minima due to the probabilistic acceptance of some upward moves.
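By way of illustration only, the Metropolis step and the cooling of T described above can be sketched as follows. The sketch rests on several assumptions that are not taken from the systems described herein: the cost is a half-perimeter wire length estimate, a worsening move is accepted with probability exp(-delta/T) (the standard Metropolis criterion), T is reduced geometrically, and the names wire_length, metropolis_accept and anneal, together with all default parameter values, are hypothetical.

```python
import math
import random


def wire_length(placement, nets):
    """Assumed cost: sum of half-perimeter bounding boxes over all nets."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def metropolis_accept(delta_cost, temperature, rng):
    """Accept non-worsening moves always; worsening ones with prob. exp(-delta/T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(blocks, nets, grid, t_start=10.0, t_end=0.01, alpha=0.9,
           moves_per_t=200, seed=0):
    """Place `blocks` (a list) on a grid x grid array; needs len(blocks) <= grid**2."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    placement = {b: sites[i] for i, b in enumerate(blocks)}  # random start
    occupant = {site: b for b, site in placement.items()}
    cost = wire_length(placement, nets)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            block = rng.choice(blocks)
            src = placement[block]
            dst = (rng.randrange(grid), rng.randrange(grid))
            if dst == src:
                continue
            other = occupant.get(dst)  # None: single-block move, else a swap
            placement[block] = dst
            if other is not None:
                placement[other] = src
            new_cost = wire_length(placement, nets)
            if metropolis_accept(new_cost - cost, t, rng):
                occupant[dst] = block
                if other is not None:
                    occupant[src] = other
                else:
                    del occupant[src]
                cost = new_cost
            else:
                # Rejected: restore the previous placement.
                placement[block] = src
                if other is not None:
                    placement[other] = dst
        t *= alpha  # assumed geometric cooling schedule
    return placement, cost
```

For example, anneal(["a", "b", "c", "d"], [["a", "b"], ["b", "c"], ["c", "d"]], grid=4) returns a placement and its cost for a hypothetical four-block chain. At high T the exponential term is close to 1, so nearly every move is accepted; as T falls, worsening moves are accepted less and less often.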
While various FPGA configurations are currently used, the FPGA devices mentioned herein comprise a square grid of configurable elements. For example, an FPGA with device size 17 is an FPGA having a square grid of configurable logic blocks with 17 rows and 17 columns and routing paths in between. The square arrangement helps to minimize the wire length in a design and thus reduces the placement cost.
There are two inherent problems with these SA-based placement systems. Firstly, when the design size is almost equal to the FPGA device size, present-day systems are unable to produce an optimal FPGA placement for large designs within a reasonable amount of time. This is because only limited free space is available for movement of the logic blocks and, as a result, the system performs a significant amount of swapping of logic blocks to find an optimal placement. The second problem is the availability of excess free space for movement of the logic blocks when the design size is much smaller than the device size. The larger the device size, the more moves SA will make and, as a result, the longer the system takes to find an optimal placement. This is observed in FIG. 1 and FIG. 2.
The graph of FIG. 1 illustrates the behavior of SA. The XorMux8 design requires a minimum device size of 17 (15×15 for the logic blocks, with the I/Os on the periphery). The placement cost of SA is low for device sizes between 25 and 38. For very large device sizes, where the logic blocks have excess space in which to move, SA produces poor results. Moreover, the deviation in the results, or in other words the randomness of SA, also increases for larger device sizes. The graph has been computed by averaging the mapping cost of five runs for each value of device size. If the design were run only once for each device size, the deviation would be even larger.
The extra silicon area given to the logic blocks allows SA to make more "moves" in which only a single cell is involved (i.e. no swapping) during the initial phase. This enables SA to reach a better intermediate solution compared to one obtained with an inappropriate silicon area.
The larger the device size, the more moves SA will make, and for more moves SA would be expected to take more time. However, it is found that SA takes more time for fewer moves, and produces a sub-optimal solution, when it has very little free silicon area in which to move, i.e. when the design size is almost equal to the device size. This is illustrated in FIG. 2.
As seen in the graph of FIG. 2, the placement time decreases until the device size is around 24 and then increases again. When the device size is near 17 (the minimum size for XorMux8), the logic blocks are very tightly packed and hence each move of SA actually involves two logic blocks (i.e. swapping). As shown in FIG. 3 and FIG. 4, almost all moves in FIG. 3 will involve two or more logic blocks, whereas moves in FIG. 4 will mostly involve one logic block.
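The relationship between device size and the proportion of moves that are swaps can be sketched with a short calculation. This is a simplified model under stated assumptions, not a description of any particular placement system: the outer ring of the device is assumed to hold only I/O pads (consistent with the 15×15 logic area quoted for device size 17), the target site of a move is assumed to be chosen uniformly at random from all other logic sites, and the function names logic_sites and swap_probability, as well as the block count used in the example, are hypothetical.

```python
def logic_sites(device_size):
    """Logic sites on a square device whose outer ring is reserved for I/O."""
    return (device_size - 2) ** 2


def swap_probability(num_blocks, device_size):
    """Chance that a uniformly chosen target site already holds another block."""
    sites = logic_sites(device_size)
    if sites < 2:
        raise ValueError("device too small to move a block")
    if num_blocks > sites:
        raise ValueError("design does not fit on the device")
    # The moving block may land on any of the other (sites - 1) locations,
    # of which (num_blocks - 1) are occupied by other blocks.
    return (num_blocks - 1) / (sites - 1)


if __name__ == "__main__":
    # 220 is an illustrative block count only, not the actual size of XorMux8.
    for size in (17, 24, 38):
        print(size, logic_sites(size), swap_probability(220, size))
```

Under this model a design that nearly fills the device sees almost every move become a swap, while the same design on a much larger device sees mostly single-block moves, which matches the qualitative contrast between FIG. 3 and FIG. 4.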
Ideally the system should produce an optimal FPGA mapping, but the mapping time for a design adversely affects the system output. As time passes, the parameter T decreases for SA-based systems and, as a result, the probability of accepting a worse placement also decreases. For big designs, the system may eventually be unable to accept a worse placement in the proximity of the current one, and hence it accepts a sub-optimal placement as its output if no better placement is found in that proximity. The system placement results are illustrated in terms of placement cost and placement time for various device sizes in FIGS. 1 and 2 respectively. As observed, both parameters initially decrease as the FPGA device size increases. However, the parameters start to increase as the device size becomes significantly bigger than the design size. Moreover, the deviation in the results, or in other words the randomness of the output, also increases for larger device sizes.
The miniaturization of semiconductor technology and the increase in the number of logic elements in present-day designs have made present FPGA mapping systems very slow and unreliable for producing a feasible FPGA implementation. Existing techniques produce solutions that are far from optimal. Present-day systems are very slow in providing an optimal mapping and hence are undesirable for FPGA implementation of designs with a large number of logic elements. To truly exploit FPGAs for rapid turn-around development and prototyping, placing the processing elements in proper locations on the device plays an important role in determining the delay and the silicon requirement for FPGA implementation of a design.
Hence, there is a need for systems that minimize the chip area and delay time for a design, and at the same time reduce the mapping time.