1. Technical Field
Embodiments of the present invention relate to allocating resources of a field programmable gate array (FPGA) to implement functions that include multiplication operations.
2. Related Art
An FPGA is an integrated circuit chip that includes components such as programmable input/output buffers (IOBs), configurable logic blocks (CLBs), block random access memories (BRAMs), and programmable interconnect circuitry for interconnecting the IOBs, CLBs, and BRAMs. The FPGA further includes static random access memory (SRAM) configuration memory cells that can be programmed to configure the logic in the IOBs, CLBs, and BRAMs. The SRAM configuration memory cells are typically programmed at startup of the FPGA, but can be reprogrammed during operation of the FPGA using a partial reconfiguration process that programs frames, or a number of columns, of the SRAM memory cells at a time.
The CLBs include a number of look up tables (LUTs), typically made up of components such as multiplexers and SRAM memory cells. At configuration, a bitstream is provided to program the individual SRAM memory cells, setting each LUT to a desired function by writing the truth table of that function to the cells. Each LUT implements a logic function of n inputs, where the inputs select an output that depends on how the SRAM memory cells are programmed or configured. A logic function may use all n inputs to the logic element or only a subset thereof. For example, a LUT can implement an AND, OR, or XOR gate. LUTs can also be programmed to perform other functions, such as those of an adder or a multiplier.
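The truth-table behavior of a LUT described above can be illustrated with a small software model. This is only a sketch: the names `make_lut` and `lut_read` are hypothetical, and the model ignores timing, bitstream format, and the physical multiplexer structure of a real device.

```python
from itertools import product

def make_lut(func, n):
    """Build the truth table of an n-input function as a list of 2**n bits.

    Entry i holds the output for the input combination whose binary
    encoding is i (first input is the most significant bit). This plays
    the role of the SRAM cells written at configuration.
    """
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n)]

def lut_read(table, *inputs):
    """Use the inputs as an address into the stored table (the mux step)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]

# The same storage implements different gates depending on its contents.
and2 = make_lut(lambda a, b: a & b, 2)
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)

# A 4-input LUT whose function uses only a subset (two) of its inputs.
or2_in_4 = make_lut(lambda a, b, c, d: a | b, 4)
```

In this model, changing the function of a LUT means rewriting the list contents only; the lookup step in `lut_read` stays the same.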
Some FPGAs include dedicated components that provide programmable features in addition to the LUTs. For example, a digital signal processor (DSP) can be provided on board the FPGA when typical users are expected to build a number of DSPs using the LUTs. A dedicated DSP uses less logic and chip space than a DSP formed by programming a number of LUTs, and DSPs can form large multipliers more efficiently than comparable LUTs. Similarly, multipliers can be formed using dedicated multiplexer/carry (MUXCY) circuits in the FPGA, which may enable large multiplication operations to be implemented with fewer resources than building the same large multiplier from a number of LUTs. Although DSPs and multiplexer/carry devices are described, other components, such as a microprocessor, can likewise be included on the FPGA and can be configured and interconnected using the programmable logic features of the FPGA.
Macros for implementing multipliers in hardware are provided using a register transfer level (RTL) multiply operation, where RTL is a high-level abstraction used in hardware description languages (HDLs) for defining digital circuits. Multipliers are among the most critical macros, especially for DSP designs, because their implementation on FPGA resources can significantly affect both the size and the performance of the final design. Behaviorally, multipliers can take multiple configurations, ranging from multipliers with one constant input to full multipliers in which both operands are variable and different. The size of the operands is another main differentiating characteristic between multipliers.
Multipliers can be implemented on FPGAs using LUTs, MUXCYs, or other specific resources that provide a multiplier primitive. An array of LUTs and MUXCYs is provided in the Virtex2 and Virtex4 FPGAs manufactured by Xilinx Corporation of San Jose, Calif. The Virtex2 and Virtex4 FPGAs also include other primitives, such as the Virtex4 DSP48 or the Virtex2 MULT18X18.
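To illustrate how operand size relates to the number of multiplier primitives a design consumes, the following sketch tiles a wide unsigned multiplication out of fixed-width block products. It rests on stated assumptions and does not describe any vendor tool: the function names are hypothetical, the default chunk width of 18 bits is taken only from the 18×18 primitive size mentioned above, operands are treated as unsigned, and sign handling in real primitives is ignored.

```python
def split_chunks(value, width, chunk):
    """Split an unsigned value into little-endian chunks of `chunk` bits."""
    mask = (1 << chunk) - 1
    count = -(-width // chunk)  # ceiling division
    return [(value >> (i * chunk)) & mask for i in range(count)]

def tiled_multiply(a, b, a_width, b_width, chunk=18):
    """Multiply using only chunk-by-chunk products.

    Returns (product, blocks_used), where blocks_used counts the
    chunk-sized products, each standing in for one multiplier primitive.
    """
    a_parts = split_chunks(a, a_width, chunk)
    b_parts = split_chunks(b, b_width, chunk)
    total = 0
    for i, ap in enumerate(a_parts):
        for j, bp in enumerate(b_parts):
            # Weight each block product by the position of its chunks.
            total += (ap * bp) << ((i + j) * chunk)
    return total, len(a_parts) * len(b_parts)

# A 36-bit by 27-bit multiply needs 2 x 2 = 4 blocks under these assumptions.
a, b = 2**35 - 1, 123456789
product_value, blocks = tiled_multiply(a, b, 36, 27)
```

Under this simplified model the block count grows with the product of the two operand widths divided by the chunk width, which is why operand size matters when deciding where a multiplier should be placed.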
One approach available in the prior art for implementing multipliers using LUTs, MUXCYs, or other dedicated multiplier resources is to allocate resources to the multipliers randomly. Multipliers are first generated by creating adder trees that sum the partial product terms. Resources, such as LUTs, are then dedicated to the adder tree stages to create the multiplier.
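The generation step described above, in which partial product terms are summed by an adder tree, can be sketched in software as follows. This is a minimal illustration for unsigned operands with hypothetical function names; it models the arithmetic and the number of tree stages only, not the mapping of those stages onto FPGA resources.

```python
def partial_products(a, b, width):
    """One shifted copy of a for every set bit of b (unsigned operands)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

def adder_tree(terms):
    """Sum terms pairwise, stage by stage; return (sum, number_of_stages)."""
    stages = 0
    while len(terms) > 1:
        # Each stage adds neighbouring pairs; an odd last term passes through.
        terms = [sum(terms[i:i + 2]) for i in range(0, len(terms), 2)]
        stages += 1
    return terms[0], stages

def multiply(a, b, width):
    """Multiply a by a `width`-bit b via partial products and an adder tree."""
    return adder_tree(partial_products(a, b, width))

# 4-bit example: four partial products are reduced in two adder stages.
result, stages = multiply(13, 11, 4)
```

Each stage of the tree corresponds to a rank of adders that must be implemented in hardware, so the stage and term counts give a rough sense of the resources a multiplier of a given width requires.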
It would be desirable to provide a method for optimizing resource allocation by minimizing the number of dedicated multiplier resources required for a design. In particular, what is needed is a method of resource allocation for multipliers that takes into consideration factors such as the number of primitives required to implement the multiplier, a user choice for multiplier components, or the size of the multiplier operands.