This invention relates to implementing division in programmable integrated circuit devices such as programmable logic devices (PLDs).
As applications for which PLDs are used increase in complexity, it has become more common to design PLDs to include specialized processing blocks in addition to blocks of generic programmable logic resources. Such specialized processing blocks may include a concentration of circuitry on a PLD that has been partly or fully hardwired to perform one or more specific tasks, such as a logical or a mathematical operation. A specialized processing block may also contain one or more specialized structures, such as an array of configurable memory elements. Examples of structures that are commonly implemented in such specialized processing blocks include: multipliers, arithmetic logic units (ALUs), barrel-shifters, various memory elements (such as FIFO/LIFO/SIPO/RAM/ROM/CAM blocks and register files), AND/NAND/OR/NOR arrays, etc., or combinations thereof.
One particularly useful type of specialized processing block that has been provided on PLDs is a digital signal processing (DSP) block, which may be used to process, e.g., audio signals. Such blocks are frequently also referred to as multiply-accumulate (“MAC”) blocks, because they include structures to perform multiplication operations, and sums and/or accumulations of multiplication operations.
For example, PLDs sold by Altera Corporation, of San Jose, Calif., as part of the STRATIX® family, include DSP blocks, each of which may include four 18-by-18 multipliers. Each of those DSP blocks also may include adders and registers, as well as programmable connectors (e.g., multiplexers) that allow the various components to be configured in different ways. In each such block, the multipliers can be configured not only as four individual 18-by-18 multipliers, but also as four smaller multipliers, or as one larger (36-by-36) multiplier. In addition, one 18-by-18 complex multiplication (which decomposes into two 18-by-18 multiplication operations for each of the real and imaginary parts) can be performed.
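The count of real multiplications in a complex multiplication can be illustrated with a minimal sketch. The function name `complex_mul` and the use of unbounded Python integers are assumptions made for illustration only; the sketch does not model the 18-bit operand width or the actual DSP block circuitry.

```python
def complex_mul(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) using four real multiplications.

    Hypothetical helper: in hardware each product below would map to one
    18-by-18 multiplier; operand widths are not enforced here.
    """
    # Real part: two multiplications and one subtraction.
    re = a_re * b_re - a_im * b_im
    # Imaginary part: two multiplications and one addition.
    im = a_re * b_im + a_im * b_re
    return re, im


# (3 + 4j) * (5 - 2j)
print(complex_mul(3, 4, 5, -2))
```

The four products correspond to the two multiplication operations needed for each of the real and imaginary parts, which is why one such complex multiplication occupies all four multipliers of a block.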
Larger multiplications can be performed by using more of the 18-by-18 multipliers—e.g., from other DSP blocks. For example, a 54-by-54 multiplier can be decomposed, by linear decomposition, into a 36-by-36 multiplier (which uses the four 18-by-18 multipliers of one DSP block), two 36-by-18 multipliers (each of which uses two 18-by-18 multipliers, for a total of four additional 18-by-18 multipliers, consuming another DSP block), and one 18-by-18 multiplier, consuming a portion of a third DSP block. Thus, using 18-by-18 multipliers, nine multipliers are required to perform a 54-by-54 multiplication.
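The linear decomposition described above can be sketched in software as follows. This is an illustrative model only, assuming unsigned operands; the names `split_limbs` and `mul54` are hypothetical and do not correspond to any device primitive. Each operand is split into three 18-bit pieces, and the nine 18-by-18 partial products are shifted and summed.

```python
WIDTH = 18                    # width of one hardware multiplier input
MASK = (1 << WIDTH) - 1


def split_limbs(x, limbs=3):
    """Split an unsigned integer into `limbs` pieces of 18 bits, least significant first."""
    return [(x >> (WIDTH * i)) & MASK for i in range(limbs)]


def mul54(a, b):
    """Return (a * b, number of 18-by-18 multiplications used) for unsigned 54-bit a, b."""
    a_limbs = split_limbs(a)
    b_limbs = split_limbs(b)
    product = 0
    count = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # One 18-by-18 multiplication, weighted by its position.
            # i, j in {0, 1}      -> the 36-by-36 portion (4 multipliers)
            # exactly one index 2 -> the two 36-by-18 portions (4 multipliers)
            # i == j == 2         -> the single 18-by-18 portion (1 multiplier)
            product += (ai * bj) << (WIDTH * (i + j))
            count += 1
    return product, count


a = (1 << 54) - 1
b = 0x123456789ABCD
result, used = mul54(a, b)
assert result == a * b
print(used)
```

The loop performs nine 18-by-18 multiplications in total, matching the four, four, and one multipliers consumed by the 36-by-36, 36-by-18, and 18-by-18 portions respectively.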
One type of mathematical function that heretofore has not been easily implemented in a PLD or other programmable device is division. Division, especially double-precision floating point division, which may be required for High Performance Computing, is expensive and slow on current FPGAs. A common implementation in general-purpose programmable logic of an FPGA uses a network of sixty-four 80-bit adders, typically requiring between 6,000 and 9,000 four-input look-up tables. Moreover, the resulting operation is slow, typically having a 150 MHz system speed and about 57 clock cycles of latency.