This application is a National Stage application under 35 U.S.C. §371 of International Application No. PCT/GB/03902, which both designated and elected the United States, and claims priority to Great Britain Patent Application No. 9727414.6, filed Dec. 29, 1997.
This invention relates to logic circuits, such as so-called field programmable gate arrays (FPGAs) or so-called complex programmable logic devices (CPLDs), and to functional logic blocks within such circuits.
FPGAs are integrated circuits containing a very large number of logic gates organised as an array, with the interconnections between the logic gates and the function of the gates themselves being controllable or configurable via externally programmed control circuitry. In this way, an FPGA can be adapted, after fabrication, to perform a number of different tasks simply by changing the settings provided by the externally programmed control circuitry.
FPGAs are particularly useful because of their relatively low cost and short design times in comparison with bespoke integrated circuits.
It has been suggested that FPGAs are well suited for use as reconfigurable hardware to accelerate software in many applications [see publication reference 1 below]. Image or video processing tasks are particularly well suited to hardware acceleration, because of the inherent parallelism and data flow structure. Common to many image and video processing tasks is the need for intensive arithmetic operations such as multiplication and addition.
Whilst existing FPGA architectures are well suited to binary addition [2], configuring FPGAs for binary multiplication results in the available reconfigurable resources being used inefficiently [3]. Typically over 70% of the FPGA could be required solely for multiplication in some applications. The literature also suggests that hardware implemented on an FPGA requires as much as 100 times more die area, and will be about 10 times slower than the bespoke hardware equivalent [4].
Hwang has suggested constructing Universal Multiplication Networks using small (4 bit) programmable Additive Multiply (PAM) modules [5]. Whilst simple in design, these networks have the drawback of slow multiplication times and a non-scalable connection pattern, especially for large operand sizes.
Most other multiplier architectures are concerned with the multiplication of two multiplicands of fixed length. The techniques used for speeding up the multiplication are largely at the expense of regularity, such as Wallace Trees [6], or require some form of "pre-processing" of the operands, e.g. Booth's Modified algorithm [7]. Neither of these styles of design is well suited to generalisation for multiplicands of variable size, thus making it difficult to create a reconfigurable multiplier based on these methods.
So, there is a need for a field programmable or configurable logic circuit or circuit element (e.g. as a part of a larger FPGA) which can be used efficiently to carry out multiplication operations.
This invention provides a logic block comprising:
an m×n array of partial calculating circuits (where m≥2 and n≥2) operable to generate partial product components of an m-bit multiplicand × n-bit multiplicand binary multiplication and to generate a cumulative sum of the partial products for each bit of one of the multiplicands; and
a configurable output circuit operable, under the control of a configuration signal, either:
(a) to sum the cumulative sums of partial products generated by the partial calculating circuits so as to generate a product value; or
(b) to pass data representing the cumulative sums of the partial product components to partial calculating circuits within one or more further logic blocks.
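The behaviour of such a logic block can be illustrated by a short behavioural model. The following Python sketch is illustrative only and is not the circuit of the embodiments: the function names, the `final_sum` flag standing in for the configuration signal, and the use of a carry-save (sum word plus carry word) representation for the cumulative sums are all assumptions made for the purpose of the example.

```python
def partial_products(x: int, y: int, m: int, n: int) -> list[int]:
    """Return one shifted partial-product row for each bit of the n-bit operand y."""
    x &= (1 << m) - 1
    rows = []
    for j in range(n):
        y_bit = (y >> j) & 1
        # Each row is the AND of every bit of x with one bit of y, weighted by 2**j.
        rows.append((x << j) if y_bit else 0)
    return rows


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three words into a sum word and a carry word (a + b + c == s + carry)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def logic_block(x, y, m=4, n=4, final_sum=True, sum_in=0, carry_in=0):
    """Hypothetical model of one block.

    final_sum=True  models option (a): the cumulative sums are added to give a product.
    final_sum=False models option (b): the cumulative sums are passed on unsummed.
    """
    s, c = sum_in, carry_in
    for row in partial_products(x, y, m, n):
        s, c = carry_save_add(s, c, row)
    if final_sum:
        return s + c
    return s, c


product = logic_block(13, 11)                       # option (a)
passed_on = logic_block(13, 11, final_sum=False)    # option (b)
```

In option (b) the returned pair has not yet been resolved into a single value; in this model a later stage that adds the two words recovers the same result as option (a).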
Embodiments of the invention can provide FPGAs combined with reconfigurable multipliers. A design is presented for a reconfigurable multiplier array, constructed using an array of 4-bit flexible array blocks (FABs), which has a speed comparable to that of a conventional signed array multiplier, with minimal extra cost in the hardware required for reconfiguration. The multiplier can be configured to perform both unsigned and signed two's complement multiplication.
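The difference between unsigned and signed two's complement multiplication can be seen at the level of the partial product components. The following Python sketch is a purely arithmetic illustration, assuming the usual two's complement convention that the most significant bit of each operand carries a negative weight; the function names are hypothetical and the sketch does not represent the circuitry by which the embodiments handle sign.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


def signed_multiply(x: int, y: int, m: int = 4, n: int = 4) -> int:
    """Sum the bit-level partial product components with two's complement weights."""
    total = 0
    for j in range(n):
        if not (y >> j) & 1:
            continue
        weight_y = -1 if j == n - 1 else 1      # sign bit of y has negative weight
        for i in range(m):
            if not (x >> i) & 1:
                continue
            weight_x = -1 if i == m - 1 else 1  # sign bit of x has negative weight
            total += (weight_x * weight_y) * (1 << (i + j))
    return total


result = signed_multiply(0b1101, 0b0011)  # the 4-bit patterns for -3 and +3
```

Only the components involving a sign bit change weight, so in this model the unsigned and signed cases share the same m×n set of components.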
The logic block can be used to generate simply an m-bit by n-bit product, or in alternative preferred embodiments plural logic blocks can be linked together to form a composite multiplier capable of handling larger multiplicands.
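The arithmetic underlying such a composite multiplier can be sketched as follows. This Python example is a simplified illustration under stated assumptions: it splits each unsigned operand into 4-bit slices and combines completed block products, whereas linked logic blocks would pass cumulative sums between one another rather than finished products. The names `block_multiply` and `composite_multiply` are hypothetical.

```python
def block_multiply(a: int, b: int, bits: int = 4) -> int:
    """Shift-and-add product of two unsigned `bits`-wide operands (one block)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    total = 0
    for j in range(bits):
        if (b >> j) & 1:
            total += a << j
    return total


def composite_multiply(x: int, y: int, width: int = 8, bits: int = 4) -> int:
    """Build a width-by-width unsigned product from a grid of bits-by-bits blocks."""
    if width % bits:
        raise ValueError("width must be a multiple of the block size")
    blocks = width // bits
    mask = (1 << bits) - 1
    product = 0
    for i in range(blocks):
        for j in range(blocks):
            x_slice = (x >> (i * bits)) & mask
            y_slice = (y >> (j * bits)) & mask
            # Block (i, j) contributes its product weighted by 2**((i + j) * bits).
            product += block_multiply(x_slice, y_slice, bits) << ((i + j) * bits)
    return product


result = composite_multiply(200, 123)  # an 8-bit by 8-bit product from four 4-bit blocks
```

With the default parameters four blocks are used; a 16-bit by 16-bit product would use sixteen blocks in the same pattern.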
Because the logic blocks are relatively self-contained and so require very little configuration data, there can be a dramatic reduction in the administrative overhead usually needed to configure a gate array to perform large multiplications.
Other respective aspects and features of the invention are defined in the appended claims.