Application-specific integrated circuits (ASICs) are designed to perform a specific function, as opposed to a microprocessor, which can be programmed to perform a variety of functions. The major advantages of ASICs are typically lower unit cost and higher performance. ASICs are normally fabricated in some form of complementary metal-oxide semiconductor (CMOS) technology using custom, standard cell, physical placement of logic (PPL), gate array, or field programmable gate array (FPGA) design.
Gate arrays and FPGAs are semi-custom devices which contain a fixed set of gate structures which may be interconnected in a number of ways to achieve a desired logic function. In gate arrays the interconnect pattern is defined by the manufacturer using customized process masks. In FPGAs the interconnect pattern is programmed electrically by the user.
FPGAs generally include an array of programmable function units (PFUs). A PFU may also be called a configurable logic block (CLB) or a configurable logic element (CLE). Each PFU is a small programmable logic block which often includes one or more input lines, one or more output lines, one or more latches, and one or more look-up tables (LUTs). There are usually a greater number of input lines than output lines, with each input line being either a dedicated data line or a dedicated control line. The LUT can be programmed to perform various functions including general combinatorial or control logic, read only memory (ROM), random access memory (RAM), or data path functions between the input and output lines. In this manner, the LUT determines whether the respective PFU performs general logic, or a special mode such as an adder, a subtracter, a counter, an accumulator, a register, or a memory cell such as a single-port ROM or a single-port RAM. In some instances, the LUT can be used relatively independently of the latches. FPGAs typically contain on the order of 100-1000 essentially identical PFUs.
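By way of illustration only, the manner in which a LUT's stored bits determine its logic function can be sketched in software. The following is a minimal model, not a description of any particular FPGA: the class name LookUpTable, its methods, and the choice of a three-input, one-output table are assumptions made for the example. The input lines form an address, and the bit stored at that address is the output.

```python
class LookUpTable:
    """Hypothetical k-input, 1-output LUT modeled as 2**k stored bits."""

    def __init__(self, num_inputs, bits):
        if len(bits) != 2 ** num_inputs:
            raise ValueError("need 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.bits = list(bits)

    @classmethod
    def from_function(cls, num_inputs, func):
        """'Program' the LUT by tabulating func over every input pattern."""
        bits = []
        for address in range(2 ** num_inputs):
            # Input i is bit i of the address.
            inputs = [(address >> i) & 1 for i in range(num_inputs)]
            bits.append(func(*inputs) & 1)
        return cls(num_inputs, bits)

    def read(self, *inputs):
        """Use the input lines as an address and return the stored bit."""
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]


# The same table structure, loaded with two different sets of bits,
# yields two different logic functions.
xor3 = LookUpTable.from_function(3, lambda a, b, c: a ^ b ^ c)
maj3 = LookUpTable.from_function(
    3, lambda a, b, c: (a & b) | (a & c) | (b & c)
)

print(xor3.bits)           # [0, 1, 1, 0, 1, 0, 0, 1]
print(maj3.read(1, 1, 0))  # 1
```

In this sketch, xor3 and maj3 happen to be the sum and carry functions of a full-adder, which is one way a LUT could provide the adder mode mentioned above.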
FPGAs also include a programmable interconnection network that surrounds the PFUs. The interconnection network includes programmable crosspoint switches and metal interconnect segments (routing nodes) for selectively coupling various PFUs. The crosspoint switches are also called programmable interconnect points (PIPs). The crosspoint switches provide signal switching, amplification, and isolation. The metal interconnect segments may be arranged symmetrically about the FPGA's horizontal and vertical axes.
The function of the FPGA is determined by the combined programming of the PFUs and the interconnection network. The user selects the FPGA function by loading a configuration bit stream into the FPGA at power-up or under system control to accomplish this combined programming. Various bits of the configuration bit stream are stored in the FPGA's internal configuration RAM. The configuration RAM is coupled to the LUTs and to the crosspoint switches. Therefore, the configuration bit stream determines the specific function for each PFU as well as the interconnections between the input and output lines of various PFUs, external bonding pads, and other circuitry in the FPGA. The configuration bit stream may initially reside in an electrically erasable programmable ROM (EEPROM), a ROM on a circuit board, or any other storage medium external to the FPGA.
FPGAs may also be defined in terms of programmable logic cells (PLCs) and programmable input-output cells (PICs). The PLCs contain the PFUs, various configuration RAM, and portions of the interconnect network that couple to the PFUs. Thus, various logic functions are performed in the PLCs. The PICs are located at the perimeter of the device, outside the PLCs. The PICs contain input-output buffers, various configuration RAM, and portions of the interconnect network that couple to the bonding pads. Each PIC, for instance, may contain four buffers for interfacing with four bonding pads. Each buffer may be configured as an input, an output, or a bi-directional input-output. Each buffer may also be configured as TTL or CMOS compatible.
FPGAs are further described in U.S. Pat. Nos. 5,386,156; 5,384,497; 4,870,302; U.S. Pat. No. reissue 34,363; and European Patent Specification Publication No. 0 177 261 B1; which are all incorporated herein by reference.
Binary multiplication is one of many logic functions that can be implemented in an FPGA. In the binary system, a multiplicand is multiplied by each bit of a multiplier to form a product. If a multiplier bit is "1", the multiplicand is entered in an appropriately shifted position. If the multiplier bit is "0", then "0"s are entered. The appropriately shifted multiplicands are added to form the product.
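The shift-and-add procedure just described can be sketched as follows. This is an illustrative software model only; the function name shift_and_add and the explicit n_bits parameter (the width of the multiplier) are assumptions of the example.

```python
def shift_and_add(multiplicand, multiplier, n_bits):
    """Multiply two non-negative integers by summing shifted multiplicands."""
    product = 0
    for i in range(n_bits):
        multiplier_bit = (multiplier >> i) & 1
        if multiplier_bit == 1:
            # Enter the multiplicand shifted left by i positions.
            product += multiplicand << i
        # If the bit is 0, zeros are entered, so nothing is added.
    return product


# 1101 (13) times 1011 (11): shifts 0, 1 and 3 contribute.
print(shift_and_add(0b1101, 0b1011, 4))  # 143
```

Here the multiplier 1011 has "1" bits at positions 0, 1, and 3, so the product is 13 + 26 + 104 = 143.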
Parallel multipliers (also called array multipliers) are used for performing binary multiplication. Referring to FIGS. 1 and 2, a conventional parallel multiplier 2 typically contains a two-dimensional logic array of cells 4. Multiplying an M-bit multiplicand by an N-bit multiplier is accomplished by M×N cells arranged in N rows of M cells. The rows are shifted left by one cell with respect to the row immediately above, and the multiplicand is shifted left one cell per row by a diagonal signal path. The "basic cell" of a parallel multiplier, as used herein, includes an AND gate 6 coupled to a full-adder 8. The AND gate receives one bit of the multiplicand and one bit of the multiplier, and generates a product of these bits. The output of the AND gate is coupled to an input of the full-adder. The full-adder adds the bit product from the AND gate to a carry-in bit and to an incoming partial-product bit to produce a sum bit and a carry-out bit. Therefore, the AND gate determines whether or not a multiplicand bit is added to the incoming partial-product bit, based on the value of the multiplier bit for that row. If the multiplier bit for the row is "1", the array adds the multiplicand (appropriately shifted) to the incoming partial-product to generate the outgoing partial-product. If the multiplier bit for the row is "0", the incoming partial-product is passed vertically downward unchanged. Parallel multipliers utilizing the basic cell are well known in the art; see, for instance, Hamacher et al., Computer Organization, published by McGraw-Hill, 1978, pp. 194-195, which is incorporated herein by reference.
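For illustration, the basic cell and an array built from it can be modeled in software as shown below. This is a sketch under stated assumptions rather than a description of FIGS. 1 and 2: the function names basic_cell and array_multiply are hypothetical, and the carry is assumed to ripple from right to left within each row, with the left-most carry-out of a row feeding the left-most cell of the row below.

```python
def basic_cell(a_bit, b_bit, partial_in, carry_in):
    """AND gate feeding a full-adder; returns (sum_bit, carry_out)."""
    bit_product = a_bit & b_bit                  # AND gate
    total = bit_product + partial_in + carry_in  # full-adder
    return total & 1, total >> 1


def array_multiply(a, b, m, n):
    """Multiply m-bit a by n-bit b using n rows of m basic cells."""
    a_bits = [(a >> j) & 1 for j in range(m)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    product_bits = []
    incoming = [0] * m  # partial-product bits entering the top row
    for i in range(n):
        carry = 0       # carry-in of the right-most cell (assumed 0)
        sums = []
        for j in range(m):
            s, carry = basic_cell(a_bits[j], b_bits[i], incoming[j], carry)
            sums.append(s)
        # The right-most sum bit of row i is final product bit i.
        product_bits.append(sums[0])
        # The rest shift one cell right relative to the next row,
        # which models each row sitting one cell to the left.
        incoming = sums[1:] + [carry]
    product_bits.extend(incoming)  # upper m bits of the product
    return sum(bit << k for k, bit in enumerate(product_bits))


print(array_multiply(13, 11, 4, 4))  # 143
```

When a row's multiplier bit is "0", every AND gate in that row outputs "0", so the row's sum bits equal its incoming partial-product bits, matching the pass-through behavior described above.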
Many schemes exist for parallel multipliers. Every cell of a parallel multiplier need not necessarily be the basic cell. For instance, the right-most cells of each row may replace the full-adder with a half-adder with no carry-in bit received. Or, the upper row cells may replace the full-adder with a half-adder with no partial-product bit received. Nevertheless, usually at least one, if not a substantial number of cells, is the basic cell. Furthermore, a parallel multiplier may use the basic cell for each cell, setting the carry-in bit of the right-most cells to "0", and setting the partial-product bit of the upper row cells to "0".
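The equivalence between the two approaches for the right-most cells can be checked with a short sketch. The function names below are hypothetical, and only the right-most-cell case (no carry-in) is modeled; the upper-row case (no partial-product bit) follows the same pattern.

```python
from itertools import product


def full_adder(x, y, carry_in):
    """Returns (sum_bit, carry_out) for three input bits."""
    total = x + y + carry_in
    return total & 1, total >> 1


def half_adder(x, y):
    """Returns (sum_bit, carry_out) for two input bits."""
    return x ^ y, x & y


def edge_cell_half_adder(a_bit, b_bit, partial_in):
    """Right-most cell built from an AND gate and a half-adder."""
    return half_adder(a_bit & b_bit, partial_in)


def edge_cell_basic(a_bit, b_bit, partial_in):
    """Right-most cell built from the basic cell with carry-in tied to 0."""
    return full_adder(a_bit & b_bit, partial_in, 0)


# Compare the two cells over all eight input combinations.
same = all(
    edge_cell_half_adder(a, b, p) == edge_cell_basic(a, b, p)
    for a, b, p in product((0, 1), repeat=3)
)
print(same)  # True
```

The half-adder variant saves logic in the edge cells, while using the basic cell everywhere keeps the array uniform.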
A primary shortcoming and deficiency with conventional FPGAs is that individual PFUs are normally incapable of providing the basic cell of a parallel multiplier. As a result, two PFUs are normally required to implement the basic cell, with one PFU functioning as the AND gate and the other PFU functioning as the full-adder. The need for additional PFUs creates several disadvantages. First, the demand for hardware and chip area is increased. Second, additional time delays arise during operation. For example, signals between the PFUs are often delayed by source-to-gate and drain-to-gate capacitances in the n-channel FET crosspoint switches of the interconnection network. Accordingly, there is a need for an FPGA which efficiently implements a parallel multiplier.