1. Field of the Invention
The present invention relates to an architecture for a field programmable gate array (FPGA). More specifically, the present invention relates to an FPGA that includes a plurality of configurable logic blocks (CLBs), each having a configurable logic element (CLE) and an associated function block.
2. Related Art
FIG. 1A is a representation of the binary multiplication of a 4-bit multiplicand number X with a 4-bit multiplier word Y. Multiplicand number X includes bits X3, X2, X1 and X0 (X3 being the most significant bit and X0 being the least significant bit). Similarly, multiplier word Y includes bits Y3, Y2, Y1 and Y0 (Y3 being the most significant bit and Y0 being the least significant bit). Each bit of multiplicand number X is multiplied by each bit of multiplier word Y as illustrated, thereby creating four partial products 101-104. The first partial product 101 includes each bit of multiplicand number X multiplied by Y0. The second partial product 102 includes each bit of multiplicand number X multiplied by Y1. The second partial product 102 is shifted left one place with respect to the first partial product 101, thereby providing the appropriate weight to these partial products. The third and fourth partial products 103 and 104 are created and weighted in a similar manner. The aligned columns of partial products 101-104 are added to create product bits P7-P0. Product bits P7-P0 represent the product P of multiplicand number X and multiplier word Y.
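The weighting of the partial products described above can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic of FIG. 1A; the function names `partial_products` and `multiply` are hypothetical and do not appear in the figures.

```python
def partial_products(x: int, y: int, width: int = 4) -> list[int]:
    """Return one weighted partial product per multiplier bit (LSB first)."""
    rows = []
    for i in range(width):
        y_bit = (y >> i) & 1           # multiplier bit Yi
        rows.append((x * y_bit) << i)  # X gated by Yi, shifted left i places
    return rows


def multiply(x: int, y: int, width: int = 4) -> int:
    """Add the aligned partial products to form the product P."""
    return sum(partial_products(x, y, width))


# X = 1101 (13), Y = 1011 (11)
print(partial_products(13, 11))  # [13, 26, 0, 104]
print(multiply(13, 11))          # 143
```

Each list entry corresponds to one of partial products 101-104; the left shift by i places plays the role of the column alignment shown in FIG. 1A.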
FIG. 1B illustrates the addition of partial products 101-104 in more detail. As illustrated in FIG. 1B, the first and second partial products 101 and 102 are initially added. The values in each column are added, thereby generating six sum signals (C4, PP2, PP1, PP0, P1 and P0) and four carry signals (C1-C4) as illustrated. The sum and carry signals for each column are generated in response to the three input signals associated with the column. The generation of sum and carry signals in response to three generic input signals A, B and C is summarized below in Table 1.
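The behavior of a single column can be modeled as follows. This sketch assumes that the sum and carry signals follow the standard full-adder relationship (sum is the exclusive OR of the three inputs, carry is their majority), which is consistent with the column additions described for FIG. 1B; the function name `full_adder` is illustrative only.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three one-bit inputs A, B and C."""
    s = a ^ b ^ c                        # sum: odd number of inputs set
    carry = (a & b) | (a & c) | (b & c)  # carry: at least two inputs set
    return s, carry


# Enumerate all eight input combinations.
print("A B C | Sum Carry")
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    print(f"{a} {b} {c} |  {s}    {carry}")
```

For every input combination, the two outputs satisfy A + B + C = Sum + 2 x Carry.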
Carry signal C1 is the carry result from the addition of X1Y0, X0Y1 and “0”. Similarly, carry signal C2 is the carry result from the addition of X2Y0, X1Y1 and C1. Carry signal C3 is the carry result from the addition of X3Y0, X2Y1 and C2. Finally, carry signal C4 is the carry result from the addition of X3Y1, C3 and “0”.
Six sum signals (C4, PP2, PP1, PP0, P1 and P0) result from the addition of partial products 101 and 102. Sum signal P0 (which is also product bit P0) is equal to X0Y0. Sum signal P1 (which is also product bit P1) is the sum result of X1Y0, X0Y1 and “0”. Sum signal PP0 is the sum result of X2Y0, X1Y1 and C1. Sum signal PP1 is the sum result of X3Y0, X2Y1 and C2. Sum signal PP2 is the sum result of X3Y1 and C3.
Partial product 103 is added to the six sum signals that result from the addition of partial products 101 and 102. Seven sum signals (C8, PP5, PP4, PP3, P2, P1 and P0) and four carry signals (C5-C8) are generated by this addition operation as illustrated. Partial product 104 is then added to the seven sum signals resulting from the addition of partial products 101, 102 and 103. Sum signals (i.e., the product bits P7-P0) and carry signals C9-C12 are generated during this addition operation as illustrated.
FIG. 2 is a circuit diagram of a conventional 4×4 bit multiplier circuit 200 for implementing the multiplication operation illustrated in FIGS. 1A and 1B. Multiplier circuit 200 includes AND gates 201-216 and adder circuits 221-236. Each of adder circuits 221-236 provides a sum signal and a carry signal in response to three input signals in the manner set forth in Table 1. Each of AND gates 201-216 has a first input terminal which is coupled to receive a selected one of the multiplicand bits X3-X0. Similarly, each of AND gates 201-216 has a second input terminal which is coupled to receive a selected one of the multiplier bits Y3-Y0. AND gates 201-216 operate to multiply the X and Y bits. The truth tables for the logical AND of two bits and the arithmetic product of two bits are identical, the result being a logic “1” value only if both bits have logic “1” values, the result being a logic “0” otherwise. As a result, AND gates 201-216 provide the terms of partial products 101, 102, 103 and 104. More specifically, AND gates 201-204 provide the terms of partial product 101; AND gates 205-208 provide the terms of partial product 102; AND gates 209-212 provide the terms of partial product 103; and AND gates 213-216 provide the terms of partial product 104. The terms of partial products 101-104 are provided to adder circuits 221-236 as illustrated. Note that the terms of partial products 101-104 are shifted down in successive columns, thereby providing the appropriate weighting to the partial products. Adder circuits 221-236 add partial products 101-104 in the manner illustrated in FIG. 1B, thereby creating product bits P7-P0. The carry signals C1-C12 and sum signals PP0-PP5 previously described in connection with FIG. 1B are illustrated in FIG. 2, thereby showing the manner in which multiplier circuit 200 implements the multiplication operation of FIG. 1B.
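A bit-level model of this kind of array multiplier is sketched below. It forms the sixteen AND terms and then adds one partial product row at a time with rippling carries, as in FIG. 1B. The sketch performs twelve full-adder evaluations, one for each of carry signals C1-C12; it does not attempt to reproduce the exact adder numbering or wiring of FIG. 2, and the names `full_adder` and `array_multiply_4x4` are illustrative only.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three one-bit inputs."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def array_multiply_4x4(x: int, y: int) -> int:
    """Multiply two 4-bit numbers using only AND terms and full adders."""
    xb = [(x >> i) & 1 for i in range(4)]  # X0..X3
    yb = [(y >> j) & 1 for j in range(4)]  # Y0..Y3

    # AND gates: pp[j][i] = Xi AND Yj (row j is one partial product).
    pp = [[xb[i] & yb[j] for i in range(4)] for j in range(4)]

    # Running sum, LSB first; starts as the first partial product.
    acc = pp[0] + [0, 0, 0, 0]

    # Add each remaining partial product, shifted left by j columns.
    for j in range(1, 4):
        carry = 0
        for i in range(4):
            col = i + j
            acc[col], carry = full_adder(acc[col], pp[j][i], carry)
        acc[j + 4] = carry  # final carry of the row becomes a sum bit

    # acc now holds product bits P0..P7.
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply_4x4(13, 11))  # 143
```

After the first pass of the outer loop, the list `acc` holds the sum signals P0, P1, PP0, PP1, PP2 and C4 described in connection with FIG. 1B.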
FIG. 3, which includes FIGS. 3A and 3B, is a circuit diagram illustrating the implementation of multiplier circuit 200 in an FPGA. The FPGA includes an array of configurable logic blocks (CLBs), which includes CLBs 301-316. Programmable interconnect resources extend between the CLBs, thereby allowing the illustrated connections to be made. The resources present within each CLB can be configured to implement either a pair of adder circuits or a pair of AND gates. For example, CLB 302 is configured to implement AND gates 201 and 202, and CLB 304 is configured to implement adder circuits 221 and 222. A total of eight CLBs (i.e., CLBs 301-302, 305-306, 309-310 and 313-314) are required to implement AND gates 201-216. A total of eight CLBs (i.e., CLBs 303-304, 307-308, 311-312 and 315-316) are required to implement adder circuits 221-236. As a result, at least sixteen CLBs are required to implement multiplier circuit 200. This typically represents a significant portion of the FPGA resources. As a result, it is fairly inefficient to implement multiplier circuit 200 in an FPGA.
Some FPGAs enable one or more of AND gates 201-202 to be implemented in CLB 304. However, even if both of AND gates 201-202 are implemented in CLB 304, at least eight CLBs are required to implement multiplier circuit 200. This still represents a significant portion of the FPGA resources. In general, multiplier circuits cannot be implemented efficiently in a conventional FPGA because of the relatively large number of CLBs required to form the multiplier circuit.
It would therefore be desirable to have an FPGA which is capable of efficiently performing multiplication operations.
Furthermore, within an FPGA, each CLB includes logic which is programmed to perform a particular function or functions desired by the user of the FPGA. In particular FPGAs, such as Xilinx's XC4000 family of devices, writable RAM-based look-up tables are included in each CLB. The writable RAM-based look-up tables can be used to create a “user-RAM” array. However, such user-RAM arrays are inefficient because creation of the RAM array detracts from the amount of logic available to perform other operations within the FPGA. That is, when a CLB is used to create a user-RAM array, the logic capacity of the CLB is lost.
Moreover, the RAM arrays which can be conveniently created using the writable RAM-based look-up tables are relatively small (e.g., capable of storing only 16 to 32 bytes). To expand a RAM array (e.g., to more than 16 or 32 bytes), function generators of additional CLBs are required to perform a multiplexing function between the several smaller RAM arrays. As a result, the complexity of the signal routing for the RAM array increases, the amount of logic required by the RAM array increases, and the speed of the RAM array decreases.
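The expansion scheme described above can be modeled as follows. This is a sketch under stated assumptions: each small RAM is taken to hold 16 words, and the class names `SmallRam` and `ExpandedRam` are hypothetical. The upper address bits select one small RAM and the lower address bits address a location within it; the final selection step stands in for the multiplexing function that consumes function generators of additional CLBs.

```python
class SmallRam:
    """Model of one small look-up-table RAM (depth of 16 is an assumption)."""

    DEPTH = 16

    def __init__(self) -> None:
        self.cells = [0] * self.DEPTH

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value


class ExpandedRam:
    """Larger RAM built from several small RAMs plus an output multiplexer."""

    def __init__(self, n_blocks: int) -> None:
        self.blocks = [SmallRam() for _ in range(n_blocks)]

    @staticmethod
    def _split(addr: int) -> tuple[int, int]:
        # Upper bits select the block, lower bits address within the block.
        return divmod(addr, SmallRam.DEPTH)

    def write(self, addr: int, value: int) -> None:
        select, local = self._split(addr)
        self.blocks[select].write(local, value)  # only one block is enabled

    def read(self, addr: int) -> int:
        select, local = self._split(addr)
        outputs = [b.read(local) for b in self.blocks]  # all blocks read
        return outputs[select]  # multiplexer implemented in additional CLBs


ram = ExpandedRam(n_blocks=4)  # 64 words built from four 16-word RAMs
ram.write(37, 0xAB)            # address 37 = block 2, local address 5
print(hex(ram.read(37)))       # 0xab
```

Each doubling of the depth adds another multiplexer input, which reflects the growth in routing complexity and logic described above.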
For example, when implementing a 256-byte RAM, the CLB area consumed is roughly equivalent to the area of a conventional FPGA. While a 256-byte RAM may seem like a large memory to implement using an FPGA, such a RAM is still relatively small.
Moreover, the layout area required to make each RAM-based look-up table writable is not an insignificant percentage of the layout area of each CLB. This area penalty is incurred by each CLB, irrespective of whether it is used to create a user-RAM array. The total area penalty for an FPGA depends on the size of the FPGA and can be equal to the area of 100 or more CLBs.
Accordingly, it would be desirable to have an FPGA which implements a user-RAM array and overcomes the problems previously discussed.
Accordingly, the present invention provides a programmable logic device (PLD), such as an FPGA, which includes an array of configurable logic elements (CLEs) and a corresponding array of dedicated function blocks. In one embodiment, the dedicated function blocks are multiplier tiles, thereby enabling the programmable logic device to efficiently implement a multiplier function. In another embodiment, the dedicated function blocks are memory tiles, thereby enabling the programmable logic device to efficiently provide a memory function.
When the dedicated function blocks are multiplier tiles, each of the CLEs includes circuitry that can be programmed to implement various logic functions. Each of the multiplier tiles includes a dedicated multiplier array having a predetermined size (e.g., a 2×4 bit multiplier array). Programmable interconnect resources can be programmed to selectively connect the CLEs to the multiplier tiles. The programmable interconnect resources can also be programmed to selectively connect the CLEs to one another, and to selectively connect the multiplier tiles to one another. In one embodiment, the programmable interconnect resources include a plurality of multiplexers.
As a result, the CLEs can be operated as conventional configurable logic elements, completely disconnected from the array of multiplier tiles by the programmable interconnect resources.
Alternatively, selected multiplier tiles can be connected to one another by the programmable interconnect resources, thereby creating a relatively high density multiplier circuit. Selected CLEs are coupled to these multiplier tiles, thereby providing an interface to this multiplier circuit. In general, the desired multiplier and multiplicand bits are routed from the CLEs to the multiplier tiles. The resulting product bits are routed from the multiplier tiles to associated CLEs. In this manner, the FPGA is capable of implementing a relatively large multiplier circuit in a resource efficient manner.
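The arithmetic behind combining fixed-size tiles into a larger multiplier can be sketched as follows. The sketch assumes a tile that multiplies a 4-bit operand by a 2-bit operand, and it shows only the arithmetic identity involved, not the tile circuitry or the interconnect. The names `tile_2x4` and `multiply_with_tiles` are hypothetical.

```python
def tile_2x4(x4: int, y2: int) -> int:
    """Model of one tile: 4-bit operand times 2-bit operand (assumed split)."""
    assert 0 <= x4 < 16 and 0 <= y2 < 4
    return x4 * y2  # fits in 6 bits


def multiply_with_tiles(x: int, y: int, y_bits: int = 4) -> int:
    """Build a 4 x y_bits multiplier from tiles, two multiplier bits per tile."""
    total = 0
    for k in range(0, y_bits, 2):
        y_slice = (y >> k) & 0b11             # two multiplier bits for this tile
        total += tile_2x4(x, y_slice) << k    # weight the tile output by 2**k
    return total


print(multiply_with_tiles(13, 11))             # 143, using two tiles
print(multiply_with_tiles(13, 200, y_bits=8))  # 2600, using four tiles
```

Widening the multiplier word adds tiles rather than general-purpose logic, which is the source of the resource efficiency described above.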
Various multiplier architectures can be used to form the multiplier tiles. These multiplier architectures can be capable of performing signed and/or unsigned multiplication operations, as well as signed/unsigned multiplication operations. Similarly, carry-propagate or carry-save multiplier architectures can be used. When a carry-save multiplier architecture is used, selected multiplier tiles will provide carry and sum signals that must be added to create the final product bits. In this embodiment, these carry and sum signals are routed into the CLE associated with the multiplier tile, and then added within this CLE.
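The carry-save case can be illustrated with a word-level model. In the sketch below, each step reduces three numbers to a sum vector and a carry vector without propagating carries, and a single final addition (the step attributed above to the CLE) produces the product. The function names are illustrative, and the sketch is not a description of any particular tile circuit.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three numbers to (sum_vector, carry_vector); no carry ripple."""
    sum_vec = a ^ b ^ c                             # per-bit sum
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carry, weight x2
    return sum_vec, carry_vec


def multiply_carry_save(x: int, y: int, width: int = 4) -> tuple[int, int]:
    """Accumulate partial products in carry-save form."""
    sum_vec, carry_vec = 0, 0
    for i in range(width):
        pp = (x << i) if (y >> i) & 1 else 0  # weighted partial product
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, pp)
    return sum_vec, carry_vec


s, c = multiply_carry_save(13, 11)
print(s + c)  # 143: the final carry-propagate addition yields the product
```

Because `carry_save_add` preserves the total of its three inputs, the sum and carry vectors always add up to the accumulated product, regardless of the order in which partial products are absorbed.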
In accordance with another embodiment, the dedicated function blocks are memory tiles. The dedicated memory tiles have a relatively high density when compared with the density of writable RAM-based look-up tables typically present in CLEs. The PLD also includes an array of CLEs, wherein each of the CLEs in the array is coupled to a corresponding one of the memory tiles. Together, the memory tiles form a composable RAM array, which is accessed through the CLEs. That is, the input signals required by the memory tiles are routed through the corresponding CLEs. Similarly, the output signals provided by the memory tiles are routed out through the corresponding CLEs.
Each CLE can be configured to operate as a conventional CLE (i.e., ignore its corresponding memory tile). Alternatively, each CLE can be configured to provide an interface to its corresponding memory tile. To help achieve this, each CLE comprises a set of multiplexers for selectively routing data output signals provided by the corresponding memory tile or output signals provided by the CLE.
In addition, each memory tile is capable of being selectively coupled to one or more adjacent memory tiles, thereby allowing the size of the composable RAM array to be selected by the circuit designer. This capability also allows the composable RAM array to be configured to form a plurality of separate and independent memories.
The present invention will be more fully understood in view of the following description and drawings.