This invention relates to programmable logic devices having a repeating pattern of logic blocks, and more particularly to an improved logic block therefor.
Field programmable gate arrays (FPGAs) are well known in the art. An FPGA comprises an array of configurable logic blocks (CLBs) which are interconnected to each other through a programmable interconnect structure to provide a logic function desired by a user.
U.S. Pat. No. 4,870,302, reissued as U.S. Pat. No. RE 34,363, and incorporated herein by reference, describes a well known FPGA architecture. Other publications, such as El Gamal's U.S. Pat. No. 4,758,745, Kean's U.S. Pat. No. 5,243,238, and Camarota and Furtek's U.S. Pat. No. 5,245,227, also incorporated herein by reference, describe other FPGA architectures. Pages 4-5 through 4-45 of the Xilinx 1996 Data Book entitled “The Programmable Logic Data Book”, available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124, also incorporated herein by reference, describe several products which implement a number of FPGA architectures.
An FPGA is a general purpose device, i.e., it is capable of performing any one of a plurality of functions, and is programmed by an end user to perform a selected function. Because of this design flexibility, a general purpose FPGA includes a significant number of wiring lines and transistors, many of which remain unused in any particular application. FPGAs include overhead circuits which facilitate programming of the FPGA to do the specified function. To the extent possible without interfering with required functions, there is a need to conserve overhead chip area by using logic components efficiently. There is a need to minimize both the number of routing lines in a device and the number of logic blocks that must be used to perform a given logic function.
In U.S. Pat. No. 5,682,107 of Tavana, Yee and Holen, a CLB is disclosed wherein four lookup table function generators each have four input lines and an output line connected as the control input to a carry chain multiplexer, at least one carry chain multiplexer being associated with each function generator. Each carry chain multiplexer receives a first input from the carry chain (i.e., the output of the prior multiplexer in the serial array of carry chain multiplexers) and a second input from an additional distinct input line to the CLB. The function and structure of a carry chain are described at length in commonly assigned U.S. Pat. No. 5,349,250 to New.
The following drawing conventions are used throughout the figures. A small solid black dot at the intersection of two lines indicates a permanent electrical connection between the crossing lines. An open circle enclosing an intersection between two lines indicates a programmable connection between the lines (for example, a pass transistor, which is turned on to make the connection). Open circles represent bidirectional signal flow between the two lines. An open triangle at an intersection of two lines indicates a programmable connection with signal flow going onto the line pointed to by the apex of the triangle. (The signal is of course then present on the full length of the line. Thus, a triangle pointing in the opposite direction would have the same signal flow because the triangle points to the same wire.) Programmable connections are provided at programmable interconnection points (PIPs), wherein each PIP includes at least one transistor.
A triangle that is on a line but not at an intersection indicates a buffer that produces signal flow in the direction indicated by the apex of the triangle. In FIG. 3, except for global lines CLK, CE, RST, TS, ENOUT, and ENLL, a line which ends within the tile or matrix structure (i.e., does not extend to the border of the tile or matrix) is physically terminated within the tile. A line which extends to the border of the tile or matrix connects to a line on the next tile, which it contacts when two tiles are abutted together. Note that some lines which extend to an edge of a tile and thus into an adjacent tile change names at the tile boundary.
FIG. 1 shows an FPGA chip 100 in which the CLB of the invention may be employed. In the center portion of chip 100 are a plurality of core tiles 101, which are interconnected by conductive lines (described in detail below). Chip 100 includes pads, i.e., pads P1-P56, and input/output blocks (IOBs) for connecting edge tiles 103, 104, 105, 106, and corner tiles 113-116 to external pins of a package that holds chip 100. Each edge tile and corner tile is further connected to a core tile 101. Power voltage source pads VCC and ground source pads GND have connections (not shown) in a conventional manner throughout chip 100.
FIG. 2 shows a core tile 101. Core tile 101 includes a programmable routing matrix 201 and a CLB matrix 202. Programmable routing matrix 201 is described in detail by Tavana et al. in U.S. Pat. No. 5,682,107. CLB matrix 202 is described in reference to FIG. 3 and also in detail in the related Tavana et al. patent application.
In FIG. 2, CLB matrix 202 is connected to another CLB matrix in a tile to the west (not shown) by output lines Q0-Q3 and input lines QW0-QW3. CLB matrix 202 connects to a CLB matrix in the tile to the north (not shown) by output lines Q0-Q3 and input lines QN0-QN3, to a CLB matrix in the east by output lines Q0-Q3 and input lines QE0-QE3, and to a CLB matrix in the south tile (not shown) by output lines Q0-Q3 and input lines QS0-QS3. Note that carry-in line CIN and carry-out line COUT, which extend vertically in tile 101, connect to carry-out and carry-in lines, respectively, in adjacent tiles north and south. Certain labels shown but not discussed in FIG. 2 are discussed by Tavana et al. in related U.S. Pat. No. 5,682,107 and are shown here for the convenience of the reader.
The carry-in and carry-out lines form a fast carry path for arithmetic functions, as discussed in detail by Bernard J. New in U.S. Pat. No. 5,349,250, entitled “LOGIC STRUCTURE AND CIRCUIT FOR FAST CARRY”, which is incorporated herein by reference. Programmable routing matrix 201 is connected in the four directions shown, and additionally connects to CLB matrix 202. Programmable routing matrix 201 includes a programmable interconnect structure for interconnecting the five sets of incoming lines to each other.
CLB Matrix 202
FIG. 3 illustrates CLB matrix 202 of FIG. 2. CLB matrix 202 includes a CLB 301, a tristate buffer block 302, an input interconnect structure 303, a CLB output interconnect structure 304, a feedback interconnect structure 305, a general input interconnect structure 306, a register control interconnect structure 307, an output interconnect structure 308, and output enable blocks 309. The structure of FIG. 3 is described in detail by Tavana et al. in related patent U.S. Pat. No. 5,682,107.
Configurable Logic Block 301
A prior art CLB 301 is illustrated in FIG. 4. CLB 301 includes four function generators F, G, H, and J. Each function generator comprises a 16-bit lookup table that generates an output signal determined by the four input signals provided to the function generator and the 16 values stored in the lookup table. Thus, function generator F generates an output signal determined by the input signals provided on lines F0-F3, function generator G generates an output signal determined by the signals provided on CLB input lines G0-G3, and so on for H and J. This CLB is discussed in detail by Tavana et al. in application Ser. No. 08/618,445, incorporated by reference.
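The lookup-table behavior described above can be sketched in software as follows. This is a minimal illustration only: the function name `lut4`, the packing of the 16 stored values into one integer, and the choice of the first input as the least significant address bit are assumptions made for the sketch and are not taken from the referenced patents.

```python
def lut4(table_bits: int, i0: int, i1: int, i2: int, i3: int) -> int:
    """Return the stored bit addressed by four input signals.

    table_bits holds the 16 stored values, one per bit position.
    Treating i0 as the least significant address bit is an assumption.
    """
    address = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3)
    return (table_bits >> address) & 1


# Example configuration: 4-input AND (only address 15 stores a 1).
AND4 = 1 << 15

# Example configuration: 4-input XOR (addresses with odd parity store a 1).
XOR4 = sum(1 << a for a in range(16) if bin(a).count("1") % 2 == 1)
```

Loading a different 16-bit value into `table_bits` yields any other function of the same four inputs, which is what makes the function generator general purpose.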
Function generators F, G, H, and J provide output signals on CLB output lines X, Y, Z, and V, respectively. The FIG. 4 CLB includes a carry chain for fast implementation of arithmetic functions. The output signals from function generators F, G, H, and J control multiplexers C1, C2, C3, and C4, thereby providing a cumulative carry-out function COUT. Multiplexer C1 receives a carry-in signal on line CIN and an input signal on line FB, and generates an output signal on line CF. Multiplexer C2 receives the signal on line CF and an input signal on line GB, and generates an output signal on line CG. Multiplexers C3 and C4 are connected in the same manner as multiplexers C1 and C2. Multiplexer C4 provides an output signal on line COUT from CLB 301. For a detailed discussion of the implementation of arithmetic functions, see commonly assigned U.S. Pat. No. 5,349,250 invented by Bernard J. New, entitled “LOGIC STRUCTURE AND CIRCUIT FOR FAST CARRY”, which is incorporated herein by reference.
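The serial arrangement of multiplexers C1 through C4 may be modeled as in the following sketch. The function names are hypothetical. The select polarity (a function generator output of 1 passes the incoming carry, a 0 selects the FB, GB, HB, or JB input) is an assumption chosen to agree with the wide AND behavior described elsewhere in this text.

```python
def carry_mux(select: int, carry_in: int, bypass: int) -> int:
    """One carry chain multiplexer: pass the carry when select is 1,
    otherwise forward the bypass input (assumed polarity)."""
    return carry_in if select else bypass


def clb_carry_chain(cin, lut_outputs, b_inputs):
    """Return the signals on lines CF, CG, CH, and COUT.

    lut_outputs: outputs of function generators F, G, H, J (lines X, Y, Z, V).
    b_inputs:    signals on lines FB, GB, HB, JB.
    """
    carry = cin
    taps = []
    for select, bypass in zip(lut_outputs, b_inputs):
        carry = carry_mux(select, carry, bypass)
        taps.append(carry)
    return taps
```

Because each stage only chooses between two already-available signals, the carry passes through one multiplexer per function generator rather than through general interconnect.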
In addition to function generators F, G, H, and J, each CLB 301 includes four storage devices RX, RY, RZ, and RV. These storage devices RX, RY, RZ, and RV each comprise flip flops with master and slave stages and an output multiplexer which takes outputs from the master and slave stages as inputs. Thus, storage devices RX, RY, RZ, and RV can be configured by the multiplexer to serve as either flip flops or as latches. The outputs of storage devices RX through RV appear on output signal lines XQ through VQ, respectively.
Typically, periodic repowering of the carry signal is necessary. To provide this repowering, a repowering buffer comprising inverters I121 and I122 is provided.
In FIG. 4, CLB 301 includes five input lines per function generator. For example, referring to function generator F, CLB input lines F0-F3 provide four input signals to function generator F, and a fifth CLB input line FB provides a multiplexer control input signal. Function generators G, H, and J are structured in a similar manner. Three input lines CLK, CE, and RST provide clock, clock enable, and reset signals, respectively, to registers RX, RY, RZ, and RV.
In the embodiment of FIG. 4, multiplexers D1-D4 selectively provide either the output signals from function generators F, G, H, and J (the signals on CLB output lines X through V) or the output signals from multiplexers B1-B4 (the signals on CLB output lines XB through VB) to registers RX through RV, respectively. If multiplexers S1 and S3 are set to forward the carry signals CF and CH of multiplexers C1 and C3, respectively, then multiplexers B1-B4 select between the input signals on CLB input lines FB through JB, respectively, and the output signals of multiplexers C1-C4. Multiplexers FG and HJ allow functions of five input signals to be generated by loading a 32-bit truth table into two 16-bit function generators, duplicating four input signals to the two function generators and applying the fifth input signal to line FB or HB. Multiplexer PG provides a local source of power or ground voltage on line K.
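The five-input arrangement can be illustrated with the sketch below, which splits a 32-bit truth table across two 16-bit lookup tables and selects between them with the fifth input. The names `lut4` and `function5`, and the assignment of the lower half of the table to the first function generator, are assumptions of the sketch.

```python
def lut4(table_bits: int, i0: int, i1: int, i2: int, i3: int) -> int:
    """16-bit lookup table addressed by four inputs (i0 assumed LSB)."""
    return (table_bits >> (i0 | (i1 << 1) | (i2 << 2) | (i3 << 3))) & 1


def function5(table32: int, i0: int, i1: int, i2: int, i3: int, i4: int) -> int:
    """Five-input function built from two function generators and a multiplexer.

    The same four inputs drive both lookup tables; the fifth input
    (applied to line FB or HB in the text) controls the multiplexer.
    """
    low_half = table32 & 0xFFFF           # loaded into one function generator
    high_half = (table32 >> 16) & 0xFFFF  # loaded into the paired generator
    f_out = lut4(low_half, i0, i1, i2, i3)
    g_out = lut4(high_half, i0, i1, i2, i3)
    return g_out if i4 else f_out
```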
Multiplexers C1-C4, in addition to being used for the carry function in an arithmetic operation, also generate wide AND and OR functions. To generate the AND function, a logic 0 is placed on line FB. This constant logic 0 input causes multiplexer C1 to generate an AND function of the F function generator output signal on CLB output line X and the carry-in signal on line CIN. Alternatively, to generate the OR function, a logic 1 is placed on CLB input line FB and a complementary truth table is loaded into the F function generator. The constant logic 1 causes multiplexer C1 to generate an OR function of the complement of the output signal on CLB output line X and the carry-in signal on line CIN. The function of multiplexers C1-C4 and their interaction with the logic block are further discussed by New in U.S. Pat. No. 5,349,250 incorporated herein by reference.
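The wide AND and OR behavior can be checked with the following sketch. It assumes that the same constant is placed on every bypass line in the chain (the text states this only for line FB), that the chain is started with a carry-in of 1 for AND and 0 for OR, and that a select value of 1 passes the carry. The function names are hypothetical.

```python
def carry_chain_out(cin, selects, bypass_values):
    """Final carry-out of a chain of carry multiplexers."""
    carry = cin
    for select, bypass in zip(selects, bypass_values):
        carry = carry if select else bypass
    return carry


def wide_and(values):
    """AND of all function generator outputs: constant 0 on each bypass line."""
    return carry_chain_out(1, values, [0] * len(values))


def wide_or(values):
    """OR of the intended terms: complemented truth tables drive the
    selects and a constant 1 is placed on each bypass line."""
    complemented = [1 - v for v in values]
    return carry_chain_out(0, complemented, [1] * len(values))
```

Any stage whose term is 0 forces the AND chain low, and any stage whose term is 1 forces the OR chain high, so the width of the function grows with the length of the chain.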
Also known in the prior art, from U.S. Pat. No. 5,267,187 by inventors Hung-Cheng Hsieh et al. entitled “LOGIC STRUCTURE AND CIRCUIT FOR FAST CARRY”, are structures such as that shown in FIG. 5. From two input signals Ai and Bi, a function can be generated in function generator 902. This can include the sum function Si when function generator 902 is so configured. Dedicated hardware 901 included in the XC4000 products generates a propagate function Pi for controlling carry multiplexer 913. The same input signal Ai to function generator 902 also is an input signal to carry multiplexer 913.
The structure of U.S. Pat. No. 5,349,250 is shown in FIG. 6. The input signal Ai to function generator 903 is also an input to carry multiplexer 923. The sum Si is generated either in another function generator as was done in FIG. 5, or by a dedicated XOR gate 926.
Users of FPGAs frequently want to perform arithmetic functions including addition and multiplication. Addition (or subtraction) is easily performed in the architectures of either FIG. 5 or FIG. 6. When two numbers “a” and “b”, each being multi-bit numbers, are to be added, bits of successively higher significance are applied to input terminals of successive function generators connected in the carry chain. For example, if the bits ai and bi are applied to the structure of FIG. 5, then the next more significant bits a(i+1) and b(i+1) are applied to a structure (not shown) that is located directly above FIG. 5. Thus addition of two n-bit numbers can be performed in a structure using n copies of FIG. 5. For addition or subtraction, the structure of FIG. 6 also requires n copies to add two n-bit numbers.
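A software sketch of n-bit addition using one propagate signal and one carry multiplexer per bit is given below. It assumes that the propagate function is the XOR of the two operand bits, that the carry multiplexer forwards the operand bit when propagate is 0, and that the sum bit is the XOR of propagate and the incoming carry. The function name is hypothetical.

```python
def ripple_add(a: int, b: int, n_bits: int, cin: int = 0) -> int:
    """Add two n-bit numbers with one carry-multiplexer stage per bit."""
    carry = cin
    total = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i               # controls the carry multiplexer
        total |= (propagate ^ carry) << i   # sum bit for this stage
        # When propagate is 0 the operand bits are equal, so either one
        # is the correct carry-out; a_i is used here.
        carry = carry if propagate else a_i
    return total | (carry << n_bits)        # final carry becomes the top bit
```

Each loop iteration corresponds to one copy of the per-bit structure, so n iterations are needed for two n-bit operands.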
However, the structures of FIGS. 5 and 6 are not efficient for multiplication. Multiplication is performed as follows. Table I shows a 4-bit unsigned multiplication example. Terms of the multiplication are shown at the left and an example multiplication is shown at the right. One can see that when the value of the b-bit is 1, the value of the full number “a” is shifted and added, whereas when the value of the b-bit is 0, the value of the number “a” is bypassed.
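The shift-and-add procedure can be sketched as follows. The operand values in the usage note are an arbitrary illustration and are not the example of Table I; the function names are hypothetical.

```python
def partial_products(a: int, b: int, n_bits: int) -> list:
    """One row per bit of b: a shifted left by k when bit k of b is 1,
    otherwise zero (the number a is bypassed)."""
    return [(a << k) if (b >> k) & 1 else 0 for k in range(n_bits)]


def shift_add_multiply(a: int, b: int, n_bits: int) -> int:
    """Unsigned product formed by summing all partial-product rows."""
    return sum(partial_products(a, b, n_bits))
```

With a = 1011 and b = 0101 (binary), the rows are 1011, 0, 101100, and 0, which sum to 110111 (decimal 55).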
One line in Table I can be produced by the circuit of FIG. 7. To produce the one line, the entire number “a” is ANDed with one bit bn of the number “b”. To produce bits of the final result “r”, all lines in the above table must be added. FIG. 8 shows a tree structure for adding the first two lines of Table I, adding the last two lines of Table I, and then adding the two sums. Intermediate results r0′ through r5′ are generated from adding the first two lines of the sum. Intermediate results r2″ through r7″ are generated from adding the last two lines of the sum. The right hand side of FIG. 8 shows the final addition being performed in the binary addition tree structure. (Some logic optimization has been performed whereby the sum of the most significant bits of the last two lines is folded in with the final addition.) A binary addition tree structure minimizes the delay between the time the input bits a0 through a3 and b0 through b3 are applied to the input terminals and the final result r0 through r7 appears at the output terminals. It is usually desirable to perform the operations of FIG. 8 with as little delay as possible. Four-bit numbers require two levels of addition. Eight-bit numbers require three levels. Thus a binary addition tree structure minimizes the number of levels and therefore minimizes delay.
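The level count of a binary addition tree can be illustrated with the sketch below, which adds partial-product rows in pairs until one result remains. It models only the arithmetic and the number of levels, not the bit-level optimization mentioned for FIG. 8. The function name and operand values are hypothetical.

```python
def tree_multiply(a: int, b: int, n_bits: int):
    """Return (product, levels) using pairwise addition of partial products."""
    rows = [(a << k) if (b >> k) & 1 else 0 for k in range(n_bits)]
    levels = 0
    while len(rows) > 1:
        # Add adjacent rows in pairs; all additions in a level are independent.
        paired = [rows[i] + rows[i + 1] for i in range(0, len(rows) - 1, 2)]
        if len(rows) % 2:
            paired.append(rows[-1])  # an unpaired row passes to the next level
        rows = paired
        levels += 1
    return rows[0], levels
```

For example, `tree_multiply(13, 11, 4)` returns a product of 143 after two levels, and an 8-bit multiplication completes in three levels, matching the counts given above.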
However, it is also possible to perform multiplication using a chain addition structure. FIG. 9 shows such a chain addition structure. The chain structure produces more delay than the tree structure but uses less area when implemented in four-input function generators because the chain structure can conveniently be divided into units having four inputs. For example, the structure labeled FGEN can be implemented in one function generator. In applications in which delay is not important but minimizing area is important, the chain structure may be chosen.
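For comparison, a chain addition can be sketched as a running sum to which each partial-product row is added in turn. The stage count returned here is only a rough proxy for delay and assumes one adder stage per added row; the function name is hypothetical.

```python
def chain_multiply(a: int, b: int, n_bits: int):
    """Return (product, stages) using one adder after another in series."""
    rows = [(a << k) if (b >> k) & 1 else 0 for k in range(n_bits)]
    running = rows[0]
    stages = 0
    for row in rows[1:]:
        running += row  # each addition must wait for the previous result
        stages += 1
    return running, stages
```

Under this assumption a 4-bit multiplication passes through three serial adder stages and an 8-bit multiplication through seven, which is more than the tree requires for the same operands.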
FIG. 8a illustrates a portion of FIG. 8. The two AND gates 11 and 12 and carry chain adder 13 are shown in FIG. 8a in the same orientation as they are shown in FIG. 8. In order to perform the operation illustrated in FIG. 8a using the architecture of FIG. 5, three units of FIG. 5 are required. FIG. 10 shows this implementation. Two units are taken up simply implementing AND gates 11 and 12, and the third unit implements carry chain adder 13. Portions of the units not used are drawn with faint lines and portions used are drawn with heavy lines. Clearly, much of the available circuitry is not used; this architecture does not efficiently implement the multiplication operation.
In order to perform the same operation using the architecture of FIG. 6, two units of FIG. 6 are required, as shown in FIG. 11. AND gate 11 is formed in function generator 903-1 to combine two input signals am and b(n+1). AND gate 12 is formed in function generator 903-2 to combine the other two input signals a(m+1) and bn. Function generator 903-2 also forms part of the carry chain adder 13, generating propagate signal Pi from the output of AND gate 12 and the output of AND gate 11, which is in function generator 903-1. Because two units are required to implement the logic shown in FIG. 8a, the structure of FIG. 6 still wastes silicon area when performing multiplication.
Additionally, in the implementation shown in FIG. 11, the path through signal am·b(n+1) has more delay than the path through signal a(m+1)·bn, because there are two function generators on the am·b(n+1) signal path. In a pipelined system (where both of these signals would have to be registered), AND gate 12 would have to be brought back out into a third function generator, and three units of FIG. 6 would be consumed.
When implementing multiplication in FPGAs, it is desirable to further reduce the silicon area required to implement such commonly used logic as well as to reduce delay in calculating the output signals.
A principal object of the present invention is to tailor the silicon area more closely to the desires of designers who will use the FPGA in which the invention is placed.
Another object is to minimize silicon area and thereby minimize cost by using portions of the CLB for more than one purpose.
Another object of the invention is to combine a flexible multi-purpose logic block with a small dedicated structure for generating AND and OR functions.
According to the invention, one of the carry chain input signals is derived from two of the function generator input signals. In a first embodiment, one carry chain input signal comes from a dedicated AND gate receiving two of the function generator input signals. For a multiplication operation using either a binary-addition-tree algorithm or a chain addition algorithm, the AND gate provides a low-cost low-latency multiplication feature. For a given number of bits to be multiplied, the structure including the AND gate requires fewer CLBs than the prior art structures, as well as less FPGA interconnect routing. Additionally, the structure offers low loading for all signals, thus high speed.
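One way to picture the first embodiment is the following sketch of a single multiplier stage, in which one function generator receives all four operand bits and a dedicated AND gate supplies the carry multiplexer input. The choice of which two inputs feed the AND gate, the use of an XOR of propagate and carry-in for the sum bit, and all names are assumptions of the sketch rather than details stated above.

```python
def multiplier_stage(a_m: int, b_n1: int, a_m1: int, b_n: int, cin: int):
    """Add the partial products a_m*b_n1 and a_m1*b_n with a carry-in.

    Returns (sum_bit, carry_out).
    """
    # Function generator: both AND terms and their XOR in one lookup table.
    propagate = (a_m & b_n1) ^ (a_m1 & b_n)
    # Dedicated AND gate on two of the function generator inputs (assumed pair).
    and_gate = a_m1 & b_n
    sum_bit = propagate ^ cin
    # When propagate is 0 the two products are equal, so the AND gate
    # output is the correct carry-out.
    carry_out = cin if propagate else and_gate
    return sum_bit, carry_out
```

In this model the two AND gates and one bit of the adder occupy a single function generator and carry multiplexer, which is the area saving the paragraph above describes.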
In a second embodiment, the AND gate and a four-input multiplexer are combined. The four-input multiplexer receives one input signal from the AND gate, one from one of the function generator input signals, one from a logic high source and one from a logic low source. This multiplexer facilitates the starting of a carry chain and the formation of wide AND gates and OR gates.
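The four-input multiplexer of the second embodiment can be modeled as a configured choice among four sources for the carry chain input. The mode names and the choice of which function generator input is passed directly are hypothetical, introduced only for this sketch.

```python
def carry_data_input(mode: str, in0: int, in1: int) -> int:
    """Select the signal offered to the carry chain multiplexer.

    mode is fixed by configuration (names are illustrative only):
      "and"   - output of the AND gate on two function generator inputs
      "input" - one function generator input passed directly
      "high"  - constant logic 1
      "low"   - constant logic 0
    """
    if mode == "and":
        return in0 & in1
    if mode == "input":
        return in0
    if mode == "high":
        return 1
    if mode == "low":
        return 0
    raise ValueError(f"unknown mode: {mode}")
```

The two constant settings correspond to the uses named above: a constant can start a carry chain, and a constant 0 or 1 on the carry input supports wide AND or OR functions respectively.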
In one embodiment, the AND gate is provided at no cost in silicon area (with a possible small cost in metal routing) because AND gates exist as part of the decoding structure of the lookup table multiplexer, and the output signal from one of the AND gates is simply provided as input to both the carry chain multiplexer and the lookup table multiplexer.
An additional benefit of the invention is that wide AND, OR, NAND, and NOR functions can be generated using dedicated input lines (two function generator input lines) and dedicated output lines (the carry chain output lines), in addition to the multiplication and other functions facilitated by the invention.