This invention relates to programmable integrated circuit devices. More specifically, the present invention relates to field programmable gate arrays (FPGAs).
An FPGA is a type of programmable logic device (PLD) that can be configured to perform various logic functions. An FPGA includes an array of configurable logic blocks (CLBs) connectable via programmable interconnect structures. For example, a first FPGA, invented by Freeman, is described in U.S. Pat. RE34,363. CLBs and interconnect structures in FPGAs are shown in U.S. Pat. No. 5,889,411 issued to Chaudhary et al. and pages 4–32 through 4–37 of the Xilinx 1996 Data Book entitled “The Programmable Logic Data Book” available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124. The Freeman reference, the Chaudhary reference, and the Data Book are incorporated herein by reference.
In addition to the structures discussed above, FPGAs also include structures for performing special functions. In particular, FPGAs include carry circuits and lines for connecting the carry output of one bit generated in one CLB to the carry input of another CLB, and cascade lines for allowing wide functions to be generated by combining several adjacent CLBs. Carry structures are discussed by Hsieh et al. in U.S. Pat. No. 5,267,187 and by New in U.S. Pat. No. 5,349,250.
Cascade structures are discussed by Goetting et al. in U.S. Pat. No. 5,365,125 and by Chiang et al. in U.S. Pat. No. 5,357,153. These patents are also incorporated herein by reference.
As discussed by the above-incorporated references, each CLB may include one or more slices (“slice” or “CLB slice”). Each slice, in turn, includes at least one configurable function generator. The configurable function generator is typically implemented as a four-input lookup table (LUT). The incorporated references also point out that the carry circuits and cascade structures increase the speed at which the FPGA can perform certain functions, such as arithmetic functions.
FIG. 1A is a simplified block diagram of a conventional CLB 100. The illustrated CLB 100 includes a first slice 110 and a second slice 120. First slice 110 includes a first function generator G 112, a second function generator F 114, a third function generator C1 116, and an output control block 118. Output control block 118 may include multiplexers, flip-flops, or both. Four independent input terminals are provided to each of the G and F function generators 112 and 114. A single input terminal C1-in is provided to third function generator C1 116, which also receives the output signals G′ and F′ of function generators 112 and 114. Each of function generators 112 and 114 is typically implemented as a four-input LUT, and is capable of implementing any arbitrarily defined Boolean function of its input signals. Each of the input terminals may be assigned a number or a letter and referred to as a “literal.” For example, in CLB 100, function generator 112 receives four input signals, or literals, G1, G2, G3, and G4. Function generator 116, typically implemented as a set of configurable multiplexers, is often used to handle carry bits, but can also implement some Boolean functions of its three input signals C1-in, G′, and F′. These Boolean functions include bypass, inverter, 2-input AND (product), and 2-input OR (sum). Signals G′, F′, and C1-out are multiplexed through output control block 118, which provides output signal lines Y, QY, X, and QX. For this reason, output control block 118 may also be referred to as the “output multiplexer” or “output select multiplexer.” Slice 110 may also provide the carry out signal, C1-out. Second slice 120 is similar to first slice 110, and accordingly operates in a similar manner.
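The behavior of a four-input LUT such as function generator 112 can be illustrated with a short software model. The following Python sketch is an illustration only and is not taken from the incorporated references; the class name `Lut4`, the bit ordering (G1 as the most significant address bit), and the example function are assumptions chosen for clarity. It shows that sixteen configuration bits suffice to realize any Boolean function of four literals.

```python
from itertools import product


class Lut4:
    """Hypothetical model of a four-input LUT: 16 configuration bits."""

    def __init__(self, bits):
        if len(bits) != 16:
            raise ValueError("a four-input LUT holds exactly 16 bits")
        self.bits = [int(bool(b)) for b in bits]

    @classmethod
    def from_function(cls, fn):
        # Enumerate all 16 input combinations in address order
        # (assumed ordering: G1 is the most significant address bit).
        return cls([fn(g1, g2, g3, g4)
                    for g1, g2, g3, g4 in product((0, 1), repeat=4)])

    def __call__(self, g1, g2, g3, g4):
        # The four literals form the address of the stored bit.
        address = (g1 << 3) | (g2 << 2) | (g3 << 1) | g4
        return self.bits[address]


# Example configuration: G' = (G1 AND G2) XOR (G3 OR G4)
lut_g = Lut4.from_function(lambda g1, g2, g3, g4: (g1 & g2) ^ (g3 | g4))
```

Calling `lut_g(1, 1, 0, 0)` looks up the stored bit at address 12. Reprogramming the LUT amounts to supplying a different list of sixteen bits.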
Operation of CLB 100 is also described by the incorporated references, and, in particular, in chapters seven and eight of the above-incorporated Data Book. For simplicity, CLB 100 of FIG. 1A is illustrated with two slices; however, the number of slices constituting a CLB is not limited to two.
FIG. 1B is a simplified block diagram of another conventional CLB 100a. CLB 100a is similar to CLB 100 of FIG. 1A but has an additional LUT 113. LUT 113 takes the outputs of LUTs 112 and 114 as well as another input K1 to slice 110a. Thus, LUT 113 allows slice 110a to implement Boolean functions of as many as nine literals G1, G2, G3, G4, F1, F2, F3, F4, and K1. Not every nine-literal function can be realized in this way, because LUT 113 receives only the two LUT outputs and K1 rather than all nine literals. CLB 100a may include additional slices represented by ellipses 120a.
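As an illustration of how the three LUTs of slice 110a can cooperate, the following Python sketch models a nine-input AND, which is one function of nine literals that this arrangement can realize. The helper `make_lut`, the function `slice_110a`, and the choice of a three-input table for LUT 113 are assumptions made for this sketch and are not drawn from the incorporated references.

```python
from itertools import product


def make_lut(fn, n):
    """Return a hypothetical n-input LUT backed by a 2**n-entry table."""
    table = [fn(*bits) & 1 for bits in product((0, 1), repeat=n)]

    def lut(*inputs):
        # Build the table address from the inputs, first input most significant.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return table[address]

    return lut


# Assumed configuration for a nine-input AND.
lut_112 = make_lut(lambda a, b, c, d: a & b & c & d, 4)  # G function generator
lut_114 = make_lut(lambda a, b, c, d: a & b & c & d, 4)  # F function generator
lut_113 = make_lut(lambda g, f, k1: g & f & k1, 3)       # combines G', F', K1


def slice_110a(g_inputs, f_inputs, k1):
    """Evaluate the slice for literals G1..G4, F1..F4, and K1."""
    return lut_113(lut_112(*g_inputs), lut_114(*f_inputs), k1)
```

Here `slice_110a((1, 1, 1, 1), (1, 1, 1, 1), 1)` evaluates the AND of all nine literals within a single slice, with no routing between CLBs.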
Technology mapping for LUT-based FPGAs involves decomposition of a circuit into combinational logic whose nodes implement functions of at most four inputs (a “fan-in” of four), so that each node can be realized in a LUT of a CLB slice. This is because, as shown in slice 110, the slices commonly include 4-input LUTs as their function generators. By conventionally specifying the functions of function generators F, G, and C1, and output control block 118, slice 110 can be programmed to implement various functions including, without limitation, two independent functions of up to four variables each.
Circuit designs are mapped to FPGAs as combinational logic. The combinational logic may be expressed as Boolean expressions including a number of logic levels and routing between the logic levels. The Boolean expressions include product (logical AND) and sum (logical OR) operations. Two levels of combinational logic may be expressed using sum-of-products (SOP) format. In fact, given a set of inputs and their complements, any logic equation can be expressed using the SOP format.
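The statement that any logic equation can be written in SOP format can be illustrated by constructing the canonical form directly from a truth table: one product term is emitted for each input combination that makes the function true. The following Python sketch is offered only as an illustration; the function name `sum_of_products` and the notation (`'` for a complemented input, `*` for product, `+` for sum) are assumptions of this sketch.

```python
from itertools import product


def sum_of_products(fn, names):
    """Return the canonical SOP of fn as a string, one term per true row."""
    terms = []
    for values in product((0, 1), repeat=len(names)):
        if fn(*values):
            # A literal appears uncomplemented when its input is 1,
            # and complemented (marked with ') when its input is 0.
            literals = [n if v else n + "'" for n, v in zip(names, values)]
            terms.append("*".join(literals))
    # A function that is never true has no product terms.
    return " + ".join(terms) if terms else "0"


# Example: exclusive OR of two inputs.
xor_sop = sum_of_products(lambda a, b: a ^ b, ["a", "b"])
```

For the exclusive OR example, `xor_sop` is the string `a'*b + a*b'`. The canonical form is generally not minimal; it only demonstrates that an SOP expression always exists.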
In the FPGA art, there is a continuing challenge to increase speed (performance) of FPGA-implemented functions, or circuits. Circuit performance, or speed, is increased when circuit delay is decreased. Circuit delay includes two main components: logic delay and routing delay.
Using logical axioms and Boolean algebraic rules, it is possible to partially collapse a circuit design to reduce the number of logic levels, thus reducing the routing delay. However, this creates wide fan-in nodes. The wide fan-in nodes require use of several levels of LUTs for implementation. This is because, as described above, the LUTs have limited fan-in, for example a fan-in of four. Therefore, to implement wide fan-in nodes, multiple levels of CLBs must be used. The requirement to use multiple levels of CLBs increases the logic delay and introduces additional routing delay. These negative effects can cancel out the benefits of the routing delay reduction provided by the partial collapse of the circuit design.
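The cost of a wide fan-in node can be estimated for the simple case of an associative function, such as a wide AND or OR, built as a balanced tree of LUTs. The following Python sketch is an illustration under that assumption only; the function name `lut_levels` is hypothetical, and functions that do not decompose as a tree may need more levels than this estimate.

```python
def lut_levels(fan_in, k=4):
    """Levels of k-input LUTs in a balanced tree reducing fan_in signals to one."""
    if fan_in < 1 or k < 2:
        raise ValueError("fan_in must be >= 1 and k must be >= 2")
    levels = 0
    signals = fan_in
    while signals > 1:
        # Each level groups the remaining signals k at a time (ceiling division).
        signals = -(-signals // k)
        levels += 1
    return levels
```

With four-input LUTs, a 4-input node fits in one level, a 16-input node needs two levels, and a 17-input node needs three. Each added level contributes both a LUT delay and the routing delay between levels.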
Accordingly, there is a need for a method to implement wide fan-in nodes in FPGAs while avoiding the negative effects described above. Additionally, there is a need for CLB and CLB slice designs that allow for fast implementation of wide fan-in SOP functions.