Programmable logic devices (PLDs) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay lock loops (DLLs), and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, e.g., using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include, but are not limited to, these exemplary devices, and also encompass devices that are only partially programmable.
FIG. 1 is a simplified illustration of an exemplary FPGA. The FPGA of FIG. 1 includes an array of configurable logic blocks (LBs 101a–101i) and programmable input/output blocks (I/Os 102a–102d). The LBs and I/O blocks are interconnected by a programmable interconnect structure that includes a large number of interconnect lines 103 interconnected by programmable interconnect points (PIPs 104, shown as small circles in FIG. 1). PIPs are often coupled into groups that implement multiplexer circuits selecting one of several interconnect lines to provide a signal to a destination interconnect line or logic block. For example, in FIG. 1 PIP group 105 forms an input multiplexer selecting one of several interconnect lines to provide an input signal to an input terminal of LB 101a. Some FPGAs also include additional logic blocks with special purposes (not shown), e.g., DLLs, RAM, and so forth.
FIG. 2 illustrates in simplified form a configurable logic element (CLE) for an FPGA. CLE 200 of FIG. 2 includes four similar slices SLICE_0–SLICE_3. Each slice includes two lookup tables (LUTs) 201 and 202, a write control circuit 205, three multiplexers MUX1, MUX2, and MF5_n, and two output memory elements 203 and 204. Each pair of slices also includes an additional multiplexer MF6_n, MF7_n, or MF8_n. Lookup tables 201 and 202, write control circuit 205, multiplexers MUX1 and MUX2, and output memory elements 203 and 204 are controlled by configuration memory cells M1–M7. Note that at least some of configuration memory cells M1–M7 represent more than one memory cell. Additional configuration memory cells and logic elements are omitted from FIG. 2 for clarity.
Each LUT 201, 202 can function in any of several modes. When in lookup table mode, each LUT has four data input signals IN1–IN4 that are supplied by the FPGA interconnect structure (see FIG. 1) via input multiplexers (e.g., see PIP group 105 in FIG. 1). When in RAM mode, input data is supplied by an input terminal RAM_DI_1, RAM_DI_2 to the DI terminal of the associated LUT. RAM write operations in both LUTs are controlled by write control circuit 205, which supplies one or more write control signals W to both LUTs based on RAM control signals provided by the interconnect structure. (In the present specification, the same reference characters are used to refer to terminals, signal lines, and their corresponding signals.)
Each LUT 201, 202 provides a LUT output signal to an associated multiplexer MUX1, MUX2, which selects between the LUT output signal and an associated register direct input signal Reg_DI_1, Reg_DI_2 from the interconnect structure. Thus, each LUT can be optionally bypassed. The output of each multiplexer MUX1, MUX2 is provided to the data input terminal D of an associated output memory element (203, 204 respectively). Memory elements 203 and 204 are clocked by a clock signal CK (e.g., provided by a global clock network) and controlled by various other register control signals (e.g., from the interconnect structure or provided by configuration memory cells of the FPGA). Each memory element 203, 204 provides a registered output signal Q1, Q2. The output of each LUT 201, 202 is also provided to an output terminal OUT1, OUT2 of the CLE. Thus, each output memory element can be optionally bypassed.
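The data path just described (LUT, bypass multiplexer, output memory element) can be sketched behaviorally as follows. This is a minimal illustration, not the circuit of FIG. 2: the class name `SliceHalf`, the flag `select_reg_di` (standing in for the configuration memory cell that controls MUX1 or MUX2), and the input-to-address bit ordering are all assumptions made for the example, and the other register control signals are omitted.

```python
class SliceHalf:
    """Hypothetical model of one LUT, its bypass multiplexer, and its register."""

    def __init__(self, lut_bits, select_reg_di=False):
        if len(lut_bits) != 16:
            raise ValueError("a 4-input LUT stores exactly 16 bits")
        self.lut_bits = list(lut_bits)
        # Assumed stand-in for the configuration cell controlling MUX1/MUX2.
        self.select_reg_di = select_reg_di
        self.q = 0  # registered output (Q1 or Q2)

    def lut_out(self, in1, in2, in3, in4):
        # Assumed ordering: IN1 is the least significant address bit.
        return self.lut_bits[in1 | (in2 << 1) | (in3 << 2) | (in4 << 3)]

    def clock(self, inputs, reg_di):
        """Model one active edge of CK; return (OUT, Q)."""
        out = self.lut_out(*inputs)                   # unregistered output OUT1/OUT2
        d = reg_di if self.select_reg_di else out     # MUX1/MUX2 selection
        self.q = d                                    # memory element captures D
        return out, self.q


# A LUT programmed as a 4-input AND gate.
and4 = [0] * 15 + [1]
registered = SliceHalf(and4)
bypassed = SliceHalf(and4, select_reg_di=True)
print(registered.clock((1, 1, 1, 1), reg_di=0))  # LUT output is registered
print(bypassed.clock((1, 1, 1, 1), reg_di=0))    # register takes Reg_DI instead
```

In the second call the LUT output still appears on the unregistered output, while the memory element holds the value supplied on the register direct input, which is the sense in which either the LUT or the memory element can be bypassed.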
The LUT output signals can be multiplexed together to form some larger functions using the MF5–MF8 multiplexers. In each slice a corresponding multiplexer MF5_n is driven by the output signals from LUTs 201 and 202, and is controlled by an external input signal F5_Sel to provide output signal F5_n. Multiplexer MF6_n is provided only once for each pair of slices (e.g., in slices SLICE_0 and SLICE_2). Multiplexer MF6_n is driven by the F5_n output signals from the two associated slices, and is controlled by external input signal F6_Sel. Multiplexer MF7_1 is provided only in slice SLICE_1, and is driven by the F6_n output signals from slices SLICE_0 and SLICE_2. Therefore, multiplexer MF7_1 combines the output signals from all eight LUTs in CLE 200. Multiplexer MF7_1 is controlled by an external input signal F7_Sel (not shown). Multiplexer MF8_3 is provided only in slice SLICE_3, and is driven by the F7_1 output signal from the same CLE (CLE 200) and from an adjoining CLE (signal F7_1′). Therefore, multiplexer MF8_3 can be used to combine the output signals from all sixteen LUTs in two adjoining CLEs.
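The cascade of MF5, MF6, MF7, and MF8 multiplexers can be modeled functionally as a tree of 2-to-1 selections. The sketch below rests on several assumptions that the text does not state: the four slices share a single F5_Sel value and the two slice pairs share a single F6_Sel value, slices are paired as (0, 1) and (2, 3), a high select chooses the second input, and the MF8 select is called `f8_sel` (a hypothetical name).

```python
def mux2(a, b, sel):
    """2-to-1 multiplexer: returns b when sel is high, otherwise a (assumed polarity)."""
    return b if sel else a


def cle_f7(lut_outs, f5_sel, f6_sel, f7_sel):
    """Combine the eight LUT outputs of one CLE down to the F7 output.

    lut_outs holds two bits per slice, ordered SLICE_0 .. SLICE_3.
    """
    if len(lut_outs) != 8:
        raise ValueError("expected eight LUT outputs")
    # One MF5 per slice selects between that slice's two LUT outputs.
    f5 = [mux2(lut_outs[2 * i], lut_outs[2 * i + 1], f5_sel) for i in range(4)]
    # One MF6 per slice pair selects between the two F5 outputs of the pair.
    f6 = [mux2(f5[0], f5[1], f6_sel), mux2(f5[2], f5[3], f6_sel)]
    # The single MF7 selects between the two F6 outputs.
    return mux2(f6[0], f6[1], f7_sel)


def two_cle_f8(f7_here, f7_adjacent, f8_sel):
    """MF8 selects between the F7 outputs of two adjoining CLEs."""
    return mux2(f7_here, f7_adjacent, f8_sel)


# With these assumptions the selects address LUT number f5 + 2*f6 + 4*f7.
outs = [0, 0, 0, 0, 0, 1, 0, 0]       # only LUT 5 drives a high value
print(cle_f7(outs, f5_sel=1, f6_sel=0, f7_sel=1))  # selects LUT 5
```

Each additional multiplexer level doubles the number of LUT outputs that can be combined, at the cost of one more multiplexer delay on every path.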
FIG. 3 illustrates in simplified form a well-known 4-input lookup table (LUT) for a PLD. The lookup table is implemented as a four-stage 16-to-1 multiplexer. The four input signals A1–A4 together select one of 16 values stored in memory cells MC-0 through MC-15. Thus, the lookup table can implement any function of up to four input signals.
The four input signals A1–A4 are independent signals, each driving one stage of the multiplexer. Inverted versions A1B–A4B of signals A1–A4 are generated by inverters 301–304, respectively. Sixteen configuration memory cells MC-0 through MC-15 drive sixteen corresponding CMOS pass gates 330–345. In a first stage of the multiplexer, paired pass gates 330–331 form a 2-to-1 multiplexer controlled by signals A1 and A1B, which multiplexer drives a CMOS pass gate 346. Pass gates 332–345 are also paired in a similar fashion to form similar 2-to-1 multiplexers driving associated pass gates 347–353. In a second stage of the multiplexer, paired pass gates 346–347 form a 2-to-1 multiplexer controlled by signals A2 and A2B, which multiplexer drives an inverter 305. Similarly, pass gates 348–353 are paired to form similar 2-to-1 multiplexers driving associated inverters 306–308.
In a third stage of the multiplexer, driven by inverters 305–308, pass gates 354–355 are paired to form a 2-to-1 multiplexer controlled by signals A3 and A3B and driving a CMOS pass gate 358. Similarly, pass gates 356–357 are paired to form a similar 2-to-1 multiplexer driving a CMOS pass gate 359. In a fourth stage of the multiplexer, pass gates 358–359 are paired to form a 2-to-1 multiplexer controlled by signals A4 and A4B and driving an inverter 309. Inverter 309 provides the LUT output signal OUT.
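The four-stage structure can be illustrated with a short behavioral model in which each stage halves the number of candidate values. This is a sketch under stated assumptions rather than a transcription of FIG. 3: it assumes that a high select signal passes the higher-numbered cell of each pair, and it omits the two inverter stages (305–308 and 309), whose inversions cancel along every path. The helper `program_lut` is a hypothetical convenience for filling the memory cells from a Boolean function.

```python
from itertools import product


def lut4(mem, a1, a2, a3, a4):
    """Evaluate a 4-input LUT as four cascaded stages of 2-to-1 multiplexers.

    mem holds the 16 stored bits MC-0 .. MC-15.
    """
    if len(mem) != 16:
        raise ValueError("expected 16 memory cell values")
    level = list(mem)
    for sel in (a1, a2, a3, a4):
        # Each stage pairs adjacent values and passes one of each pair.
        # Assumed polarity: a high select passes the higher-numbered value.
        level = [level[i + 1] if sel else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


def program_lut(func):
    """Hypothetical helper: tabulate func(a1, a2, a3, a4) into 16 cell values."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
            for i in range(16)]


# Any function of up to four inputs fits, for example a 2-to-1 multiplexer
# that uses A1 and A2 as data inputs and A3 as the select (A4 unused).
def mux_fn(d0, d1, s, _unused):
    return d1 if s else d0


cells = program_lut(mux_fn)
for a in product((0, 1), repeat=4):
    assert lut4(cells, *a) == mux_fn(*a)
print(cells)
```

Because the sixteen stored bits are simply the truth table of the desired function, changing the function means rewriting the memory cells while the multiplexer tree itself stays the same.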
FIG. 4 illustrates another known 4-input LUT. The LUT of FIG. 4 is similar to that of FIG. 3, except that N-channel transistors 430–459 are substituted for CMOS pass gates 330–359. Because an N-channel transistor imposes a voltage drop on power high signals traversing the transistor, the node driving each inverter 305–309 is also enhanced by the addition of a pullup (e.g., a P-channel transistor) 460–464 to power high VDD. Each pullup 460–464 is gated by the output of the corresponding inverter 305–309. The pullup ensures that a high value on the node driving the inverter is pulled all the way to the power high value once a low value appears on the inverter output node.
Wide multiplexers are frequently included in PLD designs, e.g., in digital signal processing (DSP) applications. There are various methods of implementing wide multiplexers, but one common method utilizes the existing LUTs, e.g., the LUTs shown in FIGS. 2–4. The implementation of an N-to-1 multiplexer (MUX) requires N input terminals for the N data inputs and log2 N (the logarithm of N to the base 2) select input terminals. For example, a 2-to-1 multiplexer requires two data input terminals and one select terminal, a 4-to-1 multiplexer requires four data input terminals and two select terminals, and so forth. Table 1 shows the number of input terminals required to implement some variously-sized multiplexers.
TABLE 1

Size of MUX    # Data inputs    # Select inputs    Total inputs
2-to-1         2                1                  3
4-to-1         4                2                  6
8-to-1         8                3                  11
16-to-1        16               4                  20
32-to-1        32               5                  37
64-to-1        64               6                  70
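The totals in Table 1 follow directly from adding the N data inputs to the log2 N select inputs. A minimal sketch that reproduces the rows is shown below; the function name `mux_inputs` is chosen for this example only.

```python
def mux_inputs(n):
    """Return (data inputs, select inputs, total inputs) for an N-to-1 multiplexer."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two, at least 2")
    selects = n.bit_length() - 1   # log2(N) for a power of two
    return n, selects, n + selects


for n in (2, 4, 8, 16, 32, 64):
    data, selects, total = mux_inputs(n)
    print(f"{n}-to-1: {data} data + {selects} select = {total} inputs")
```

The data inputs dominate the total, so the input count roughly doubles with each doubling of multiplexer width.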
The largest multiplexer that can be implemented in one of the LUTs of FIGS. 3 and 4 is a 2-to-1 multiplexer, because each LUT can provide any function of up to four inputs, and a 4-to-1 multiplexer already requires six inputs (see Table 1). In the CLE of FIG. 2, each of LUTs 201, 202 can implement a 2-to-1 multiplexer; each MF5 multiplexer can be used with two LUTs to implement a 4-to-1 multiplexer; and each MF6 multiplexer can be used with two MF5 multiplexers and four LUTs to implement an 8-to-1 multiplexer. Implementing a 16-to-1 multiplexer in the CLE of FIG. 2 requires one MF7 multiplexer, two MF6 multiplexers, four MF5 multiplexers, and eight LUTs (i.e., all of the LUTs in the CLE). Additionally, each path delay through the 16-to-1 multiplexer includes the delay of one LUT, three multiplexers, and the interconnect paths between these elements.
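The resource counts above can be reproduced with a short calculation. The sketch below assumes that every LUT is used as the widest multiplexer it can hold and that the LUT outputs are then combined by a binary tree of dedicated 2-to-1 multiplexers; the function names and the returned dictionary layout are choices made for this example.

```python
def largest_mux_in_lut(lut_inputs):
    """Widest N-to-1 multiplexer (N a power of two) with N + log2(N) <= lut_inputs."""
    n = 1
    while 2 * n + (2 * n).bit_length() - 1 <= lut_inputs:
        n *= 2
    return n


def mux_resources(n, lut_inputs=4):
    """Count LUTs and dedicated multiplexers needed for an N-to-1 multiplexer."""
    per_lut = largest_mux_in_lut(lut_inputs)
    if per_lut < 2:
        raise ValueError("LUT is too small to hold a 2-to-1 multiplexer")
    if n < per_lut or n & (n - 1):
        raise ValueError("N must be a power of two no smaller than the per-LUT width")
    luts = n // per_lut
    muxes_per_level = []          # first entry is the level nearest the LUTs
    width = luts
    while width > 1:
        width //= 2               # each level halves the number of signals
        muxes_per_level.append(width)
    return {
        "luts": luts,
        "muxes_per_level": muxes_per_level,
        "mux_levels_on_path": len(muxes_per_level),
    }


print(largest_mux_in_lut(4))      # 2
print(mux_resources(16))          # eight LUTs, then 4, 2, and 1 multiplexers
```

For a 16-to-1 multiplexer built from 4-input LUTs, the calculation gives eight LUTs and three multiplexer levels containing four, two, and one multiplexers, matching the counts and the path delay described above.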
Therefore, it is desirable to provide LUTs that can more efficiently and/or rapidly perform wide multiplexing functions in PLDs.