Programmable logic devices (PLDs) are a well-known type of programmable integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAMs), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay-locked loops (DLLs), and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, lookup tables, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
A lookup table (LUT) is a selection circuit that accepts any number of inputs up to a specified maximum number, and can be programmed to provide any function of the input values. A lookup table is typically implemented as a random access memory, with the inputs being used to address the memory. Thus, an n-input lookup table stores 2**n (two to the nth power) output values, one for each possible combination of the n input values.
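The memory-based view of a lookup table described above can be sketched in software. The following is a minimal illustration only, not a description of any particular device; the function names `build_lut` and `read_lut` and the ordering of address bits are assumptions made for the example.

```python
def build_lut(func, n):
    """Return the 2**n stored output values for an n-input function."""
    table = []
    for address in range(2 ** n):
        # Assumed ordering: bit i of the address is the value of input i.
        inputs = [(address >> i) & 1 for i in range(n)]
        table.append(func(*inputs) & 1)
    return table


def read_lut(table, inputs):
    """Use the input values as a memory address and return the stored bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# Hypothetical example: a 4-input LUT programmed as (a AND b) XOR (c OR d).
example = build_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
```

Changing the programmed function changes only the 16 stored values; the read path is the same for every function.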
FIG. 1 illustrates a typical implementation for a 4-input LUT. The LUT of FIG. 1 is implemented as a 16-input multiplexer, with the 16 data inputs being the 16 possible output values for the LUT, stored in memory cells MC<0:15>. For example, for a LUT in a programmable IC such as an FPGA, the memory cells can be configuration memory cells. The four control inputs for the multiplexer are the four input signals A1-A4 to the LUT. Thus, the LUT output signal OUT corresponds to one of the values stored in the 16 memory cells MC<0:15>, with the selection being controlled by the four LUT input signals A1-A4. The LUT can therefore implement any function of up to four input signals. Note that while 4-input LUTs are common, LUTs having more or fewer input signals can be implemented in a similar fashion to accommodate larger or smaller logic functions. Note further that in the present specification, the same reference characters are used to refer to terminals, signal lines, and their corresponding signals.
As shown in FIG. 1, a typical 4-input LUT includes four stages, with the first stage being controlled by input signal A1, the second stage being controlled by input signal A2, and so forth. Inverted versions A1B-A4B of signals A1-A4 are generated by inverters 101-104, respectively. Each of sixteen configuration memory cells MC<0:15> drives a corresponding CMOS pass gate 130-145. In the first stage of the multiplexer, paired pass gates 130-131 form a 2-to-1 multiplexer controlled by signals A1 and A1B, which multiplexer drives a CMOS pass gate 146. Pass gates 132-145 are also paired in a similar fashion to form similar 2-to-1 multiplexers driving associated pass gates 147-153.
In the second stage of the multiplexer, paired pass gates 146-147 form a 2-to-1 multiplexer controlled by signals A2 and A2B, which multiplexer drives an inverter 105. Similarly, pass gates 148-153 are paired to form similar 2-to-1 multiplexers driving associated inverters 106-108. In the third stage of the multiplexer, driven by inverters 105-108, pass gates 154-155 are paired to form a 2-to-1 multiplexer controlled by signals A3 and A3B and driving a CMOS pass gate 158. Similarly, pass gates 156-157 are paired to form a similar 2-to-1 multiplexer driving a CMOS pass gate 159. In the fourth stage of the multiplexer, pass gates 158-159 are paired to form a 2-to-1 multiplexer controlled by signals A4 and A4B and driving an inverter 109. Inverter 109 provides the LUT output signal OUT.
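The four-stage selection described above can be modeled behaviorally as follows. This is a sketch under stated assumptions rather than a circuit description: the function name `mux_tree_lut` is hypothetical, and the select polarity (a high select value choosing the higher-numbered member of each pair) is assumed, since the polarity is not stated in the text. The inversions after the second and fourth stages are included to show that they cancel at OUT.

```python
def mux_tree_lut(memory_cells, a1, a2, a3, a4):
    """Evaluate a 16-entry LUT as four stages of 2-to-1 multiplexers."""
    if len(memory_cells) != 16:
        raise ValueError("a 4-input LUT needs 16 stored values")
    values = list(memory_cells)
    for stage, select in enumerate((a1, a2, a3, a4), start=1):
        # Each stage pairs adjacent values and halves the number of candidates.
        # Assumed polarity: select == 1 picks the higher-numbered value of a pair.
        values = [values[i + select] for i in range(0, len(values), 2)]
        if stage in (2, 4):
            # Inverters follow the second and fourth stages,
            # so the two inversions cancel at the output.
            values = [1 - v for v in values]
    return values[0]
```

Under the assumed polarity, the selected memory cell is the one whose index is a1 + 2*a2 + 4*a3 + 8*a4.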
FIG. 2 illustrates another known 4-input LUT. The LUT of FIG. 2 is similar to that of FIG. 1, except that N-channel transistors 230-259 are substituted for CMOS pass gates 130-159. Because an N-channel transistor passes a power high signal at a reduced voltage (lower than VDD by approximately one threshold voltage), the node driving each inverter 105-109 is also provided with a pullup (e.g., a P-channel transistor) 260-264 to power high VDD. Each pullup 260-264 is gated by the output of the corresponding inverter 105-109. The pullup ensures that a high value on the node driving the inverter is pulled all the way to the power high value once a low value appears on the inverter output node.
The known LUT designs of FIGS. 1 and 2 both function well for smaller LUTs such as 4-input LUTs. However, when the same techniques are applied to larger LUTs, some limitations become apparent. For example, the delays on the various input paths are not the same. Referring to FIGS. 1 and 2, with stable values stored in memory cells MC<0:15>, a change to the A1 input signal must propagate through all four stages of the multiplexer, while a change to the A4 input signal affects only the final stage. A change to the A1 input signal therefore takes longer to propagate to the output terminal OUT than a change to the A4 input signal. Thus, if all of the input signals change values simultaneously, the delay from the A1 signal determines the overall through-delay for the LUT.
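The imbalance among input paths can be illustrated with a simple delay estimate. The per-element delay values below are hypothetical, in arbitrary units, and are not taken from any device; the model also ignores the input inverters 101-104 and loading effects. It counts only the multiplexer stages and inverters that lie between a given input and OUT.

```python
# Hypothetical per-element delays in arbitrary units (illustration only).
PASS_GATE_DELAY = 1.0
INVERTER_DELAY = 0.5


def input_to_output_delay(pin):
    """Estimate the delay from LUT input A<pin> (1 through 4) to OUT."""
    # A change on A<pin> takes effect at stage <pin> and must then
    # pass through that stage and every later stage.
    stages_traversed = 4 - pin + 1
    # Inverters follow the second and fourth stages.
    inverters_traversed = sum(1 for stage in (2, 4) if stage >= pin)
    return stages_traversed * PASS_GATE_DELAY + inverters_traversed * INVERTER_DELAY


delays = {f"A{pin}": input_to_output_delay(pin) for pin in range(1, 5)}
```

With these assumed values the estimate decreases steadily from A1 to A4, which matches the ordering described above. Adding inputs adds stages, so the gap between the slowest and fastest inputs grows.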
One solution to this difficulty is to write the design implementation software (e.g., the place-and-route software for a programmable IC) such that later-arriving input signals are placed on the faster inputs, e.g., A4 and A3 in the LUTs of FIGS. 1 and 2. However, such refinements increase the complexity and run-time of the implementation software, and reduce the flexibility of pin assignments for the LUTs. Therefore, it is desirable to provide other means by which the through-delays for the LUT input pins can be more closely balanced.
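The software-based approach mentioned above can be sketched as a greedy assignment that gives the latest-arriving signal the fastest pin. This is a minimal illustration, not the method of any particular place-and-route tool; the function names, signal names, arrival times, and pin delays are all hypothetical.

```python
def assign_pins(arrival_times, pin_delays):
    """Map signals to LUT pins so later-arriving signals get faster pins."""
    signals_latest_first = sorted(arrival_times, key=arrival_times.get, reverse=True)
    pins_fastest_first = sorted(pin_delays, key=pin_delays.get)
    return dict(zip(signals_latest_first, pins_fastest_first))


def output_ready_time(assignment, arrival_times, pin_delays):
    """Return the time at which OUT is valid for a given pin assignment."""
    return max(arrival_times[sig] + pin_delays[pin] for sig, pin in assignment.items())


# Hypothetical data in arbitrary time units.
pin_delays = {"A1": 5.0, "A2": 4.0, "A3": 2.5, "A4": 1.5}
arrival_times = {"w": 0.0, "x": 1.0, "y": 2.0, "z": 3.0}

balanced = assign_pins(arrival_times, pin_delays)
worst_case = {"w": "A4", "x": "A3", "y": "A2", "z": "A1"}
```

With this data, the greedy assignment makes OUT valid at time 5.0, while the reversed assignment does not do so until time 8.0. In a real flow, moving a signal to a different pin also requires permuting the stored LUT values to match, which is part of the added software complexity noted above.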