Programmable logic devices (PLDs) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay-locked loops (DLLs), and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
Another type of PLD is the Complex Programmable Logic Device, or CPLD. A CPLD includes two or more “function blocks” connected together and to input/output (I/O) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (PLAs) and Programmable Array Logic (PAL) devices. In CPLDs, configuration data is typically stored on-chip in non-volatile memory. In some CPLDs, configuration data is stored on-chip in non-volatile memory, then downloaded to volatile memory as part of an initial configuration sequence.
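The two-level AND/OR structure of a CPLD function block computes a sum of products. The sketch below is a simplified behavioral model under our own assumptions: each product term is encoded as a hypothetical pair of bit masks (inputs that must be 1, inputs that must be 0), and macrocell registers, output polarity control, and the interconnect switch matrix are not modeled.

```python
from typing import List, Tuple

# Hypothetical encoding: bit i of true_mask means "input i must be 1";
# bit i of comp_mask means "input i must be 0". Other inputs are don't-cares.
ProductTerm = Tuple[int, int]


def and_plane(inputs: int, terms: List[ProductTerm]) -> List[bool]:
    """First level: evaluate every product term for one input vector."""
    results = []
    for true_mask, comp_mask in terms:
        ones_ok = (inputs & true_mask) == true_mask
        zeros_ok = (inputs & comp_mask) == 0
        results.append(ones_ok and zeros_ok)
    return results


def or_plane(products: List[bool], selected: List[int]) -> bool:
    """Second level: OR together the product terms assigned to one output."""
    return any(products[i] for i in selected)


# Example: f(a, b, c) = (a AND NOT b) OR (b AND c), with a = bit 0, b = bit 1, c = bit 2.
TERMS = [(0b001, 0b010), (0b110, 0b000)]


def f(a: int, b: int, c: int) -> bool:
    inputs = a | (b << 1) | (c << 2)
    return or_plane(and_plane(inputs, TERMS), selected=[0, 1])
```

In a PLA both planes are programmable, whereas in a PAL device the OR plane is fixed; in this sketch that difference would correspond to whether `selected` is configurable.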
For all of these programmable logic devices (PLDs), the functionality of the device is controlled by data bits provided to the device for that purpose. The data bits can be stored in volatile memory (e.g., static memory cells, as in FPGAs and some CPLDs), in non-volatile memory (e.g., FLASH memory, as in some CPLDs), or in any other type of memory cell.
Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, e.g., using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include but are not limited to these exemplary devices, as well as encompassing devices that are only partially programmable.
FIG. 1 is a simplified illustration of an exemplary FPGA. The FPGA of FIG. 1 includes an array of configurable logic blocks (LBs 101a–101i) and programmable input/output blocks (I/Os 102a–102d). The LBs and I/O blocks are interconnected by a programmable interconnect structure that includes a large number of interconnect lines 103 interconnected by programmable interconnect points (PIPs 104, shown as small circles in FIG. 1). PIPs are often coupled into groups (e.g., group 105) that implement multiplexer circuits selecting one of several interconnect lines to provide a signal to a destination interconnect line or logic block. Some FPGAs also include additional logic blocks with special purposes (not shown), e.g., DLLs, RAM, and so forth.
FIG. 2 illustrates in simplified form a configurable logic element (CLE) for an FPGA. CLE 200 of FIG. 2 includes four similar slices SLICE_0–SLICE_3. Each slice includes two lookup tables (LUTs) 201 and 202, a write control circuit 205, two multiplexers MUX1 and MUX2, and two output memory elements 203 and 204. The lookup tables, write control circuit, multiplexers, and output memory elements are all controlled by configuration memory cells M1–M7. Note that at least some of configuration memory cells M1–M7 represent more than one memory cell. Additional configuration memory cells and logic elements are omitted from FIG. 2, for clarity.
Each LUT 201, 202 can function in any of several modes. When in lookup table mode, each LUT has four data input signals IN1–IN4 that are supplied by the FPGA interconnect structure (not shown) via input multiplexers (not shown). (In the present specification, the same reference characters are used to refer to terminals, signal lines, and their corresponding signals.) When in RAM mode, input data is supplied by an input terminal RAM_DI_1, RAM_DI_2 to the DI terminal of the associated LUT. RAM write operations in both LUTs are controlled by write control circuit 205, which supplies one or more write control signals W to both LUTs based on RAM control signals provided by the interconnect structure.
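A LUT in RAM mode reuses its 16 memory cells as a small user-writable memory. The following is a minimal behavioral sketch, assuming an asynchronous read, a synchronous write, and that the write address is the same IN1–IN4 address used for reading; the class and function names are hypothetical, and real devices may differ in all of these respects.

```python
from typing import List, Sequence


class LutRam:
    """Hypothetical behavioral model of one 4-input LUT operating as a 16x1 RAM."""

    def __init__(self) -> None:
        self.cells: List[int] = [0] * 16  # the LUT's 16 memory cells

    def read(self, addr: int) -> int:
        # Assumed asynchronous read: IN1..IN4, packed into a 4-bit address, select a cell.
        return self.cells[addr & 0xF]

    def clock_write(self, addr: int, di: int, w: bool) -> None:
        # Assumed synchronous write: if write control W is asserted at the clock
        # edge, the value on the DI terminal is stored in the addressed cell.
        if w:
            self.cells[addr & 0xF] = di & 1


def slice_write(luts: Sequence[LutRam], addr: int, data: Sequence[int], w: bool) -> None:
    """Stand-in for write control circuit 205: the same W reaches both LUTs,
    while each LUT takes its own data bit (RAM_DI_1, RAM_DI_2)."""
    for lut, di in zip(luts, data):
        lut.clock_write(addr, di, w)
```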
Each LUT 201, 202 provides a LUT output signal to an associated multiplexer MUX1, MUX2, which selects between the LUT output signal and an associated register direct input signal Reg_DI_1, Reg_DI_2 from the interconnect structure. Thus, each LUT can be optionally bypassed. The output of each multiplexer MUX1, MUX2 is provided to the data input terminal D of an associated output memory element (203, 204 respectively). Memory elements 203 and 204 are clocked by a clock signal CK (e.g., provided by a global clock network) and controlled by various other register control signals (e.g., from the interconnect structure or provided by configuration memory cells of the FPGA). Each memory element 203, 204 provides a registered output signal Q1, Q2. The output of each LUT 201, 202 is also provided to an output terminal OUT1, OUT2 of the CLE. Thus, each output memory element can be optionally bypassed. The slice also includes output multiplexers (not shown) that select from among the various output signals of the slice and provide the selected signals to the FPGA interconnect structure. These output multiplexers are also controlled by configuration memory cells (not shown).
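The two optional bypasses described above (around the LUT via MUX1 and around the memory element via OUT1) can be sketched behaviorally as follows. This is an illustrative model only: `SlicePath` and its fields are hypothetical names, a single configuration bit stands in for the MUX1 select, and the other register control signals and the slice output multiplexers are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SlicePath:
    """Hypothetical model of one LUT -> MUX1 -> memory element 203 path of FIG. 2."""
    bypass_lut: bool  # MUX1 configuration: True selects Reg_DI_1 instead of the LUT
    q: int = 0        # current state of the output memory element

    def outputs(self, lut_out: int) -> Tuple[int, int]:
        # (OUT1, Q1): the unregistered LUT output and the registered output.
        return lut_out & 1, self.q

    def clock_edge(self, lut_out: int, reg_di: int) -> None:
        # On an edge of CK the memory element captures whatever MUX1 selects.
        self.q = (reg_di if self.bypass_lut else lut_out) & 1
```

For example, `SlicePath(bypass_lut=False)` registers the LUT output, while `SlicePath(bypass_lut=True)` turns the memory element into a plain register fed from the interconnect.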
One programmable element commonly found in FPGA logic blocks is the lookup table, or LUT. A LUT is a memory array (e.g., a 16×1 array) addressable by a number of input signals (e.g., four input signals). By programming predetermined values into the memory array, the LUT can implement any function of the input variables. While 4-input LUTs are common, LUTs with more or fewer inputs can also be implemented to accommodate larger or smaller logic functions.
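Programming a LUT amounts to storing the truth table of the desired function in its memory array. The sketch below computes the 16 cell values for an arbitrary 4-input function and then reads the LUT by using the inputs as an address. The helper names are hypothetical, and treating A1 as the least significant address bit is our assumption.

```python
from typing import Callable, List


def program_lut(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Compute the 16 memory-cell values that make a 4-input LUT implement func.

    Assumption: input A1 is the least significant address bit, A4 the most significant.
    """
    cells = []
    for addr in range(16):
        a1, a2, a3, a4 = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        cells.append(func(a1, a2, a3, a4) & 1)
    return cells


def lut_read(cells: List[int], a1: int, a2: int, a3: int, a4: int) -> int:
    """Read the LUT: the four inputs form an address into the 16x1 memory array."""
    return cells[a1 | (a2 << 1) | (a3 << 2) | (a4 << 3)]


# Example: a LUT programmed as (A1 AND A2) XOR (A3 OR A4).
cells = program_lut(lambda a1, a2, a3, a4: (a1 & a2) ^ (a3 | a4))
```

Because each of the 16 cells can be set independently, a 4-input LUT can realize any of the 2^16 = 65,536 Boolean functions of four variables.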
FIG. 3 illustrates in simplified form a well-known 4-input lookup table (LUT) for a PLD. The lookup table is implemented as a four-stage 16-to-1 multiplexer. The four input signals A1–A4 together select one of 16 values stored in memory cells MC-0 through MC-15. Thus, the lookup table can implement any function of up to four input signals.
The four input signals A1–A4 are independent signals, each driving one stage of the multiplexer. Inverted versions A1B–A4B of signals A1–A4 are generated by inverters 401–404, respectively. Sixteen configuration memory cells MC-0 through MC-15 drive sixteen corresponding inverters 310–325, each of which drives a corresponding CMOS pass gate 330–345. In a first stage of the multiplexer, paired pass gates 330–331 form a 2-to-1 multiplexer controlled by signals A1 and A1B, which multiplexer drives a CMOS pass gate 346. Pass gates 332–345 are also paired in a similar fashion to form similar 2-to-1 multiplexers driving associated pass gates 347–353. In a second stage of the multiplexer, paired pass gates 346–347 form a 2-to-1 multiplexer controlled by signals A2 and A2B, which multiplexer drives an inverter 305. Similarly, pass gates 348–353 are paired to form similar 2-to-1 multiplexers driving associated inverters 306–308.
In a third stage of the multiplexer, driven by inverters 305–308, pass gates 354–355 are paired to form a 2-to-1 multiplexer controlled by signals A3 and A3B and driving a CMOS pass gate 358. Similarly, pass gates 356–357 are paired to form a similar 2-to-1 multiplexer driving a CMOS pass gate 359. In a fourth stage of the multiplexer, pass gates 358–359 are paired to form a 2-to-1 multiplexer controlled by signals A4 and A4B and driving an inverter 309. Inverter 309 provides the LUT output signal OUT.
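The four-stage structure can be modeled as a tree of 2-to-1 multiplexers in which each stage halves the number of candidate values. This is a logical sketch only: it ignores the signal inversions introduced by inverters 310–325, 305–308, and 309, and it assumes that a high select signal passes the higher-numbered input of each pair; the function names are hypothetical.

```python
from typing import List


def mux2(in0: int, in1: int, sel: int) -> int:
    """A 2-to-1 multiplexer: a pair of pass gates driven by a select signal and its inverse."""
    return in1 if sel else in0


def lut_mux_tree(cells: List[int], a1: int, a2: int, a3: int, a4: int) -> int:
    """Evaluate a 4-input LUT as a four-stage tree of 2-to-1 multiplexers.

    Stage 1 (A1) reduces 16 cell values to 8, stage 2 (A2) to 4,
    stage 3 (A3) to 2, and stage 4 (A4) to the single output.
    """
    level = list(cells)
    for sel in (a1, a2, a3, a4):
        level = [mux2(level[i], level[i + 1], sel) for i in range(0, len(level), 2)]
    return level[0]
```

The stage widths match the pass-gate counts in the description: 16 pass gates (330–345) in the first stage, 8 (346–353) in the second, 4 (354–357) in the third, and 2 (358–359) in the fourth. Under the stated assumption, the result equals the cell at address A1 + 2·A2 + 4·A3 + 8·A4.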
FIG. 4 illustrates a known configuration memory cell and pass gates controlled by the configuration memory cell. A configuration memory cell typically includes two cross-coupled logic gates, such as the two inverters formed by P-channel transistor 401 and N-channel transistor 403, and by P-channel transistor 402 and N-channel transistor 404. The output nodes of the two cross-coupled logic gates are referred to herein as “storage nodes”. The storage node of the first inverter is node Q. In FIG. 4, node Q drives pass gates 407. Pass gates 407 can be, for example, part of a routing multiplexer, lookup table, user storage element (e.g., block RAM or any other type of memory available for the storage of user data), or other configurable logic element. In some configuration memory cells, node QB, the storage node of the second inverter, drives the pass gates. In some configuration memory cells, both storage nodes Q and QB are used to drive logic external to the cell.
An N-channel transistor 405 is coupled between node Q and a first bit line BIT, and gated by a word line WORD. Another N-channel transistor 406 is coupled between node QB and a second bit line BITB, which carries an inverse value from the first bit line. Transistor 406 is also gated by word line WORD. Bit lines BIT and BITB are used to carry values written to the configuration memory cell, and also to read values from the configuration memory cell, e.g., during a configuration readback procedure. Variations on the circuit of FIG. 4 are also well known, e.g., two word lines can be provided, or only one bit line can be included. The circuit of FIG. 4 is a representative example of the various well-known memory cell configurations.
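The write and readback behavior of the memory cell of FIG. 4 can be summarized in a small behavioral model. This sketch abstracts away all transistor-level behavior (drive strengths, precharge of the bit lines, read disturb); `ConfigMemoryCell` and its methods are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple


class ConfigMemoryCell:
    """Hypothetical behavioral model of the cell of FIG. 4.

    Q and QB are the storage nodes of the cross-coupled inverters. The access
    transistors (405, 406) connect them to BIT and BITB only while WORD is high.
    """

    def __init__(self, q: int = 0) -> None:
        self.q = q & 1

    @property
    def qb(self) -> int:
        return 1 - self.q  # the second storage node holds the complement of Q

    def write(self, word: int, bit: int, bitb: int) -> None:
        # A write needs WORD asserted and complementary values on BIT and BITB.
        if word:
            if (bit & 1) == (bitb & 1):
                raise ValueError("BIT and BITB must carry complementary values")
            self.q = bit & 1

    def read(self, word: int) -> Optional[Tuple[int, int]]:
        # Readback: with WORD asserted, the storage nodes appear on (BIT, BITB).
        return (self.q, self.qb) if word else None
```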
A PLD interconnect structure can be complex and highly flexible. For example, Young et al. describe the interconnect structure of an exemplary FPGA in U.S. Pat. No. 5,914,616, issued Jun. 22, 1999 and entitled “FPGA Repeatable Interconnect Structure with Hierarchical Interconnect Lines”, which is incorporated herein by reference in its entirety.
As described above, programmable interconnect points (PIPs) are often coupled into groups (e.g., group 105 of FIG. 1) that implement multiplexer circuits selecting one of several interconnect lines to provide a signal to a destination interconnect line or logic block. A routing multiplexer can be implemented, for example, as shown in FIG. 5. The illustrated circuit selects one of several different input signals and passes the selected signal to an output terminal. For clarity, FIG. 5 illustrates a routing multiplexer with only eight inputs; PLD routing multiplexers typically have many more inputs, e.g., 28, 30, or 32.
The circuit of FIG. 5 includes eight input terminals IN0–IN7 and ten pass gates 500–509. Pass gates 500–503 selectively pass input signals IN0–IN3, respectively, to a first internal node INT1. Each pass gate 500–503 has a gate terminal driven by a configuration memory cell M12–M15, respectively. Similarly, pass gates 504–507 selectively pass input signals IN4–IN7, respectively, to a second internal node INT2. Each pass gate 504–507 has a gate terminal driven by one of the same configuration memory cells M12–M15, respectively. From internal nodes INT1, INT2, pass gates 508, 509 are controlled by configuration memory cells M10, M11, respectively, to selectively pass at most one signal to a third internal node INT3.
The signal on internal node INT3 is buffered by buffer BUF to provide output signal ROUT. Buffer BUF includes two inverters 511, 512 coupled in series, and a pullup on internal node INT3 (e.g., a P-channel transistor 513 coupled to power high VDD) that is gated by the node between the two inverters.
Values stored in configuration memory cells M10–M15 select at most one of the input signals IN0–IN7 to be passed to internal node INT3, and hence to output node ROUT. If none of the input signals is selected, output signal ROUT is held at its initial high value by pullup 513.
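The selection behavior of the FIG. 5 multiplexer can be captured in a short behavioral model. The sketch assumes that at most one of M12–M15 and at most one of M10, M11 is set (it rejects other configurations), and that an undriven INT3 is held high by pullup 513; the function name and argument layout are hypothetical.

```python
from typing import Sequence


def routing_mux(inputs: Sequence[int], m10: int, m11: int, m12_15: Sequence[int]) -> int:
    """Behavioral model of the 8-input, two-level routing multiplexer of FIG. 5.

    inputs:   values on IN0..IN7
    m12_15:   cells M12..M15; cell k enables pass gates 500+k and 504+k
    m10, m11: cells enabling pass gates 508 (from INT1) and 509 (from INT2)
    Returns the value of ROUT.
    """
    if sum(m12_15) > 1 or (m10 and m11):
        raise ValueError("invalid configuration: more than one path enabled")
    if m10:
        group = inputs[0:4]  # IN0..IN3 feed INT1
    elif m11:
        group = inputs[4:8]  # IN4..IN7 feed INT2
    else:
        return 1  # INT3 undriven: pullup 513 holds it (and ROUT) high
    for k, enabled in enumerate(m12_15):
        if enabled:
            return group[k] & 1  # BUF's two inverters pass INT3 non-inverted
    return 1  # no first-stage gate enabled: assume the pullup holds the node high
```

Sharing M12–M15 between the two first-stage groups means six memory cells, rather than eight, control the eight inputs; in this model that shows up as the same `m12_15` list indexing either `inputs[0:4]` or `inputs[4:8]`.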
Clearly, a circuit implemented in flexible programmable logic such as that shown in FIGS. 1–5 can potentially be slower than circuitry implemented using dedicated logic (i.e., logic designed for a specific purpose). For example, a circuit implemented using LUTs and flip-flops might need to traverse a succession of LUTs and interconnections between each pair of successive flip-flops, as shown in FIG. 6. The exemplary signal path illustrated in FIG. 6 connects an output terminal of flip-flop 601 with an input terminal of flip-flop 609, and sequentially traverses interconnect 602, LUT 603, interconnect 604, LUT 605, interconnect 606, interconnect 607, and LUT 608. The path delay includes one clock-to-out delay for flip-flop 601, four interconnect delays, three LUT delays, and one setup time for flip-flop 609. The total of these delays determines the minimum clock period for the illustrated signal path.
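The path-delay sum for FIG. 6 can be made concrete with a short calculation. The per-element delays below are hypothetical placeholders, not measured values for any device; only the structure of the sum (one clock-to-out delay, four interconnect delays, three LUT delays, one setup time) comes from the path described above.

```python
# Hypothetical delay values in nanoseconds, chosen only for illustration.
T_CLK_TO_OUT = 0.4    # clock-to-out delay of flip-flop 601
T_INTERCONNECT = 0.5  # per interconnect segment (602, 604, 606, 607)
T_LUT = 0.3           # per LUT (603, 605, 608)
T_SETUP = 0.1         # setup time of flip-flop 609


def min_clock_period(n_interconnect: int, n_lut: int) -> float:
    """Minimum clock period (ns) of a register-to-register path."""
    return T_CLK_TO_OUT + n_interconnect * T_INTERCONNECT + n_lut * T_LUT + T_SETUP


period_ns = min_clock_period(n_interconnect=4, n_lut=3)
fmax_mhz = 1000.0 / period_ns
lut_and_routing_share = (4 * T_INTERCONNECT + 3 * T_LUT) / period_ns
```

With these assumed numbers the period is about 3.4 ns (roughly 294 MHz), and LUTs plus interconnect account for about 85% of it, which is why speeding up those two elements has the largest effect on such paths.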
In non-programmable circuits, one known method of increasing circuit performance is the use of dynamic logic. In dynamic circuitry, many or all nodes (e.g., all output nodes) are pre-charged to a first known value. This state is referred to herein as the “pre-charge state”. At a later time the circuit enters the “evaluation state”, in which the pre-charge is released and some of the pre-charged nodes change to a second known value, as determined by the logic. In clocked dynamic logic, for example, all nodes can be pulled high at a falling edge of a clock, and then some of the nodes are selectively pulled low at the rising edge of the clock. Therefore, whenever the clock is low the circuit is in the pre-charge state, and whenever the clock is high the circuit is in the evaluation state. (Clearly, dynamic circuits also can be designed to operate in the opposite fashion, i.e., to be in the pre-charge state whenever the clock is high, and in the evaluation state whenever the clock is low.) Thus, only the falling edge on the pre-charged nodes is speed-critical, and circuitry can be skewed for a fast falling edge and a slow rising edge on these nodes. Another type of known dynamic logic uses a self-resetting technique, in which the output node is pre-charged during the pre-charge state, then is conditionally discharged (evaluated) whenever an input node of the circuit changes state. Thus, a low pulse might or might not appear at the output node, based on the values of the various input signals.
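The pre-charge/evaluation behavior of clocked dynamic logic can be traced step by step. This is a simplified logical model, assuming a node pre-charged high while the clock is low and a pull-down network that may discharge it while the clock is high; the function name is hypothetical and electrical effects (charge sharing, leakage, keepers) are ignored.

```python
from typing import Callable, List, Sequence


def dynamic_node_trace(clock: Sequence[int], pulldown: Callable[[int], bool]) -> List[int]:
    """Trace a clocked dynamic node that pre-charges while CLK is low.

    clock:    clock level at each time step (0 = pre-charge state, 1 = evaluation state)
    pulldown: returns True if the pull-down network conducts at step t
    While CLK is high the node can only be discharged; nothing recharges it
    until the next pre-charge state.
    """
    node = 1
    trace = []
    for t, clk in enumerate(clock):
        if clk == 0:
            node = 1  # pre-charge state: node pulled high
        elif pulldown(t):
            node = 0  # evaluation state: conditional discharge
        trace.append(node)
    return trace


# Example: a dynamic NAND of inputs a and b over two clock cycles.
clock = [0, 1, 1, 0, 1, 1]
a = [1, 1, 1, 1, 1, 1]
b = [1, 1, 1, 0, 0, 0]
trace = dynamic_node_trace(clock, lambda t: bool(a[t] and b[t]))
```

In the first cycle both inputs are high, so the node falls during evaluation; in the second cycle b is low, so the node simply stays at its pre-charged value. Only the falling transition depends on the logic, which is the property that lets the circuitry be skewed for a fast falling edge.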
The application of dynamic logic principles to PLDs is not straightforward. For example, if dynamic logic is applied to the LUT of FIG. 3, the circuit will not function correctly, because the LUT output signals are non-monotonic. In other words, a LUT output signal can go either high or low (i.e., change state in either direction) depending on the contents of the memory cells, the values of the various input signals, and the relative timing of the input signals. A pre-charged dynamic node, by contrast, can make only one transition (e.g., from high to low) during each evaluation state. However, LUTs and interconnect are widely used in FPGAs and can consume the largest percentage of the available cycle time in critical timing paths. Therefore, it is desirable to provide LUTs and interconnect circuits that enable the use of dynamic circuitry in PLDs.
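The timing-dependent failure can be illustrated with a toy model: a pre-charged node asked to follow a LUT output whose inputs arrive at different times within one evaluation state. The choice of an XNOR function, the input arrival order, and the helper names are all illustrative assumptions, not taken from the figures.

```python
from typing import List, Sequence


def dynamic_follow(targets: Sequence[int]) -> List[int]:
    """Let a node pre-charged high try to track a sequence of target values
    within one evaluation state. It can fall, but it can never rise again."""
    node = 1
    trace = []
    for want in targets:
        if want == 0:
            node = 0  # discharge is always possible
        # a target of 1 after a discharge cannot be honored: no path back to VDD
        trace.append(node)
    return trace


def xnor2(a1: int, a2: int) -> int:
    return 1 - (a1 ^ a2)


# A1 arrives before A2, so the LUT inputs pass through (1, 0) on the way to (1, 1).
input_steps = [(0, 0), (1, 0), (1, 1)]
targets = [xnor2(a1, a2) for a1, a2 in input_steps]  # non-monotonic: 1, 0, 1
actual = dynamic_follow(targets)                     # settles at 0, not 1
```

The correct final output is 1, but the intermediate input combination briefly calls for 0, and once the dynamic node has discharged it cannot recover until the next pre-charge state.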