The present invention relates to programmable logic devices, and more particularly to lookup tables utilized in programmable logic devices.
FIG. 1A shows a basic Field Programmable Gate Array (FPGA) 100, which is a type of Programmable Logic Device (PLD). FPGA 100 includes an array of configurable logic blocks (CLBs) CLB-1,1 through CLB-4,4 that are surrounded by input/output blocks (IOBs) IOB-1 through IOB-16, and programmable interconnect resources that include vertical interconnect segments 120 and horizontal interconnect segments 121 extending between the rows and columns of CLBs and IOBs. Each CLB includes configurable combinational circuitry and optional output registers that are programmed to implement a portion of a user's logic function. The interconnect segments of the programmable interconnect resources are configured using various switches to generate signal paths between the CLBs that link the logic function portions. Each IOB is configured to selectively utilize an associated pin (not shown) of FPGA 100 either as a device input pin, a device output pin, or a bi-directional pin. Although greatly simplified, FPGA 100 is generally consistent with FPGAs that are produced, for example, by Xilinx, Inc. of San Jose, Calif.
FIGS. 1B through 1D show examples of the various switches associated with the programmable interconnect resources of FPGA 100. FIG. 1B shows an example of a six-way segment-to-segment switch 122 that selectively connects vertical wiring segments 120(1) and 120(2) and horizontal wiring segments 121(1) and 121(2) in accordance with configuration data stored in memory cells M1 through M6. Alternatively, if horizontal and vertical wiring segments 120 and 121 do not break at an intersection, a single transistor makes the connection. FIG. 1C shows an example of a segment-to-CLB/IOB input switch 123 that selectively connects an input wire 110(1) of a CLB or IOB to one or more interconnect wiring segments in accordance with configuration data stored in memory cells M7 and M8. FIG. 1D shows an example of a CLB/IOB-to-segment output switch 124 that selectively connects an output wire 115(1) of a CLB or IOB to one or more interconnect wiring segments in accordance with configuration data stored in memory cells M9 through M11.
Since the first FPGA was invented in 1984, variations on the basic FPGA circuitry have been devised that allow FPGAs to implement specialized functions more efficiently. For example, special interconnection lines have been added to allow adjacent CLBs to be connected at high speed and without taking up general interconnection lines. In addition, hardware has been placed between adjacent CLBs that allows fast carry signal transmissions when an FPGA is configured to implement an arithmetic function or certain wide logic functions. Finally, the circuitry associated with the CLBs has undergone several changes that allow each CLB to implement specialized functions more efficiently. Such CLB modifications are particularly relevant to the present invention.
FIG. 2 shows a CLB used in the Virtex™ series of FPGAs produced by Xilinx, Inc. (Virtex is a trademark of Xilinx, Inc., assignee of the present patent application.) The CLB includes two slices SLICE-0 and SLICE-1. Each slice includes a pair of four-input lookup tables (LUTs) LUT F and LUT G, a pair of registers FF-X and FF-Y, and additional arithmetic carry and control (CARRY and CNTRL) logic. The output signal from each LUT is programmably controlled such that it is either transmitted directly to the surrounding interconnect resources (not shown), or applied to the D input of an associated register. Additional information regarding registers FF-X and FF-Y and the carry and control circuitry of the CLB can be found in the "Virtex™ 2.5 V Field Programmable Gate Arrays Advance Product Specification", which was made available Mar. 13, 1999 on the World Wide Web at http://www.Xilinx.com/partinfo/virtex.pdf, and is incorporated herein by reference. A paper copy of this Mar. 13, 1999 document can be obtained from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124.
FIG. 3A shows a lookup table (LUT) 300 that is used to implement LUT-G in the Virtex CLB shown in FIG. 2. LUT 300 includes a predecoder 310, a latch circuit 320, a write decoder 330, a memory block 340 and a read decoder 350. Input terminals IN1 through IN4 receive input signals from interconnect lines (see FIGS. 1A, 1C) of a host FPGA. These input signals are transmitted to predecoder 310, which generates an eight-bit address signal on read address lines R1 through R8 in response to the input signals. Read address lines R1 through R8 transmit the address signal to read decoder 350. In addition, the read address lines R1 through R8 of one LUT (LUT-G in FIG. 2) are connected to latch circuit 320. Latch circuit 320 temporarily stores the eight bits of the address signal transmitted on read address lines R1 through R8, and applies the eight bits as a write address signal to write address lines W1 through W8. This write address signal is applied to write decoder 330, and is also transmitted to the write decoder of the second LUT of the Virtex™ CLB (i.e., LUT-F; see FIG. 2). In other devices, such as those in Xilinx's XC4000™ series of FPGAs, each LUT of a CLB has independent write address lines.
Memory block 340 includes sixteen memory circuits PMC-1 through PMC-16. As discussed below, each memory circuit PMC-1 through PMC-16 is capable of storing one data bit. Data bits are stored during configuration, and read during a read-back operation. During the configuration mode, data bits are transmitted to memory circuits PMC-1 through PMC-16 using address and data signals transmitted from a configuration bus (CONFIG BUS). During a memory write operation, data bits transmitted through a data-in DIN terminal are passed to memory cell input terminals QIN of selected memory circuits PMC-1 through PMC-16 by write decoder 330. Each data bit is passed to a selected QIN terminal based on the write address signal transmitted to write decoder 330 on write address lines W1 through W8. During subsequent memory read operations, data bits are transmitted from memory circuit output terminals QO of selected memory circuits to a LUT output terminal OUT by read decoder 350 in response to the read address signals transmitted on read address lines R1 through R8.
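By way of illustration only, the following Python sketch models the behavior of a sixteen-bit LUT as described above: sixteen one-bit storage locations that are loaded at configuration, addressed by four input signals for reading, and optionally rewritten through a data-in path. It is a behavioral model, not a circuit model. The class name `Lut4`, its method names, and the choice of IN1 as the least significant address bit are assumptions made for the example and do not appear in the source.

```python
class Lut4:
    """Behavioral model of a 16 x 1 lookup table (hypothetical names)."""

    def __init__(self, config_bits):
        # Configuration mode: all sixteen bits are loaded at once.
        if len(config_bits) != 16:
            raise ValueError("a four-input LUT holds exactly 16 bits")
        self.cells = [int(b) & 1 for b in config_bits]

    @staticmethod
    def address(in1, in2, in3, in4):
        # Assumed bit order: IN1 is the least significant address bit.
        return in1 | (in2 << 1) | (in3 << 2) | (in4 << 3)

    def read(self, in1, in2, in3, in4):
        # Memory read / function generator: the addressed bit drives OUT.
        return self.cells[self.address(in1, in2, in3, in4)]

    def write(self, in1, in2, in3, in4, din):
        # Memory write: the DIN bit is stored in the addressed location.
        self.cells[self.address(in1, in2, in3, in4)] = din & 1


# Configure the table as a four-input XOR (parity) function generator.
xor_bits = [bin(a).count("1") & 1 for a in range(16)]
lut = Lut4(xor_bits)
```

Used as a function generator, the sixteen configuration bits are simply the truth table of the desired four-input function; used as a RAM, the same locations are rewritten one bit at a time through `write`.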
In addition to the configuration mode and memory read/write operations, LUT 300 can implement a shift register. During shift register operations, data bits are transmitted directly from the DIN terminal to the shift-in terminal SIN of memory circuit PMC-1, and then transmitted sequentially from the QO terminals of each memory circuit to the SIN terminals of a subsequent memory circuit. This shift register structure is further described by Bauer in U.S. Pat. No. 5,889,413, which is incorporated herein by reference.
FIGS. 3B through 3F show additional details of LUT 300. FIG. 3B shows relevant portions of predecoder 310. Predecoder 310 receives input signals on LUT input terminals IN1 through IN4. These input signals are inverted by first inverters 313, and are transmitted in non-inverted and inverted forms to NAND gates 315. NAND gates 315 generate output signals based on the logical NAND of selected pairs of the non-inverted or inverted input signals. These output signals are transmitted from NAND gates 315 to second inverters 317 which generate the eight read address signals R1 through R8.
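The predecode step can be sketched in Python as follows. The sketch assumes that the four inputs are decoded as two independent pairs, (IN1, IN2) producing R1 through R4 and (IN3, IN4) producing R5 through R8, so that exactly one line in each group of four is high. The source states only that NAND gates operate on selected pairs of true or inverted inputs, so the specific pairing and line ordering here are assumptions. The function names are hypothetical.

```python
def inv(a):
    """Inverter."""
    return 1 - a


def nand(a, b):
    """Two-input NAND gate."""
    return 1 - (a & b)


def predecode(in1, in2, in3, in4):
    """Return the eight read address lines [R1, ..., R8].

    Each line is inv(nand(x, y)), i.e. the AND of two literals, where each
    literal is an input or its complement. Pairing is an assumption.
    """
    lines = []
    for low, high in ((in1, in2), (in3, in4)):
        for code in range(4):
            lit_low = low if code & 1 else inv(low)
            lit_high = high if code & 2 else inv(high)
            lines.append(inv(nand(lit_low, lit_high)))
    return lines
```

Under these assumptions, `predecode(1, 0, 1, 1)` yields `[0, 1, 0, 0, 0, 0, 0, 1]`: one asserted line per group, which together identify one of the sixteen memory circuits.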
FIG. 3C shows write decoder 330 of LUT 300. A data input signal DIN is passed to the QIN terminal of one of memory circuits PMC-1 through PMC-16, as selected by write address signals W1 through W8.
FIG. 3D shows read decoder 350 of LUT 300. Decoder 350 selects an output signal QO from one of memory cells PMC-1 through PMC-16 as selected by address signals R1 through R8 and places the selected signal on terminal OUT if configured to do so by multiplexer 358.
FIG. 3E shows memory circuits PMC-15 and PMC-16 of memory block 340 (see FIG. 3A). Memory circuits PMC-15 and PMC-16 are accessed by shift-in terminal SIN through a control transistor 342. CMOS transmission gate 343 is used to latch the memory cell output value while the memory cell itself is changing. This latching feature is particularly important during shift operations. (Signal PHI-1 goes low only briefly, as described by Johnson et al. in U.S. Pat. No. 5,933,369 entitled "RAM With Synchronous Write Port Using Dynamic Latches".)
Cell 341 is a seven-transistor ("7T") memory cell that includes a latch circuit 345, configuration transistors 346 and 347, and a feedback NMOS transistor 348. The advantage of using transistor 348 is described by Frake et al. in U.S. Pat. No. 5,764,564, incorporated herein by reference. Configuration transistors 346 and 347 are controlled by an address signal transmitted on address lines A1 through A16 from the configuration bus (lines A15 and A16 shown).
Operation of LUT 300 will now be described with reference to FIGS. 3A through 3E. A high address signal turns on configuration transistors 346 and 347 to pass a data bit from configuration data terminals D and Db to latch circuit 345. After configuration, if the LUT has been configured as a RAM, data is written to memory circuits PMC-1 through PMC-16 from the DIN terminal (FIGS. 3C and 3E) using write decoder 330. To transmit a data bit to a selected memory circuit PMC-1 through PMC-16, the address of the selected memory circuit is transmitted via the PLD interconnect resources to LUT input terminals IN1 through IN4 (FIG. 3A). These input signals are transmitted to predecoder 310, which generates corresponding address signals that are stored in latch circuit 320. These latched address signals are transmitted via write address lines W1 through W8 to write decoder 330. Write-strobe control signal WS is subsequently pulsed high to pass the data bit through a selected write-strobe transistor to the QIN terminal of the selected memory circuit. Note that the data is passed through only one write-strobe transistor of write control circuit 339 because only one of the sixteen output terminals of secondary switch groups 335 through 338 is actively driven. Referring to FIG. 3E, the data bit passes from the DIN terminal to the Q terminal of the selected memory cell 341. The data bit is applied from the Q terminal to the gates of transistors P1 and N1 while ground disconnect transistor 348 is turned off, thereby causing the selected latch circuit 345 to store the data bit.
In contrast to memory write operations, shift register operations transmit data bits to memory circuits PMC-1 through PMC-16 without passing through write decoder 330. As shown in FIG. 3A, the DIN terminal of LUT 300 is connected to the shift-in (SIN) terminal of memory circuit PMC-1. Each memory circuit output terminal QO is connected to the shift-in terminal SIN of the next memory circuit. Referring to FIG. 3E, the data bit on the Q-15 output terminal of memory circuit PMC-15 is transmitted to the Q terminal of PMC-16 in response to the PHI-2 control signal, turning on transistor 342. During a subsequent cycle of the shift register operation, the stored data bit is then applied in an inverted form on the Qb terminal of cell 341 for transmission to a subsequent memory circuit. In all other operational modes, control transistor 342 is maintained in an off state by a low PHI-2 control signal to isolate the SIN terminal from the Q terminal of memory cell 341.
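The shift register behavior described above may be sketched in Python as follows. The sketch is behavioral only: the two-phase PHI-1/PHI-2 clocking is represented by computing every new cell value from the old contents before any cell is updated, which is the effect the output latching is described as providing. The function name `shift` and the list representation are assumptions made for the example.

```python
def shift(cells, din):
    """Return the contents of the sixteen cells after one shift cycle.

    cells[0] models PMC-1 and cells[15] models PMC-16. The new list is
    built entirely from the old one, mimicking outputs that are held
    (latched) while the cells themselves change. The bit previously held
    in the last cell is shifted out and discarded here.
    """
    return [din & 1] + cells[:-1]


# Shift the bit sequence 1, 0, 1, 1 into an initially cleared register.
cells = [0] * 16
for bit in (1, 0, 1, 1):
    cells = shift(cells, bit)
```

After these four cycles the most recently shifted bit sits in the position modeling PMC-1 and the first bit has advanced to the position modeling PMC-4. Polarity inversions between stages (such as the Qb output) are not modeled.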
Referring again to FIG. 3A, memory read (and LUT) operations are performed by passing the data bit stored in a selected memory circuit PMC-1 through PMC-16 to LUT terminal OUT using read decoder 350. For example, to read a data bit from a selected memory circuit, an appropriate address is transmitted as a corresponding set of input signals via the PLD interconnect resources (not shown) to LUT input terminals IN1 through IN4. These input signals are transmitted to predecoder 310, which generates corresponding read address signals on read address lines R1 through R8 that are transmitted to read decoder 350. Referring to FIG. 3E, an inverted data bit stored in latch 345 is applied to the Qb terminal. During a memory read operation, this inverted data bit is transmitted from memory cell 341 through CMOS transmission gate 343, which is enabled (turned on) by the PHI-1 and PHI-1b control signals, and inverter 344 to the output terminal Q-1 through Q-16 of the selected memory circuit. Turning now to FIG. 3D, the read address signals are applied from read address lines R1 through R8 to switch groups 351 through 354 and switches 355-1 through 355-4, which pass the selected data bit through inverter 356, multiplexer 358 and inverter 359 to the LUT terminal OUT.
LUT 300 has proven extremely useful for implementing many logic functions. However, several features of LUT 300 produce operation and layout issues that are less than optimal.
First, as discussed above with reference to FIG. 3C, during a memory write operation, the incoming data signal transmitted from inverter 304 must pass through three pass transistors (gates) before it reaches the QIN terminal of a selected memory circuit PMC-1 through PMC-16. Two of these three gates are associated with the write decode process, and one gate is associated with write-strobe operations. This creates a problem in that the data signal that reaches the selected memory cell is relatively weak. To account for this weak signal, the clock driving memory circuits PMC-1 through PMC-16 must be relatively slow.
Another problem presented by write decoder 330 is that data bits are transmitted via a single terminal of memory circuits PMC-1 through PMC-16 during memory write operations (i.e., via the Q terminal). A second write decoder would be required to direct complementary data bits to a second terminal of the selected memory circuit (i.e., the Qb terminal). Likewise, data bits are transmitted only to the Q terminal of each memory cell during shift register operations, as indicated in FIG. 3E. The problem presented by writing to a memory cell using a single terminal is that if the data bit signal is not strong enough, then it may not be possible to reliably overpower the feedback inverter (i.e., P2 and N2) of the memory cell.
Turning off transistor 348 during writing assists memory cell 341 to flip properly, as discussed by Frake in U.S. Pat. No. 5,764,564, but it does not guarantee proper operation since the voltage applied to the Q terminal may not cause the inverter comprising transistors P1 and N1 to flip the inverter comprising transistors P2 and N2, especially as technology moves to lower supply voltages. Moreover, several problems are created by adding feedback NMOS transistor 348 to memory cells 341 of memory circuits PMC-1 through PMC-16. First, the addition of feedback NMOS transistor 348 turns each memory cell 341 into a 7T memory cell, which is not required in any other part of the host PLD. As such, memory cells 341 are laid out and built differently from all other memory cells of the host PLD, so design changes are more complex than if a single memory cell were used throughout the host PLD. Second, because the physical layout of memory cells 341 differs from that of all other memory cells, gaps are required between memory cells 341 and the 6T memory cells of the host PLD. As a result, the layout of LUT 300 is inefficient in that it requires spaces separating memory cells 341 from other configuration memory cells.
What is needed is a LUT implementation in an FPGA PLD that overcomes the deficiencies of LUT 300, and does so in an area efficient manner.
The present invention is directed to a fast, area efficient lookup table (LUT) that is used as a function generator, a shift register, or a RAM in a programmable logic device (PLD). In accordance with the various aspects of the invention, the write decoder, read decoder and memory circuits of the LUT are modified to improve performance during memory read operations, while providing a highly area efficient layout arrangement that minimizes the overall layout area of the LUT.
Fast Write Data Path
In accordance with a first aspect of the present invention, a write decoder includes a plurality of logic gates that generate appropriate select signals during memory write operations in response to the input signals. For example, the logic gates generate sixteen select signals in response to four input signals, one of the sixteen select signals being asserted (e.g., high) in response to a corresponding sequence of input signals. These sixteen select signals allow data input signals to pass to the memory circuits through a minimum number of pass transistors.
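The generation of sixteen select signals from four input signals may be sketched in Python as follows. The sketch assumes one four-input NOR gate per select signal, each fed with the true or complemented form of every input so that its output is high for exactly one input combination. NOR gates are mentioned in the source only as an example, and the bit ordering and function names used here are assumptions.

```python
def nor4(a, b, c, d):
    """Four-input NOR gate: high only when all four inputs are low."""
    return 0 if (a | b | c | d) else 1


def write_selects(inputs):
    """Return sixteen select signals for a tuple of four input bits.

    For select k, each gate input is the true input where bit i of k is 0
    and the complemented input where bit i of k is 1, so every gate input
    is low only when the four input bits spell out k (IN1 assumed to be
    the least significant bit).
    """
    selects = []
    for k in range(16):
        literals = [x ^ ((k >> i) & 1) for i, x in enumerate(inputs)]
        selects.append(nor4(*literals))
    return selects
```

With this arrangement the data signal never travels through a decode tree. Each select signal can gate a single pass transistor (or a true/complement pair) at its memory circuit, which is the sense in which the number of pass transistors in the data path is minimized.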
Decoder Shared by LUT Pairs
In accordance with a second aspect of the present invention, associated LUT pairs share a common write decoder. Data signals are transmitted to each LUT of a pair from a separate source, and are directed to selected memory cells in response to the select signals. Because the common write decoder does not act as a decode tree to route data signals, the select signals can be shared by the two associated LUTs. Because two LUTs share a single write decoder, the overall layout size is reduced.
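The sharing of select signals between two LUTs can be illustrated with the following Python sketch. One set of sixteen select signals addresses the same location in both memories, while each memory receives its own data bit. The helper names, the list representation, and the absence of per-LUT write enables are simplifying assumptions made for the example.

```python
def one_hot(address, width=16):
    """Stand-in for the shared write decoder: one select high at `address`."""
    return [1 if k == address else 0 for k in range(width)]


def write_pair(selects, cells_f, cells_g, din_f, din_g):
    """Apply one shared set of select signals to two LUT memories.

    The selects only choose a location; the data bits come from separate
    sources and never pass through the decoder itself.
    """
    for k, sel in enumerate(selects):
        if sel:
            cells_f[k] = din_f & 1
            cells_g[k] = din_g & 1


lut_f = [0] * 16
lut_g = [1] * 16
write_pair(one_hot(9), lut_f, lut_g, din_f=1, din_g=0)
```

Because the decoder output is only a set of enables, duplicating it for the second LUT would add area without adding function, which is why a single decoder can serve the pair.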
True and Complement Data Signals
In the prior art circuit of FIG. 3A using the write decode tree shown in FIG. 3C, it is not practical to drive the memory cells with both true and complement data input signals because separate write decode trees would be required for the true and complement data input signals. However, in accordance with a third aspect of the present invention, because the data input signal is not passed through a write decode tree before reaching a memory cell, both true and complement data signals can be transmitted to the memory cell circuit without requiring two separate write decode trees. Using both true and complement data input signals makes memory write operations more reliable and faster, and eliminates the need for a ground disconnect transistor. Moreover, the problem associated with the prior art write decoder of charge sharing between a memory cell node and a write decode node (which could cause the memory cell to inadvertently flip) is eliminated because individual select signals are utilized to access each memory cell circuit.
Regular Write Decoder Layout
In accordance with a fourth aspect of the present invention, the logic gates utilized in the write decoder are fabricated using two rows of transistors (one row of P-channel transistors and one row of N-channel transistors). The logic gates are arranged to match the pitch of the memory cells in the memory block. For example, if four-input NOR gates are used to access memory cells formed by four transistors arranged in parallel, then the width of the four-input NOR gates matches the width of the memory cells. This arrangement minimizes the combined layout area of the write decoder and memory block because diffusions can be shared between the NOR gate transistors and the memory cell transistors. In addition, the regularity and compactness of the arrangement allows for a circuit having a given number of transistors to be formed in less layout area.
Further, because both true and complement data signals are applied to the memory cell, the ground disconnect transistor utilized in the prior art memory cell can be eliminated, thereby allowing the same six-transistor (6T) memory cell utilized in other portions of the PLD to be utilized by the memory circuits of a LUT. The use of 6T memory cells reduces the layout area of the LUT, and simplifies the fabrication process because the need to design and lay out seven-transistor (7T) memory cells is eliminated.
Feedback Pulls Up Output Signal
In accordance with a fifth aspect of the present invention, each memory circuit includes an output latching transistor and a feedback inverter circuit connected between the output latching transistor and a memory cell output terminal. The feedback inverter circuit enables the input terminal of the inverter to reach a fully high value in spite of being driven by an NMOS gate from the memory cell circuit. The feedback inverter circuit lays out in a more efficient manner because it shares diffusion with a PMOS transistor in the inverter. Further, replacing the CMOS switch with a relatively resistive single NMOS pass transistor reduces charge-sharing problems. In particular, this NMOS pass transistor is more resistive than a full CMOS transmission gate, reducing the effect on the related memory cell of charge potentially stored by the capacitance of the feedback inverter circuit. The charge sharing problem is further minimized by the presence of additional control transistors (used to pass true and complement data signals during memory write operations) to the memory cell output terminal, thereby increasing the capacitance on the memory circuit side of the NMOS pass gate relative to the input capacitance of the feedback inverter.
Multiplexer Buffered Mid-way
In accordance with a sixth aspect of the present invention, the read decoder is formed as a multi-stage multiplexer tree with inverters located between two stages of the tree, which buffer the signal and reduce signal delays during memory read operations. For example, in a three-stage multiplexer tree, the outputs of the second stage drive the inputs of the third stage through inverters. By placing the inverters between the second and third stages of multiplexers, these inverters allow the gates in the upstream 2-to-1 multiplexers to be much smaller and faster, and minimize the RC delay of the read decoder.
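The buffered multiplexer tree may be sketched logically in Python as follows. For brevity the sketch uses an eight-input tree built from three stages of 2-to-1 multiplexers, with inverters between the second and third stages. The eight-input width, the final inverter that restores polarity, and the function names are assumptions made for the example. The sketch shows only the logical effect of the mid-tree inversion, not the RC delay benefit.

```python
def mux2(a, b, sel):
    """2-to-1 multiplexer: passes a when sel is 0, b when sel is 1."""
    return b if sel else a


def read_tree(bits, s0, s1, s2):
    """Select one of eight stored bits through a three-stage mux tree."""
    stage1 = [mux2(bits[i], bits[i + 1], s0) for i in range(0, 8, 2)]
    stage2 = [mux2(stage1[i], stage1[i + 1], s1) for i in range(0, 4, 2)]
    # Inverting buffers between the second and third stages re-drive the
    # signal, so the upstream pass gates need not drive the whole path.
    buffered = [1 - v for v in stage2]
    stage3 = mux2(buffered[0], buffered[1], s2)
    # Assumed output inverter restores the original polarity.
    return 1 - stage3
```

Because the mid-tree inverters flip the data polarity, some later stage must invert again (or the stored data must be interpreted as inverted) for the output to match the addressed bit.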
High Speed or High Function Output Option
In addition, an output control circuit is provided at the outputs of the last stage of multiplexers in the output multiplexer tree that includes programmable circuitry for selectively routing data either on a high speed output path or on a relatively slower high function path. Data transmitted on the high function output path passes through logic gates that receive signals from an adjacent circuit associated with, for example, arithmetic summing operations or wide function multiplexers.