The invention relates to Programmable Logic Devices (PLDs). More particularly, the invention relates to a configurable logic block (CLB) for a PLD that enables the rapid calculation of sum-of-products (SOP) functions.
Programmable logic devices (PLDs) are a well-known type of digital integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. Some FPGAs also include additional logic blocks with special purposes (e.g., DLLs, RAM, and so forth).
The CLBs, IOBs, interconnect, and other logic blocks are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. The configuration data can be read from memory (e.g., an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as ASIC devices (Application Specific Integrated Circuits). PLDs can also be implemented in other ways, e.g., using fuse or antifuse technology.
One type of PLD is the Virtex™-II family of FPGAs from Xilinx, Inc. (The Virtex-II FPGA is described in detail in pages 33-75 of the "Virtex-II Platform FPGA Handbook", published December 2000, available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124, which pages are incorporated herein by reference.) A Virtex-II FPGA includes an array of configurable logic blocks (CLBs) as described above. FIG. 1 is a simplified diagram of a Virtex-II CLB.
As shown in FIG. 1, a Virtex-II CLB includes four similar slices, SLICE 0 through SLICE 3. Each slice includes two lookup tables (LUT 1, LUT 2). Each LUT has an associated carry multiplexer (CY1, CY2), two associated multiplexers (M1 and MX1, M2 and MX2), and an associated flip-flop (FF1, FF2). By programming the various multiplexers, each LUT output can be provided as a slice output signal and/or can be registered in the associated flip-flop. Each LUT output can also be placed on the carry chain or can alter a value already present on the carry chain. These aspects of CLBs are well known, and therefore are not described further herein.
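To make the notion of "programming" a LUT concrete, a LUT can be modeled as a small truth table whose bits come from configuration memory. The sketch below assumes 4-input LUTs (consistent with the 16-input product terms built from four LUTs in FIG. 1A) and assumes input 0 is the least significant bit of the table index; the helper names `program_lut` and `eval_lut` are hypothetical.

```python
from typing import Callable, Sequence


def program_lut(func: Callable[..., int], n_inputs: int = 4) -> int:
    """Return the truth-table bits ("configuration") that make a LUT compute func."""
    bits = 0
    for index in range(1 << n_inputs):
        # Input i is bit i of the table index (an assumed ordering).
        inputs = [(index >> i) & 1 for i in range(n_inputs)]
        if func(*inputs):
            bits |= 1 << index
    return bits


def eval_lut(bits: int, inputs: Sequence[int]) -> int:
    """Look up the stored output bit addressed by the LUT inputs."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (bits >> index) & 1


# A LUT programmed as a 4-input AND has only its last table entry set.
AND4 = program_lut(lambda a, b, c, d: a and b and c and d)
```

Programming a LUT as an AND function, as the sum-of-products examples below require, amounts to loading a table with a single 1 in it.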
However, each Virtex-II slice also includes an "SOP chain", or sum-of-products chain. The SOP chain includes a multiplexer ORM that selects between an OR-input signal (e.g., OIN0) and a logic low level ("0") under the control of a configuration memory cell (not shown). The output of multiplexer ORM is ORed together with the carry chain output COUT for the slice in OR gate "OR". The output of OR gate "OR" is passed along the SOP chain to multiplexer ORM in the adjacent slice.
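The SOP chain just described can be modeled slice by slice. In this sketch, which is a behavioral approximation only, `couts[i]` stands for COUT of slice i and `use_or_input[i]` stands for the configuration cell controlling ORM; both names, and the function `sop_chain`, are hypothetical.

```python
from typing import List, Sequence


def sop_chain(couts: Sequence[int], use_or_input: Sequence[bool], oin0: int = 0) -> List[int]:
    """Return the SOP-chain output of each slice, in chain order."""
    outputs = []
    chain = oin0  # OR-input signal entering the first slice
    for cout, select in zip(couts, use_or_input):
        orm = chain if select else 0  # multiplexer ORM: incoming OR signal or logic low
        chain = orm | cout            # OR gate "OR" combines ORM output with COUT
        outputs.append(chain)         # passed on to ORM in the adjacent slice
    return outputs
```

Setting a slice's ORM to select logic low starts a new chain at that slice, which is how one chain can be split into independent OR functions.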
Note that in the present specification, the same reference characters are used to refer to terminals, signal lines, and their corresponding signals. Further, in CMOS logic an OR gate is typically implemented as a NAND gate with inverted input signals. However, in the present specification the "OR" symbol is used to simplify the drawings and to accurately represent the logical function. The term "OR gate" is also used herein to represent logic implementing the OR function, however implemented.
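The equivalence between an OR gate and a NAND gate with inverted inputs follows from De Morgan's law, NOT(NOT a AND NOT b) = a OR b. A quick exhaustive check, using hypothetical helper names:

```python
from itertools import product


def nand(*xs: int) -> int:
    """Multi-input NAND on 0/1 values."""
    return 0 if all(xs) else 1


def or_via_nand(a: int, b: int) -> int:
    """OR built as a NAND gate whose inputs are inverted first."""
    return nand(1 - a, 1 - b)


# Compare against Python's bitwise OR for every input combination.
for a, b in product((0, 1), repeat=2):
    assert or_via_nand(a, b) == (a | b)
```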
The SOP chain of FIG. 1 can be used to implement sum-of-products functions, as shown in FIG. 1A. For example, to implement a sum-of-products function each LUT (LUT 1, LUT 2 in each slice) is programmed to implement an AND function. The carry chain (CY1, CY2 in each slice) is then used to combine the AND functions into wider AND functions, as shown in FIG. 1A. The OR gates in the SOP chain (gate "OR" in each slice) are then combined to provide the sum-of-products output signal, AND16OR2. Thus, as shown in FIG. 1A, a single Virtex-II CLB can be used to implement a 2-input OR function of two 16-input AND functions (i.e., the sum-of-products function of two product terms, each with 16 inputs). Additional inputs can be added to each AND function (i.e., to each product term) by extending the carry chains into vertically adjacent CLBs. Additional inputs can be added to each OR function by extending the SOP chain into horizontally adjacent CLBs.
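The AND16OR2 function of FIG. 1A can be sketched behaviorally. This assumes 4-input LUTs and assumes each carry multiplexer passes the incoming carry when its LUT output is 1 and forces 0 otherwise, a common way to build a wide AND on a carry chain; the function names are hypothetical and timing is not modeled.

```python
from typing import Sequence

LUT_INPUTS = 4  # assumption: four 4-input LUTs feed one 16-input product term


def lut_and(inputs: Sequence[int]) -> int:
    """One LUT programmed as an AND of its inputs."""
    return int(all(inputs))


def product_term(inputs: Sequence[int]) -> int:
    """Wide AND: per-LUT ANDs combined along a carry chain."""
    carry = 1  # carry chain starts high
    for i in range(0, len(inputs), LUT_INPUTS):
        lut_out = lut_and(inputs[i:i + LUT_INPUTS])
        # Carry mux: propagate the carry if the LUT output is 1, else force 0,
        # so each stage ANDs its LUT output into the running carry.
        carry = carry if lut_out else 0
    return carry


def sum_of_products(terms: Sequence[Sequence[int]]) -> int:
    """OR the product terms together, as the OR gates of the SOP chain do."""
    result = 0
    for term in terms:
        result |= product_term(term)
    return result
```

For AND16OR2, `sum_of_products` receives two lists of 16 inputs each; longer lists correspond to extending the carry chain, and more lists to extending the SOP chain.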
While the CLB architecture of FIG. 1 enables the efficient implementation of sum-of-products functions, the speed of the computation is limited by the speed of the carry chain. In particular, getting "onto" and "off of" a carry chain typically carries a significant delay penalty. Alireza S. Kaviani proposes an alternative CLB architecture that bypasses the carry chain in commonly assigned, co-pending U.S. patent application Ser. No. 09/687,812, entitled "Configurable Logic Block for PLD" and filed Oct. 13, 2000, which is hereby incorporated herein by reference. This alternative architecture is illustrated in FIG. 2.
The CLB of FIG. 2 includes four similar slices, each slice (SLICE 0 through SLICE 3) being similar to those shown in FIG. 1. However, in the architecture of FIG. 2, each slice includes an additional function generator FG. Function generator FG can be configured (e.g., by bits stored in configuration memory cells, not shown) to implement a 2-input NOR function of the two LUT output signals, to implement a 2-input NAND function of the two LUT output signals, to generate a constant high value, or to pass another value supplied from elsewhere inside the CLB. To implement a sum-of-products function, function generator FG is configured to function as a 2-input NOR gate.
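The four behaviors of function generator FG can be summarized in a small behavioral model. The mode strings and the function name are hypothetical; the source lists the behaviors but not how the configuration bits encode them.

```python
def function_generator(mode: str, lut1: int, lut2: int, other: int = 0) -> int:
    """Behavioral model of function generator FG in one slice.

    mode is a hypothetical label for the configured behavior; `other` stands
    for a value supplied from elsewhere inside the CLB.
    """
    if mode == "nor2":
        return 1 - (lut1 | lut2)
    if mode == "nand2":
        return 1 - (lut1 & lut2)
    if mode == "const_high":
        return 1
    if mode == "pass":
        return other
    raise ValueError(f"unknown FG mode: {mode!r}")
```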
As shown in FIG. 2A, each LUT in slices SLICE 1 and SLICE 3 is configured as a NAND gate. The function generator FG of each slice is configured as a 2-input NOR gate. As is well known in the art of logic design, two NAND gates followed by a NOR gate are logically equivalent to a single wide AND gate. Therefore, the output of each function generator FG is the AND function of all eight LUT inputs for that slice, i.e., an 8-input product term. The product terms are then combined together in the SOP chain as in the CLB architecture of FIG. 1.
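The NAND-NAND-NOR equivalence can be checked exhaustively over all 256 combinations of the eight LUT inputs in a slice. This sketch assumes 4-input LUTs (two per slice, giving the 8-input product term); the helper names are hypothetical.

```python
from itertools import product
from typing import Sequence


def nand4(a: int, b: int, c: int, d: int) -> int:
    """A LUT configured as a 4-input NAND."""
    return 1 - (a & b & c & d)


def nor2(x: int, y: int) -> int:
    """Function generator FG configured as a 2-input NOR."""
    return 1 - (x | y)


def fg_output(bits: Sequence[int]) -> int:
    """Slice output: NOR of two NAND-configured LUTs, four inputs each."""
    return nor2(nand4(*bits[:4]), nand4(*bits[4:8]))


# The slice output equals an 8-input AND for every input combination.
assert all(fg_output(bits) == int(all(bits)) for bits in product((0, 1), repeat=8))
```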
This architecture avoids the delay of the carry chain. Instead, the delay from a LUT output to the SOP chain (i.e., to an input terminal of one of the OR gates in the chain) is only the delay through function generator FG. However, the architecture of FIG. 2 also has its limitations. For example, the removal of the carry chain has resulted in a maximum of 8 AND inputs, rather than the virtually unlimited number of inputs supported in FIG. 1. Further, the delay from a LUT input terminal to the AND8OR2 output terminal includes not only the delay through the FG function generator, but an additional delay of up to two OR gate delays.
Therefore, Kaviani further proposes a second alternative CLB architecture, which is also disclosed in commonly assigned, co-pending U.S. patent application Ser. No. 09/687,812. In the architecture of FIG. 3, each slice includes the function generator FG as in FIG. 2. However, the SOP chain is omitted. Instead, the CLB includes a dedicated 4-input OR gate 301 that performs a logical OR function of the four output signals (OUT0-OUT3) from the four function generators FG of the four slices in the CLB. (The term "dedicated" is used herein to describe a circuit designed to perform a single function, e.g., an OR function, as opposed to a circuit that can be programmed to implement any of a variety of functions. For example, OR gate 301 is a dedicated circuit, while function generator FG is not.)
Using the architecture of FIG. 3, a sum-of-products function can be implemented as shown in FIG. 3A. Each LUT is configured as a NAND gate, as in the architecture of FIG. 2. Each function generator FG is configured as a NOR gate. Thus, each function generator FG provides the AND function of all eight LUT inputs for that slice, i.e., an 8-input product term. The four product terms AND0-AND3 are then combined together in OR gate 301 as shown in FIG. 3A, to provide sum-of-products output signal AND8OR4.
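The AND8OR4 function of FIG. 3A can be sketched by feeding four slice outputs into a model of OR gate 301. As before, this assumes 4-input LUTs and models logic only, not delay; the function names are hypothetical.

```python
from typing import Sequence


def nand(bits: Sequence[int]) -> int:
    """A LUT configured as a NAND of its inputs."""
    return 0 if all(bits) else 1


def slice_out(bits: Sequence[int]) -> int:
    """OUTk: function generator FG as a NOR of the slice's two NAND LUTs."""
    lut1, lut2 = nand(bits[:4]), nand(bits[4:8])
    return 1 - (lut1 | lut2)


def and8or4(terms: Sequence[Sequence[int]]) -> int:
    """Dedicated 4-input OR gate 301 over OUT0-OUT3."""
    if len(terms) != 4:
        raise ValueError("expected one 8-input term per slice (four slices)")
    return int(any(slice_out(t) for t in terms))
```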
The architecture of FIG. 3 includes neither the carry chain delay nor the SOP chain delay resulting from previous architectures (see FIGS. 1A and 2A). The delay between a LUT output signal and the output terminal AND8OR4 includes a single FG function generator delay and the delay through dedicated OR gate 301. However, it is desirable to further reduce the delay on the logic path, to enhance the performance of sum-of-products functions implemented in a programmable logic device.
The invention provides a variety of configurable logic block (CLB) architectures that enable the efficient implementation of sum-of-products functions in a programmable logic device (PLD). Output signals from each lookup table (LUT) in a CLB are routed directly to a dedicated OR structure, bypassing other logic (such as carry chains, other function generators, and so forth) typically included in a CLB. The dedicated OR structure logically "ORs" together the signals from the LUTs. Thus, the LUTs can be programmed to implement AND functions, with the AND function results being ORed together in the dedicated OR structure, thereby providing a fast and efficient sum-of-products output signal.
In some embodiments, the dedicated OR structure includes programmable means for selectively combining the signals from the LUTs. The dedicated OR structure can include, for example, a 2-input multiplexer on each input terminal, each multiplexer selecting between an associated LUT output signal and a ground (logic low) signal. In one embodiment, the multiplexer is controlled by a configuration signal stored in a configuration memory cell coupled to a select terminal of the multiplexer. Because a logic low is the identity value of the OR function, a selected ground signal has no effect on the output of the dedicated OR structure. Thus, any LUT with a "blocked" output signal (i.e., any LUT having an associated multiplexer configured to select the ground input signal) can be used for other purposes, such as to implement unrelated logic.
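The multiplexer-gated dedicated OR structure can be sketched as follows, assuming one configuration bit per input (1 selects the LUT output, 0 selects ground). The function and parameter names are hypothetical.

```python
from typing import Sequence


def dedicated_or(lut_outputs: Sequence[int], select_lut: Sequence[int]) -> int:
    """OR of the LUT outputs, with each input gated by a 2-input multiplexer.

    select_lut[i] models the configuration cell on multiplexer i:
    1 passes LUT i's output, 0 passes ground (logic low), blocking that LUT.
    """
    if len(lut_outputs) != len(select_lut):
        raise ValueError("need one select bit per LUT output")
    gated = [out if sel else 0 for out, sel in zip(lut_outputs, select_lut)]
    return int(any(gated))
```

A blocked LUT can toggle freely while implementing unrelated logic without disturbing the sum-of-products output, which is the point of routing ground rather than the LUT signal into the OR.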
In one embodiment, a CLB includes eight LUTs, and the output signals from all eight LUTs are ORed together in a single 8-input dedicated OR structure.
In other embodiments, four LUT outputs are combined in a first dedicated OR structure, with the other four LUT outputs being combined in a second dedicated OR structure. In one such embodiment, the first and second dedicated OR structures both drive a logical OR circuit that provides the combined sum-of-products output signal. In another such embodiment, the first and second dedicated OR structures are cascaded such that the output of the second dedicated OR structure includes the OR function of all eight LUTs. In some embodiments, the first dedicated OR structure can also receive an external signal from an input terminal of the CLB, or from another node within the CLB. In some embodiments, the output of the first dedicated OR structure is selectively passed to the second dedicated OR structure, so the CLB can optionally be used to implement two independent sum-of-products functions.
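The cascaded embodiment, with an optional external input and a selectively passed first output, might be modeled as below. The configuration flag `cascade`, the parameter `external_in`, and the function name are all hypothetical stand-ins for the configuration memory and routing the text describes.

```python
from typing import Sequence, Tuple


def cascaded_or(luts: Sequence[int], external_in: int = 0, cascade: bool = True) -> Tuple[int, int]:
    """Return (first_out, second_out) for two cascaded 4-input dedicated OR structures.

    luts: the eight LUT outputs (each typically a product term).
    external_in: hypothetical extra input to the first structure.
    cascade: hypothetical configuration bit; False blocks the first output
             from the second structure, giving two independent SOP functions.
    """
    if len(luts) != 8:
        raise ValueError("expected eight LUT outputs")
    first = int(any(luts[:4]) or external_in)
    passed = first if cascade else 0  # selective pass to the second structure
    second = int(any(luts[4:]) or passed)
    return first, second
```

In cascade mode `second_out` is the OR of all eight LUTs (plus the external input), the same value the other embodiment obtains by ORing the two structures' outputs in a separate logical OR circuit.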