1. Technical Field
This invention relates to Programmable Logic Devices providing reduced delays in cascade chain circuits.
2. Description of the Prior Art
Programmable Logic Devices (PLDs) are general purpose logic devices that are configured to provide the functionality required for a particular application. A PLD is internally structured as multiple Configurable Logic Blocks (CLBs), each block containing a Look-Up Table (LUT) that provides a configurable logic function. The CLBs are interconnected through programmable connection matrices provided between them, so that multiple CLBs together implement the desired logic functions. PLDs are often used in applications that require the evaluation of functions involving a large number of inputs processed in parallel. This requirement leads to the need to cascade several CLBs to provide the desired number of inputs and/or outputs. The intermediate outputs from each CLB are connected serially using gates to obtain the desired output. Using a LUT to combine these intermediate outputs generally involves considerable propagation delay, and such delay is generally unacceptable for simple logic functions. To overcome this problem, most PLDs incorporate special “Cascade Logic” that facilitates the formation of a chain of logic with minimal delay, using special circuitry generally in the form of multiplexers or particular logic gates.
FIG. 1 shows a generic Cascade Logic architecture. A two-input AND gate is included at the output of the LUT in each CLB. One input of each two-input AND gate receives the output from the corresponding LUT, while the second input forms the Cascade Logic input, which is connected to the Cascade Logic output of the previous CLB stage. This implementation is very inefficient for Cascade Logic functions other than a simple AND. The delay in generating the final output is equal to the sum of the propagation delays of all the AND gates in the Cascade chain plus the delay of the first LUT in the chain.
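The behavior and delay of the generic chain of FIG. 1 can be illustrated with a minimal Python sketch. The function names and the delay values used in the usage note are hypothetical and chosen only for illustration; they are not taken from any data sheet.

```python
def cascade_and(lut_outputs, cascade_in=1):
    """Pass each LUT output through a two-input AND gate, stage by stage.

    cascade_in is the value fed to the first stage (assumed to be 1).
    """
    value = cascade_in
    for lut_out in lut_outputs:
        value = value & lut_out  # one AND gate per CLB stage
    return value


def cascade_delay(num_stages, t_lut, t_and):
    """Delay of the first LUT plus one AND-gate delay per stage."""
    return t_lut + num_stages * t_and
```

For example, with an assumed LUT delay of 2.0 units and an assumed AND-gate delay of 0.5 units, `cascade_delay(4, 2.0, 0.5)` gives 4.0 units for a four-stage chain, so the delay grows linearly with the number of stages.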
FIG. 2 shows the Cascade Chain implementation in Virtex II (reference: Data Sheet DS031-2(v1.5) Apr. 2, 2001) by Xilinx. MUXCY 201 is used to implement AND, NAND, NOR, and OR gate cascade chains. MUX 304 selects between signal BX and CASCADE_IN (which is also carry_in in Virtex II) to provide the input to the chain from outside. MUX 303 initializes the other input of MUXCY to either “0” or “1”. LUTOUT is selected using MUX 302. MUXCY works as an AND, OR, NAND, or NOR gate depending on the polarity of its inputs. This implementation adds no delay to the LUT output, but the separate outputs for the LUT and for CASCADE_OUT increase the number of routing resources needed.
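How a two-input multiplexer can act as different gates depending on input polarity can be sketched as follows. This is an illustrative model only: the select convention (select high passes the chain input, select low passes the constant input) and the function names are assumptions, not details stated in the description of FIG. 2.

```python
def muxcy(select, data_in, carry_in):
    """2:1 multiplexer: pass carry_in when select is 1, else data_in.

    The select polarity here is an assumption made for illustration.
    """
    return carry_in if select else data_in


def and_stage(lut_out, cascade_in):
    """AND behavior: LUT output drives select, constant input tied to 0."""
    return muxcy(lut_out, 0, cascade_in)


def or_stage(lut_out, cascade_in):
    """OR behavior: inverted LUT output drives select, constant tied to 1."""
    return muxcy(1 - lut_out, 1, cascade_in)
```

In this sketch the same multiplexer yields either function simply by changing the constant input and the polarity of the select signal; inverting the resulting output in a later stage would give the NAND and NOR variants.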
FIG. 3 shows another example of a cascade chain implementation, used in field programmable gate arrays (FPGAs) by Altera Inc. (reference: APEX 20K Programmable Logic Device Family Data Sheet, Ver. 3.7, May 2001). The output of LUT 401 and the cascade input signal are logically ANDed by AND gate 402. The output of gate 402 is the LUT output. The Cascade chain output (if the chain is terminated) as well as the cascade input for the next stage depend on the value of configuration bit P1. When the cascade chain is not used, configuration bit P1 is set to “0” and the output of multiplexer 403 is “1”, enabling AND gate 402, which passes the output of LUT 401 to LUT_OUT. When P1 is set to “1”, the output of LUT 401 and the cascade input CASCADE_IN are logically ANDed. For the first and intermediate stages of the chain, the output of gate 402 is the cascade input for the next stage; for the last stage, it is the final cascade chain output. The same circuit can be used for other gate functions by applying De Morgan's laws, inverting the inputs to the LUT where necessary and absorbing the final inverters in other LUTs or IO blocks where possible. This implementation reduces the number of outputs but introduces delay in the LUT output path. The cascade circuitry is not very versatile and is difficult to use for implementing other two-input functions.
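The stage behavior of FIG. 3 and the De Morgan workaround can be illustrated with a short Python sketch. The function names are hypothetical, and seeding the first stage with a cascade input of 1 is an assumption made for the example.

```python
def cascade_stage(lut_value, cascade_in, p1):
    """Model of one stage: multiplexer 403 feeding AND gate 402.

    With p1 == 0 the multiplexer outputs 1, so the LUT value passes through.
    With p1 == 1 the LUT value is ANDed with cascade_in.
    """
    mux_out = cascade_in if p1 else 1
    return lut_value & mux_out


def or_via_and_chain(values):
    """Build an OR from the AND-only chain using De Morgan's laws.

    Each input is inverted (as it would be inside the LUT), the chain ANDs
    the inverted values, and the final result is inverted (as it would be
    in a following LUT or IO block).
    """
    acc = 1  # assumed cascade input of the first stage
    for v in values:
        acc = cascade_stage(1 - v, acc, p1=1)
    return 1 - acc
```

The sketch shows why the extra inversions are needed: the chain itself can only AND, so any other function has to be pushed into the LUT contents and into logic that follows the chain.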