The present invention generally relates to implementing multiplexers in field programmable gate arrays (FPGAs), and more particularly to implementing large multiplexers using FPGA lookup tables.
Programmable integrated circuits (ICs) are a well-known type of integrated circuit that may be programmed by a user to perform specified logic functions. (The term "programmable ICs" as used herein includes but is not limited to FPGAs, mask programmable devices such as Application Specific ICs (ASICs), Programmable Logic Devices (PLDs), and devices in which only a portion of the logic is programmable.) One type of programmable IC, the field programmable gate array (FPGA), typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. The configuration data may be read from memory (e.g., an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
A CLB typically includes one or more function generators (often implemented as lookup tables, or LUTs), and one or more registers that can optionally be used to register the LUT outputs. Some CLBs also include chains of carry logic that is used to implement arithmetic functions such as adders, subtractors, counters, and multipliers. Implementing logic using these carry chains can be faster, sometimes much faster, than implementing the equivalent logic in LUTs and passing carry signals from one bit to the next through the interconnect structure. The speed of a carry chain depends on the number of bits in the carry chain and the speed of each carry bit (among other factors). The speed of the equivalent logic implemented as LUTs depends on the number of levels of logic (i.e., the number of LUTs on the slowest path) required to implement the function. Usually, using the carry chain is faster. However, using the carry chain imposes placement constraints because the ordering of portions of the user's function is set by the carry chain.
Two forms of design entry are common: schematic entry and Hardware Description Languages (HDLs) such as Verilog and VHDL. When schematic entry is used, the designer specifies the exact implementation desired for his circuit. At a higher level, when HDL code is used, the circuit is described by its logical function. Synthesis software then translates the logical function into specific logic targeted for a specified FPGA. Although circuit elements can be manually instantiated in HDL code, this method is avoided since it is labor-intensive and the code can typically only be targeted to a specific programmable IC architecture.
Well-known synthesis tools such as those distributed by Synopsys, Inc., of Mountain View, Calif., recognize arithmetic functions in the HDL code and implement these functions using carry logic. Other functions such as wide logic gates and cascade circuits can also be implemented using carry logic. However, these other types of functions used in HDL code are not so implemented by the synthesis tools, even when the method that is used results in a much slower circuit. It would be desirable, therefore, for synthesis tools to implement logic in a manner that makes better use of the carry structure in order to minimize the delay of the circuit. Further, when implementing multiplexers, the synthesis tools may instantiate all 2^n multiplexer inputs, where n is the number of select inputs for the multiplexer, even when only a few of the 2^n multiplexer input signals will be used. For example, HDL code often includes segments such as the following:
wire busSigA[0:11];
select on (busSigA) {
    case '010 . . . 10':
        out <= in6;
    case '01100 . . . 0':
        out <= in10;
    case '1000 . . . 10':
        out <= in25;
    case others:
        out <= 1'b0;
}
The above code segment specifies that there are 12 select signals (0 through 11) and that for three combinations of these select signals, input signals in6, in10, and in25 are to be provided as output signals, otherwise logic 0 is to be provided as an output signal.
The well-known synthesis tools automatically translate the above code segment into a large multiplexer with 12 select inputs (busSigA) and 2^12 = 4096 data inputs. However, 4093 of these data inputs are logic 0.
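The input counts quoted above can be checked with a short Python sketch. It is only an illustration: the three 12-bit select codes are arbitrary placeholders (the patterns in the code segment are elided and are not reconstructed here), and the function name is hypothetical.

```python
# Hypothetical placeholder codes: the source elides the real 12-bit patterns,
# so these three values are arbitrary and chosen only to be distinct.
SELECT_WIDTH = 12
USED_CASES = {
    0x402: "in6",
    0x600: "in10",
    0x802: "in25",
}


def expand_full_mux(select_width, used_cases, default="0"):
    """List the data input wired to every select code of a naive 2**n-input mux."""
    return [used_cases.get(code, default) for code in range(2 ** select_width)]


data_inputs = expand_full_mux(SELECT_WIDTH, USED_CASES)
total_inputs = len(data_inputs)            # number of data inputs instantiated
constant_zero = data_inputs.count("0")     # inputs tied to logic 0
print(total_inputs, constant_zero)
```

Only the three named signals carry information; every other data input of the fully expanded multiplexer is the constant default.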
Conventional FPGA software has simplified the above HDL construct with the following steps:
(1) Convert the 4096-input multiplexer into an AND-OR form in which each AND gate decodes the select signals plus one input signal, and the AND gate outputs are applied to an OR gate.
(2) Optimize the AND-OR form, resulting in a much smaller logic network.
(3) Implement the resulting logic network in LUTs of the FPGA.
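The AND-OR form of steps (1) and (2) can be sketched in Python as follows. This is a behavioral illustration under stated assumptions, not the conventional software's actual algorithm: the function name, the dictionary representation, and the select codes used in the usage lines are all hypothetical. AND terms whose data input is the constant 0 are simply omitted, which is one obvious part of the optimization in step (2).

```python
def and_or_mux(select, inputs_by_code, width=12):
    """Evaluate a sparse multiplexer in AND-OR form.

    select         -- integer value of the select bus
    inputs_by_code -- maps a select code to the current value (0 or 1)
                      of the data input chosen by that code
    """
    out = 0
    for code, data_bit in inputs_by_code.items():
        # One AND gate: every select bit must equal the code bit,
        # and the associated data input must be 1.
        decoded = all(
            ((select >> i) & 1) == ((code >> i) & 1) for i in range(width)
        )
        # The OR gate accumulates the AND gate outputs.
        out |= int(decoded and bool(data_bit))
    return out


# Usage with arbitrary placeholder codes (not the elided patterns in the text).
current_inputs = {0x402: 1, 0x600: 0, 0x802: 1}
print(and_or_mux(0x402, current_inputs))  # selects the input stored under 0x402
print(and_or_mux(0x001, current_inputs))  # no case matches, default logic 0
```

Because the select codes are mutually exclusive, at most one AND term can be true at a time, so the OR gate simply forwards the selected input or the default 0.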
While the above simplification greatly improves efficiency of the resulting multiplexer implementation, it would be preferable to take advantage of all architectural features available in an FPGA in order to produce the smallest and fastest implementation that can be implemented in the FPGA. It would also be preferable that such an improvement be applicable to non-programmable replacement structures for FPGAs and to other IC devices having the necessary architectural features.
According to various embodiments, the present invention provides a method for implementing a wide multiplexer in a programmable integrated circuit. In a first embodiment, the method comprises detecting logic that defines a multiplexer, the logic including a plurality of selection signals and a plurality of input signals, wherein a selected combination of logic states of the selection signals selects a particular input signal. If the multiplexer has more than a threshold number of input signals, the multiplexer is implemented using pluralities of lookup tables and carry multiplexers, the pluralities of lookup tables and carry multiplexers grouped into sets of two or more lookup tables and two or more associated carry multiplexers, wherein each set implements a respective one of the combinations of logic states and is configured to receive as input the plurality of selection signals and a respective one of the input signals, each lookup table having an output terminal coupled to a select terminal of a respective one of the carry multiplexers, each carry multiplexer having an output terminal and first and second input terminals. The carry multiplexers receive a first selected logic level on first input terminals. A first one of the carry multiplexers has a second input terminal configured to receive a second selected logic level signal, a last one of the carry multiplexers has an output terminal configured to provide an output signal for the multiplexer, and the second input terminal of each carry multiplexer is coupled to the output terminal of another one of the remaining carry multiplexers. Each of the sets of lookup tables is configured to implement an AND function if the first logic level (the default) is logic zero and to implement a NOR function if the first logic level (the default) is logic one.
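As a rough behavioral illustration of one such set, the Python sketch below models lookup tables configured as AND functions whose outputs drive the select terminals of chained carry multiplexers, for the case where the default level is logic zero. Several points are assumptions rather than statements from the text: the carry multiplexer is assumed to pass its chained input when its select is 1 and the fixed level otherwise, the lookup tables are assumed to have four inputs, and all names are hypothetical. How several sets are combined into the final multiplexer output, and the NOR variant, are not modeled.

```python
def lut_and(bits):
    """A lookup table configured as an AND of its inputs."""
    return int(all(bits))


def carry_mux(select, fixed_level, chain_in):
    """Assumed carry-mux behavior: chain_in when select is 1, else fixed_level."""
    return chain_in if select else fixed_level


def decode_set(select_bits, code_bits, data_bit, lut_size=4):
    """Model one set: decode one select combination and gate one data input."""
    # A term is 1 when a select bit equals the corresponding bit of the case
    # code. In hardware the LUT contents would absorb this comparison.
    terms = [int(s == c) for s, c in zip(select_bits, code_bits)] + [data_bit]
    chain = 1  # assumed second logic level applied to the first carry mux
    for i in range(0, len(terms), lut_size):
        lut_out = lut_and(terms[i:i + lut_size])
        chain = carry_mux(lut_out, 0, chain)  # default (first) level is 0
    return chain


# Usage with an arbitrary placeholder code (12 select bits).
code = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(decode_set(code, code, 1))         # select matches and data is 1
print(decode_set([0] * 12, code, 1))     # select does not match
```

With 12 select signals and one data signal, the 13 terms occupy four four-input lookup tables in this model, and the chain output is 1 only when every lookup table reports a match.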
In another embodiment, the method for implementing a multiplexer comprises detecting logic that defines a multiplexer, simulating multiplexers implemented by two methods, decode and tree, and then comparing the two multiplexer structures to determine which is faster (or smaller). The decode multiplexer is implemented as discussed above. The tree multiplexer is implemented using a plurality of lookup tables, a first set of 2:1 multiplexers, and a second set of 2:1 multiplexers. Each lookup table implements a 2:1 multiplexer, and a first set of the lookup tables is configured to receive as input two respective ones of the input signals and a first selected one of the selection signals as a selector input. The lookup tables have respective outputs, and pairs of the lookup tables have outputs coupled to inputs of respective ones of the first set of 2:1 multiplexers. A second one of the selection signals is provided as a selection input to the first set of 2:1 multiplexers, and pairs of the first set of 2:1 multiplexers have outputs coupled to inputs of the second set of 2:1 multiplexers. A third one of the selection signals is provided as a selection input to the second set of 2:1 multiplexers.
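The tree structure can be illustrated with a small Python model of an 8-input instance, in which four lookup tables feed a first set of two 2:1 multiplexers, which in turn feed a second set containing a single 2:1 multiplexer. The 8-input size, the function names, and the convention that select value 0 picks the first input of each pair are assumptions made for this sketch.

```python
def mux2(sel, a, b):
    """A 2:1 multiplexer: returns a when sel is 0 and b when sel is 1."""
    return b if sel else a


def tree_mux8(inputs, s0, s1, s2):
    """Model an 8:1 tree multiplexer built from three levels of 2:1 muxes."""
    if len(inputs) != 8:
        raise ValueError("this sketch models exactly eight data inputs")
    # Level 1: each lookup table implements a 2:1 mux on two input signals,
    # controlled by the first selection signal.
    lut_level = [mux2(s0, inputs[i], inputs[i + 1]) for i in range(0, 8, 2)]
    # Level 2: the first set of 2:1 muxes combines pairs of lookup table
    # outputs, controlled by the second selection signal.
    first_set = [mux2(s1, lut_level[i], lut_level[i + 1]) for i in range(0, 4, 2)]
    # Level 3: the second set (a single mux here) is controlled by the third
    # selection signal and produces the multiplexer output.
    return mux2(s2, first_set[0], first_set[1])


# Usage: select index 5 (binary 101) from eight labeled inputs.
labels = ["d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"]
print(tree_mux8(labels, 1, 0, 1))
```

In this model the selected input index is 4*s2 + 2*s1 + s0, so each additional selection signal adds one level of 2:1 multiplexers to the path.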
The above summary of the present invention is not intended to describe each disclosed embodiment of the present invention. The figures and detailed description that follow provide additional example embodiments and aspects of the present invention.