Programmable integrated circuits (ICs) are a well-known type of integrated circuit that may be programmed by a user to perform specified logic functions. (The term "programmable ICs" as used herein includes but is not limited to FPGAs, mask programmable devices such as Application Specific ICs (ASICs), Programmable Logic Devices (PLDs), and devices in which only a portion of the logic is programmable.) One type of programmable IC, the field programmable gate array (FPGA), typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data (bitstream) into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. The configuration data may be read from memory (e.g., an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
A CLB typically includes one or more function generators (often implemented as lookup tables, or LUTs), and one or more registers that can optionally be used to register the LUT outputs. Some CLBs also include chains of carry logic that are used to implement arithmetic functions such as adders, subtractors, counters, and multipliers. Implementing logic using these carry chains can be faster, sometimes much faster, than implementing the equivalent logic in LUTs and passing carry signals from one bit to the next through the interconnect structure. The speed of a carry chain depends on the number of bits in the carry chain and the speed of each carry bit (among other factors). The speed of the equivalent logic implemented as LUTs depends on the number of levels of logic (i.e., the number of LUTs on the slowest path) required to implement the function. Usually, using the carry chain is faster. However, using the carry chain imposes placement constraints because the ordering of portions of the user's function is set by the carry chain.
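The trade-off between carry-chain length and LUT logic depth can be illustrated with a simple delay model for an n-input AND gate. This is a minimal sketch under stated assumptions: the delay constants `T_LUT`, `T_ROUTE`, and `T_CARRY` are hypothetical placeholders rather than figures for any real device, the LUT-only path is modeled as a balanced tree of k-input LUTs, and the carry path assumes one k-input LUT per carry bit driving a carry multiplexer that passes the carry-in when its LUT output is 1 and forces 0 otherwise. That construction is a common way to build wide AND gates on carry logic and is not necessarily the method of this document; all function names are illustrative.

```python
import math

# Hypothetical delays in ns: placeholders only, not data for any real FPGA family.
T_LUT = 0.5     # delay through one LUT
T_ROUTE = 0.6   # general interconnect hop between LUT levels
T_CARRY = 0.05  # delay through one dedicated carry-chain bit


def lut_levels(n: int, k: int) -> int:
    """Levels of k-input LUTs in a balanced tree reducing n inputs to one output."""
    levels, signals = 0, n
    while signals > 1:
        signals = math.ceil(signals / k)
        levels += 1
    return max(levels, 1)


def lut_tree_delay(n: int, k: int) -> float:
    """LUT-only version: one LUT delay per level plus routing between levels."""
    levels = lut_levels(n, k)
    return levels * T_LUT + (levels - 1) * T_ROUTE


def carry_chain_delay(n: int, k: int) -> float:
    """Carry version: one LUT level, then the carry ripples through every carry bit."""
    bits = math.ceil(n / k)
    return T_LUT + bits * T_CARRY


def carry_chain_and(inputs: list[int], k: int) -> int:
    """Functional model of a wide AND built on a carry chain.

    Each k-bit slice is ANDed in one LUT; that LUT output drives the carry
    multiplexer select, which passes carry-in when the slice is all ones and
    forces 0 otherwise. The chain is initialized to logic 1.
    """
    carry = 1
    for i in range(0, len(inputs), k):
        slice_all_ones = all(inputs[i:i + k])
        carry = carry if slice_all_ones else 0
    return carry


if __name__ == "__main__":
    n, k = 64, 4
    # prints: 3 2.7 1.3
    print(lut_levels(n, k), round(lut_tree_delay(n, k), 3), round(carry_chain_delay(n, k), 3))
```

With these placeholder values, a 64-input AND built from 4-input LUTs needs three LUT levels (about 2.7 ns) versus one LUT level plus 16 carry bits (about 1.3 ns) on the carry chain. Only the relative comparison is meaningful; the absolute numbers come from the assumed constants.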
Two forms of design entry are common: schematic entry and Hardware Description Languages (HDLs) such as Verilog and VHDL. When schematic entry is used, the designer specifies the exact implementation desired for the circuit. At a higher level of abstraction, when HDL code is used, the circuit is described by its logical function. Synthesis software then translates the logical function into specific logic targeted for a specified FPGA. Although circuit elements can be manually instantiated in HDL code, this approach is generally avoided because it is labor-intensive and the resulting code can typically be targeted only to a specific programmable IC architecture.
Well-known synthesis tools such as those distributed by Synopsys, Inc., of Mountain View, Calif., recognize arithmetic functions in the HDL code and implement these functions using carry logic. Other functions, such as wide logic gates and cascade circuits, can also be implemented using carry logic. However, the synthesis tools do not implement these other types of functions using carry logic, even when the implementation they choose instead results in a much slower circuit. It would therefore be desirable for synthesis tools to implement logic in a manner that makes better use of the carry structure in order to minimize the delay of the circuit. Further, when implementing multiplexers, the synthesis tools may instantiate all 2^n multiplexer inputs, where n is the number of select inputs for the multiplexer, even when only a few of the 2^n multiplexer input signals are used. For example, HDL code often includes segments such as the following:
______________________________________
wire busSigA[0:11];
select on (busSigA) {
  case '010 . . . 10':  out <= in6;
  case '01100 . . . 0': out <= in10;
  case '1000 . . . 10': out <= in25;
  case others:          out <= 1'b0;
}
______________________________________
The above code segment specifies that there are 12 select signals (0 through 11) and that, for three specific combinations of these select signals, input signal in6, in10, or in25, respectively, is provided as the output signal; for all other combinations, logic 0 is provided as the output signal.
The well-known synthesis tools automatically translate the above code segment into a large multiplexer with 12 select inputs (busSigA) and 2^12 = 4096 data inputs. However, 4093 of these data inputs are logic 0.
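The size of the naive multiplexer can be checked with a short model that enumerates one data input per select value. This is a minimal sketch under stated assumptions: the three select constants are hypothetical stand-ins that merely follow the visible bit patterns (the real values are elided in the code segment), busSigA[0] is assumed to be the most significant bit, and the names `build_full_mux_inputs` and `mux` are illustrative helpers rather than part of any synthesis tool.

```python
# Hypothetical 12-bit select constants. The source elides the real values
# ('010 . . . 10' and so on), so these only follow the visible bit patterns,
# assuming busSigA[0] is the most significant bit.
CASES = {
    0b010000000010: "in6",
    0b011000000000: "in10",
    0b100000000010: "in25",
}
N_SELECT = 12


def build_full_mux_inputs(cases: dict[int, str], n_select: int) -> list:
    """Data inputs of the naive multiplexer: one per select value (2**n_select),
    holding the named input signal for a listed case and constant 0 otherwise."""
    return [cases.get(sel, 0) for sel in range(2 ** n_select)]


def mux(data_inputs: list, sel: int, signals: dict[str, int]) -> int:
    """Evaluate the multiplexer for select value sel; named inputs are looked up in signals."""
    d = data_inputs[sel]
    return signals[d] if isinstance(d, str) else d


inputs = build_full_mux_inputs(CASES, N_SELECT)
print(len(inputs))                       # prints 4096
print(sum(1 for d in inputs if d == 0))  # prints 4093
```

Only three of the 4096 entries carry a real signal; every other entry is a constant 0 that a naive implementation would still have to route into the multiplexer.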
Conventional FPGA software has simplified the above HDL construct with the following steps:
(1) Convert the 4096-input multiplexer into an AND-OR form in which each AND gate decodes the select signals plus one input signal, and the AND gate outputs are applied to an OR gate.
(2) Optimize the AND-OR form, resulting in a much smaller logic network.
(3) Implement the resulting logic network in LUTs of the FPGA.
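Steps (1) and (2) can be sketched in software to show why the AND-OR form shrinks so much. This is a minimal illustration under stated assumptions, not the conventional tools' actual algorithm: the select constants are hypothetical stand-ins for the elided values, signals are modeled as 0/1 integers, and the only "optimization" shown is dropping every AND term whose data input is constant 0, since such a term is always 0 and cannot affect the OR. Step (3), mapping the network into LUTs, is not modeled, and all names are illustrative.

```python
# Hypothetical select constants (the real values are elided in the HDL segment).
CASES = [
    (0b010000000010, "in6"),
    (0b011000000000, "in10"),
    (0b100000000010, "in25"),
]
N_SELECT = 12


def decode(sel: int, const: int, n_select: int) -> bool:
    """One AND gate: true only when every select bit matches the case constant."""
    return all(((sel >> b) & 1) == ((const >> b) & 1) for b in range(n_select))


def and_or_mux(sel: int, signals: dict[str, int], cases=CASES, n_select=N_SELECT) -> int:
    """Steps (1)-(2): one AND term (decode AND input) per case with a non-constant
    input, ORed together. The 4093 terms whose input is constant 0 are omitted,
    because (decode AND 0) is always 0."""
    return int(any(decode(sel, const, n_select) and signals[name] for const, name in cases))


def full_mux(sel: int, signals: dict[str, int], cases=CASES) -> int:
    """Reference behavior of the original 4096-input multiplexer."""
    table = dict(cases)
    return signals[table[sel]] if sel in table else 0
```

The reduced network has three AND terms and one three-input OR instead of 4096 multiplexer inputs, yet it produces the same output as the full multiplexer for every select value.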
While the above simplification greatly improves the efficiency of the resulting multiplexer implementation, it would be preferable to take advantage of all architectural features available in an FPGA in order to produce the smallest and fastest possible implementation. It would also be preferable for such an improvement to be applicable to non-programmable replacement structures for FPGAs and to other IC devices having the necessary architectural features.