This invention relates to programmable integrated circuit devices. More specifically, the present invention relates to field programmable gate arrays (FPGAs).
An FPGA is a type of programmable logic device (PLD) that can be configured to perform various logic functions. An FPGA includes an array of configurable logic blocks (CLBs) connectable via programmable interconnect structures. For example, a first FPGA, invented by Freeman, is described in U.S. Pat. No. RE34,363. CLBs and interconnect structures in FPGAs are shown in U.S. Pat. No. 5,889,411 issued to Chaudhary et al. and pages 4-32 through 4-37 of the Xilinx 1996 Data Book entitled "The Programmable Logic Data Book," available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124. The Freeman reference, the Chaudhary reference, and the Data Book are incorporated herein by reference.
In addition to the structures discussed above, FPGAs also include structures for performing special functions. In particular, FPGAs include carry circuits and lines for connecting the carry output of one bit generated in one CLB to the carry input of another CLB, and cascade lines for allowing wide functions to be generated by combining several adjacent CLBs. Carry structures are discussed by Hsieh et al. in U.S. Pat. No. 5,267,187 and by New in U.S. Pat. No. 5,349,250.
Cascade structures are discussed by Goetting et al. in U.S. Pat. No. 5,365,125 and by Chiang et al. in U.S. Pat. No. 5,357,153. These patents are also incorporated herein by reference.
As discussed by the above-incorporated references, each CLB may include one or more slices ("slice" or "CLB slice"). Each slice, in turn, includes at least one configurable function generator. The configurable function generator is typically implemented as a four-input lookup table (LUT). The incorporated references also point out that the carry circuits and cascade structures increase the speed at which the FPGA can perform certain functions, such as arithmetic functions.
FIG. 1A is a simplified block diagram of a conventional CLB 100. The illustrated CLB 100 includes a first slice 110 and a second slice 120. First slice 110 includes a first function generator G 112, a second function generator F 114, a third function generator C1 116, and an output control block 118. Output control block 118 may include multiplexers, flip-flops, or both. Four independent input terminals are provided to each of the G and F function generators 112 and 114. A single input terminal C1-in is provided to third function generator C1 116. Each of function generators 112 and 114 is typically implemented as a four-input LUT, and is capable of implementing any arbitrarily defined Boolean function of its input signals. Each of the input terminals may be assigned a number or a letter and referred to as a "literal." For example, in CLB 100, function generator 112 receives four input signals, or literals, G1, G2, G3, and G4. Function generator 116, typically implemented as a set of configurable multiplexers, is often used to handle carry bits, but can implement some Boolean functions of its three input signals C1-in, G', and F'. These Boolean functions include bypass, inverter, 2-input AND (product), and 2-input OR (sum). Signals G', F', and C1-out are multiplexed through output control block 118.
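As an illustration of why a four-input LUT can realize any Boolean function of its four literals, the following minimal Python sketch models a LUT as a 16-entry truth table indexed by the input bits. The names `make_lut4` and `lut_g` are hypothetical and are not taken from the incorporated references; the sketch models only the logical behavior, not the hardware.

```python
from itertools import product

def make_lut4(func):
    """Model a 4-input LUT: precompute the 16-entry truth table of func."""
    # product((0, 1), repeat=4) yields inputs in binary counting order,
    # so the table position equals the 4-bit value g1 g2 g3 g4.
    table = [bool(func(*bits)) for bits in product((0, 1), repeat=4)]

    def lut(g1, g2, g3, g4):
        index = (int(g1) << 3) | (int(g2) << 2) | (int(g3) << 1) | int(g4)
        return table[index]

    return lut

# Hypothetical configuration: G' = (G1 AND G2) OR (G3 XOR G4)
lut_g = make_lut4(lambda g1, g2, g3, g4: (g1 and g2) or (g3 ^ g4))
```

Reconfiguring the function generator corresponds to loading a different 16-entry table; the lookup itself does not change.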
Output control block 118 provides output signal lines Y, QY, X, and QX. For this reason, output control block 118 may also be referred to as the "output multiplexer" or "output select multiplexer." Slice 110 may also provide the carry out signal, C1-out. Second slice 120 is similar to first slice 110. Accordingly, operations of second slice 120 are similar to the operations of first slice 110.
Operation of CLB 100 is also described by the incorporated references, and, in particular, in chapters seven and eight of the above-incorporated Data Book. For simplicity, CLB 100 of FIG. 1A is illustrated with two slices; however, the number of slices constituting a CLB is not limited to two.
FIG. 1B is a simplified block diagram of another conventional CLB 100a. CLB 100a is similar to CLB 100 of FIG. 1A but has an additional LUT 113. LUT 113 takes the outputs of LUTs 112 and 114 as well as another input K1 to slice 110a. Thus, LUT 113 allows slice 110a to implement any arbitrarily defined Boolean function of nine literals G1, G2, G3, G4, F1, F2, F3, F4, and K1. CLB 100a may include additional slices represented by ellipses 120a.
Technology mapping for LUT-based FPGAs involves decomposition of a circuit into combinational logic having nodes with 4-input ("fan-in") functions that can be realized in the LUTs of CLB slices. This is because, as shown in slice 110, the slices commonly include 4-input LUTs as their function generators. By conventionally specifying the functions of function generators F, G, and C1, and output control block 118, slice 110 can be programmed to implement various functions including, without limitation, two independent functions of up to four variables each.
Circuit designs are mapped to FPGAs as combinational logic. The combinational logic may be expressed in Boolean expressions including a number of logic levels and routing between the logic levels. The Boolean expressions include product (logical AND) and sum (logical OR) operations. Two levels of combinational logic may be expressed using sum-of-products (SOP) format. In fact, given a set of inputs and their inverse, any logic equation can be expressed using the SOP format.
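To make the SOP format concrete, the following minimal Python sketch evaluates a sum of products. The representation is an assumption made for illustration only: each product term is a dictionary mapping a literal name to the polarity it requires (`True` for the input, `False` for its inverse), and `eval_sop` and `example_sop` are hypothetical names.

```python
def eval_sop(pterms, assignment):
    """Evaluate a sum-of-products expression.

    pterms: list of product terms; each term maps a literal name to the
            polarity it requires (True = input, False = inverted input).
    assignment: maps every literal name to its current Boolean value.
    """
    # Sum (OR) over the terms of the product (AND) over each term's literals.
    return any(
        all(assignment[name] == polarity for name, polarity in term.items())
        for term in pterms
    )

# f = a*b + a'*c, two levels of logic: one AND level, one OR level
example_sop = [{"a": True, "b": True}, {"a": False, "c": True}]
```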
In the FPGA art, there is a continuing challenge to increase speed (performance) of FPGA-implemented functions, or circuits. Circuit performance, or speed, is increased when circuit delay is decreased. Circuit delay includes two main components: logic delay and routing delay.
Using logical axioms and Boolean algebraic rules, it is possible to partially collapse a circuit design to reduce the number of logic levels, thus reducing the routing delay.
However, this creates wide fan-in nodes, which require several levels of LUTs to implement. This is because, as described above, the LUTs have limited fan-in, for example a fan-in of four. Therefore, to implement wide fan-in nodes, multiple levels of CLBs must be used. The requirement to use multiple levels of CLBs increases the logic delay and introduces additional routing delay. These negative effects cancel out the benefits from the routing delay reduction provided by the partial collapse of the circuit design.
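The growth in logic levels can be estimated with a short calculation. The sketch below assumes the wide node is an associative function such as a wide AND or OR, built as a simple tree of LUTs with no carry or cascade structures; the function name `lut_levels` is hypothetical.

```python
def lut_levels(fan_in, lut_inputs=4):
    """Count the LUT levels in a simple tree implementing a wide AND/OR node.

    Assumption: each level groups the remaining signals into LUTs of
    lut_inputs inputs each, until a single output signal remains.
    """
    levels = 0
    signals = fan_in
    while signals > 1:
        signals = -(-signals // lut_inputs)  # ceiling division
        levels += 1
    return levels
```

Under these assumptions a node with up to four inputs needs one level, a node with five to sixteen inputs needs two, and a node with seventeen inputs already needs three, each extra level adding both logic delay and routing between levels.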
Accordingly, there is a need for a method to implement wide fan-in nodes in FPGAs while avoiding the negative effects described above. Additionally, there is a need for CLB and CLB slice designs that allow for fast implementation of wide fan-in SOP functions.
According to one aspect of the present invention, there is provided a literal-sharing decomposition method for combinational logic circuits expressed as a sum of product terms. A first product term (P1) is combined with a second product term (P2) resulting in a product chain P1+P2 if P1 may be implemented in a number of configurable logic block (CLB) slices and the product chain P1+P2 may be implemented on the same number of configurable logic block (CLB) slices. The product chain is then used to configure CLB slices to implement the product terms. Because the product terms are combined, they can be implemented using fewer CLB slices than the number of slices needed for separate implementation. The reduction in the number of slices leads to faster implementation.
A "product chain" is a combination of product terms ("Pterms") that share one or more literals. A product chain would typically include at least two Pterms; however, a single Pterm may be designated as a product chain with which other Pterms may be combined. A Pterm or a product chain may be implemented on one or more CLB slices. A "slice chain" is one or more slices configured to implement a Pterm or a product chain.
The first step in the literal-sharing decomposition method is to identify the Pterm having the highest number of literals and define it as a product chain. Second, from the remaining Pterms, the Pterm having the highest number of literals is selected. Third, if the selected Pterm fits any of the product chains, then the selected Pterm is combined with one of the product chains. If a fit is not found, then the selected Pterm becomes another product chain. Finally, the second and the third steps are repeated for the remaining Pterms until all Pterms have been examined.
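The steps above can be sketched in Python as a greedy loop. Several details here are assumptions made only for illustration and are not stated in this summary: a slice is assumed to accept up to eight literals (two four-input LUTs), a Pterm is assumed to "fit" a chain when it shares at least one literal with the chain and the combined chain needs no more slices than the chain alone, and ties between fitting chains are broken by the largest number of shared literals. All names (`LITERALS_PER_SLICE`, `slices_needed`, `decompose`) are hypothetical.

```python
from math import ceil

LITERALS_PER_SLICE = 8  # assumption: two 4-input LUTs per slice

def slices_needed(literals):
    """Assumed cost model: slices required to hold a set of literals."""
    return max(1, ceil(len(literals) / LITERALS_PER_SLICE))

def decompose(pterms):
    """Greedy literal-sharing decomposition (illustrative sketch).

    pterms: iterable of frozensets of literal names.
    Returns a list of chains, each a dict with the chain's combined
    literals and the Pterms assigned to it.
    """
    # Steps 1 and 2: visit Pterms from most literals to fewest.
    ordered = sorted(pterms, key=len, reverse=True)
    chains = []
    for pterm in ordered:
        best = None  # (shared literal count, chain)
        for chain in chains:
            shared = len(chain["literals"] & pterm)
            merged = chain["literals"] | pterm
            fits = shared > 0 and (
                slices_needed(merged) == slices_needed(chain["literals"])
            )
            if fits and (best is None or shared > best[0]):
                best = (shared, chain)
        if best is None:
            # Step 3, no fit: the Pterm starts a new product chain.
            chains.append({"literals": set(pterm), "pterms": [pterm]})
        else:
            # Step 3, fit: combine the Pterm into the chosen chain.
            best[1]["literals"] |= pterm
            best[1]["pterms"].append(pterm)
    return chains

example_pterms = [
    frozenset("abcdef"),
    frozenset("abcg"),
    frozenset("hijklmnop"),
    frozenset("efxy"),
    frozenset("ab"),
]
example_chains = decompose(example_pterms)
```

In this example the nine-literal Pterm forms its own chain, the Pterms `abcdef`, `abcg`, and `ab` share literals and still fit in one assumed slice, and `efxy` is rejected from that chain because adding it would require a second slice.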
Any sum-of-products (SOP) function can be represented using a "personality matrix" that expresses the logical behavior, or "personality," of the circuit. One embodiment of the literal-sharing decomposition process uses personality matrices to simplify the decomposition process. First, a personality matrix is formed for the combinational logic, the personality matrix having rows, each row representing a product term and showing the literals for the product term of that row. The rows are sorted in descending order based on the number of literals in each row.
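A minimal Python sketch of forming and sorting such a matrix follows. The cell encoding is an assumption borrowed from common PLA notation rather than from this summary: "1" marks a literal used in true form, "0" a literal used in inverted form, and "-" a literal absent from the product term. The names `personality_matrix`, `example_columns`, and `example_terms` are hypothetical.

```python
def personality_matrix(pterms, columns):
    """Build a personality matrix and sort rows by literal count, descending.

    pterms: list of dicts mapping a literal name to its polarity
            (True = true form, False = inverted form).
    columns: ordered list of all literal names (one matrix column each).
    """
    rows = [
        [("1" if term[name] else "0") if name in term else "-" for name in columns]
        for term in pterms
    ]
    # A row's literal count is the number of cells that are not "-".
    rows.sort(key=lambda row: sum(cell != "-" for cell in row), reverse=True)
    return rows

example_columns = ["a", "b", "c", "d"]
example_terms = [
    {"a": True, "b": False},
    {"a": True, "b": True, "c": True, "d": False},
    {"c": True},
]
example_matrix = personality_matrix(example_terms, example_columns)
```

After sorting, the row with the most literals comes first, so it is the natural candidate for the first product chain.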
The first row in the sorted personality matrix is defined as a product chain. Then, each row is analyzed as follows: (1) the following row is designated a current row; (2) a determination is made as to whether the current row fits into any product chain; (3) if the current row does not fit into any product chain, then the current row is designated as a new product chain; and (4) if the current row fits into an existing product chain, then the current row is combined into the existing product chain with the best fit.
According to a second aspect of the present invention, a technology mapping system is disclosed. The system has a processor and memory connected to the processor. The memory stores programs to instruct the processor to decompose a combinational logic circuit expressed in sum-of-products format. The decomposition process is similar to the processes summarized in the preceding paragraphs and disclosed in detail in the following sections.
According to a third aspect of the invention, an article of manufacture for a computer is disclosed. The article may be a machine-readable storage device, such as computer memory, adaptable to hold a program for a processor. The program, when executed, causes the computer to perform the literal-sharing decomposition steps summarized in the preceding paragraphs and disclosed in detail in the following sections.
According to a fourth aspect of the invention, a programmable logic device (PLD) is configured to implement a combinational logic circuit mapped to the PLD in accordance with the literal-sharing decomposition steps summarized in the preceding paragraphs and disclosed in detail in the following sections.
According to a fifth aspect of the invention, a CLB has two or more slices, each slice having an output. The CLB also includes a second-level circuit for combining the outputs from the slices.
According to a sixth aspect of the invention, a CLB has at least one slice. The slice has at least two configurable function generators receiving a plurality of inputs and generating, together, a first output. The slice also includes a combining gate for combining the first output with a combining gate input to generate a combining gate output, wherein the combining gate input is an input to the first CLB slice and wherein the combining gate output is an output of the first CLB slice.
According to a seventh aspect of the invention, a CLB has at least one slice. The slice has a first configurable function generator generating a first output, a second configurable function generator generating a second output, and a dedicated function generator for receiving the first output and the second output to generate a dedicated output. The dedicated function generator includes a first logic gate with an output, a second logic gate with an output, and a multiplexer allowing selection between the two logic gate outputs.
According to an eighth aspect of the invention, a CLB has two or more slices. Each of the slices has a first configurable function generator generating a first output, a second configurable function generator generating a second output, and a dedicated function generator for receiving the first output and the second output to generate a dedicated output. The dedicated function generator includes a first logic gate and a second logic gate. The CLB also has a second-level circuit for combining the dedicated outputs from its slices.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.