The present invention generally relates to mapping combinational logic to circuit elements of a programmable logic device, and more particularly, mapping combinational logic to lookup tables (LUTs) of multiple sizes.
Field programmable gate arrays (FPGAs), first introduced by XILINX in 1985, are becoming increasingly popular devices for use in electronics systems. For example, communications systems employ FPGAs in large measure for their re-programmability. In general, the use of FPGAs continues to grow at a rapid rate because FPGAs permit relatively short design cycles, reduce costs through logic consolidation, and offer flexibility in their re-programmability. The capabilities of and specifications for XILINX FPGAs are set forth in xe2x80x9cThe Programmable Logic Data Book,xe2x80x9d published in 1998 by XILINX, Inc., the contents of which is incorporated herein by reference.
Some types of FPGAs are implemented with a network of programmable logic blocks that include lookup tables (LUTs). A LUT is used to implement a user-programmed logic function of the LUT inputs, and the number of inputs for a particular LUT depends upon the FPGA architecture.
Mapping software converts a user""s combinational logic circuit into a network of LUTs for an FPGA implementation. Examples of such mapping software include Chortle-crf and FlowMap. The Chortle software is described in xe2x80x9cChortle-crf: Fast Technology Mapping for Lookup Table-Based FPGAsxe2x80x9d by Robert Francis, Jonathan Rose, and Zvonko Vranesic, proceedings of the Design Automation Conference 1991, pp 227-223. The FlowMap software is described in xe2x80x9cFlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designsxe2x80x9d by J. Cong and Y. Ding, IEEE Transactions on CAD, February 1994, vol. 13, No. 1, pp 1-12. Mapping software typically attempts to optimize area, delay, or a combination of area and delay. Area is modeled as the total number of LUTs required for the mapping, and delay is typically modeled in terms of the number of levels of logic in the critical path from an input of the circuit to an output of the circuit that is determined to be the critical path. Chortle-crf attempts to produce a mapping having an optimized delay and area, and FlowMap attempts to produce a mapping having an optimized delay.
The latest FPGAs have lookup tables of multiple sizes. Researchers have described algorithms for mapping to LUTs of multiple sizes while minimizing the delay. One such algorithm is described in a paper by Jianshe He and Jonathan Rose entitled xe2x80x9cTechnology Mapping for Heterogeneous FPGAsxe2x80x9d (1994 ACM International Conference on FPGAs). Another mapping software package described by Jason Cong and Songjie Xu for mapping a network to multiple sizes of LUTs entitled xe2x80x9cDelay-Optimal Technology Mapping for FPGAs with Heterogeneous LUTs,xe2x80x9d presented at the 1998 Design Automation Conference improves upon the process by He and Rose. The process by He and Rose can take an indefinite amount of time. Cong and Xu state that their process can be performed in an amount of time proportional to   (            ∑              i        =        1            C        ⁢          xe2x80x83        ⁢                  K        i            *      n      *      m      *              log        2            ⁢      n        )
where c is the number of LUT sizes, Ki is the number of LUT inputs for LUTi, n is the number of nodes, and m is the number of edges, and is thus an improvement over He and Rose. While the present mapping solutions appear to achieve the objective of mapping a network to multiple size LUTs, the computational complexity may prove to be a hindrance as FPGAs are used to implement increasingly larger networks.
A method that address the aforementioned problems, as well as other related problems, is therefore desirable.
In various embodiments, the invention comprises processes for mapping logic nodes to a plurality of sizes of lookup tables in a programmable gate array.
In one embodiment, a node and its predecessor nodes are selectively collapsed into a first single node as a function of delay factors associated with the plurality of sizes of lookup tables and a maximum of delay factors associated with the predecessor nodes. The term xe2x80x9ccollapsexe2x80x9d is used here to mean that a plurality of original nodes and their input and output signals (where the output signal of one original node is an input signal to another original node) are replaced by a single node plus all the input signals and output signals except those that connect the original nodes. If a cut-size (number of inputs) associated with the first single node is less than or equal to the number of inputs to one of the sizes of lookup tables, the one size is selected to implement the first single node. If a lookup table size was not matched by the first single node, the node and its predecessor nodes are selectively collapsed into a second single node as a function of the delay factors, and the maximum delay factor is increased by a selected value. If a cut-size associated with the second single nodes is less than or equal to the number of inputs to one of the sizes of lookup tables, the one size is selected to implement the second single node.
In another embodiment, each of the plurality of sizes of lookup tables has an associated delay factor. The process comprises:
(a) initializing a counter to a selected value;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and those of the predecessor logic nodes having delay factors greater than a maximum of delay factors associated with the predecessor logic nodes plus the counter value minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes that have been collapsed into the single node and that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size;
(f) repeating steps c through e until the logic node is mapped or all the sizes of lookup tables have been considered in mapping;
(g) if the logic node has not been mapped to one of the sizes of lookup tables and all the sizes of lookup tables have been considered in mapping, incrementing the counter; and
(h) repeating steps b through g until the counter value equals a least of delay factors of the sizes of lookup tables.
In yet another embodiment, a process for mapping a logic node and its predecessor logic nodes to one of a plurality of sizes of lookup tables in a programmable gate array comprises:
(a) initializing a collapse factor as a function of a maximum of respective delay factors associated with the predecessor logic nodes, wherein the collapse factor is greater than the maximum of the delay factors of the predecessor logic nodes;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and the ones of the predecessor logic nodes having delay factors greater than the collapse factor minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes collapsed into the single node that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size;
(f) repeating steps c through e until the logic node is mapped or all the sizes of lookup tables have been considered in mapping.
In another embodiment, a process for mapping comprises:
(a) arranging the nodes to be processed in topological order such that all fan-ins for a node are processed before the node is processed;
(b) getting the next node to be processed;
(c) selecting the smallest size LUT, along with its LUT delay and cut size;
(d) collapsing into a single node the node in process and all predecessor nodes having a delay greater than the maximum delay of the predecessor nodes minus the LUT delay of the selected LUT (so that collapsing reduces delay);
(e) if the network formed by collapsing these nodes has a cut size less than or equal to the number of inputs of the selected LUT, mapping the collapsed node into the LUT and assigning the LUT delay to the collapsed node;
(f) if the cut sizes don""t match and there are more LUT sizes available, selecting a larger LUT size (with new LUT delay and cut size);
(g) repeating steps d, e, and f until all LUT sizes have been tried or the node is mapped;
(h) if mapping did not occur for any of the LUT sizes when all those predecessor nodes having delay greater than MAXDELAY minus LUTxDELAY were collapsed into a single node, then increase delay by a factor i and repeat steps c through g until the factor i is equal to delay of the fastest LUT;
(i) selecting the fastest LUT and assigning the node to the selected LUT; and
(j) repeating the process for all nodes.
The above summary of the present invention is not intended to describe each disclosed embodiment of the present invention. The figures and detailed description that follow provide additional example embodiments and aspects of the present invention.