Programmable logic devices (PLDs) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay lock loops (DLLs), and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
Another type of PLD is the Complex Programmable Logic Device, or CPLD. A CPLD includes two or more “function blocks” connected together and to input/output (I/O) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (PLAs) and Programmable Array Logic (PAL) devices. In CPLDs, configuration data is typically stored on-chip in non-volatile memory. In some CPLDs, configuration data is stored on-chip in non-volatile memory, then downloaded to volatile memory as part of an initial configuration sequence.
For all of these programmable logic devices (PLDs), the functionality of the device is controlled by data bits provided to the device for that purpose. The data bits can be stored in volatile memory (e.g., static memory cells, as in FPGAs and some CPLDs), in non-volatile memory (e.g., FLASH memory, as in some CPLDs), or in any other type of memory cell.
Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, e.g., using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include, but are not limited to, these exemplary devices, and also encompass devices that are only partially programmable. For example, one type of PLD includes a combination of hard-coded transistor logic and a programmable switch fabric that programmably interconnects the hard-coded transistor logic.
FIG. 1 is a simplified illustration of an exemplary FPGA. The FPGA of FIG. 1 includes an array of configurable logic blocks (LBs 101a-101i) and programmable input/output blocks (I/Os 102a-102d). The LBs and I/O blocks are interconnected by a programmable interconnect structure that includes a large number of interconnect lines 103 interconnected by programmable interconnect points (PIPs 104, shown as small circles in FIG. 1). PIPs are often coupled into groups (e.g., group 105) that implement multiplexer circuits selecting one of several interconnect lines to provide a signal to a destination interconnect line or logic block. As noted above, some FPGAs also include additional logic blocks with special purposes (not shown), e.g., DLLs, block RAM, and so forth.
A PLD interconnect structure can be complex and highly flexible. For example, Young et al. describe the interconnect structure of an exemplary FPGA in U.S. Pat. No. 5,963,050, issued Oct. 5, 1999 and entitled “Configurable Logic Element with Fast Feedback Paths”, which is incorporated herein by reference in its entirety. Young et al. describe various types of interconnect lines, including general interconnect lines that programmably interconnect two or more different logic blocks, and fast feedback interconnect lines that interconnect lookup table (LUT) output terminals with input terminals of the same LUT and of other LUTs in the same logic block.
FIG. 2 is a block diagram of a logic block in a typical FPGA, and illustrates an exemplary fast feedback path. Logic block 200 of FIG. 2 includes an input multiplexer (IMUX) 201, two slices 202A, 202B of programmable logic driven by the input multiplexer, an output multiplexer (OMUX) 204, and three-state buffers 205, all coupled together as shown in FIG. 2. In exemplary logic block 200, each slice includes two lookup tables (LUTs), for a total of four LUTs 203A-203D. LUT input signals are provided by input multiplexer 201. Several output signals from each slice, including the LUT output signals, are provided to output multiplexer 204. The LUT output signals, in addition to driving the output multiplexer, are also provided back to input multiplexer 201 via fast feedback paths 206.
For example, fast feedback path 206A is provided from the LUT output of LUT 203A back to input multiplexer 201. Within the input multiplexer, a signal on fast feedback path 206A can access any of three LUT input terminals, e.g., one input terminal of LUT 203B, one of LUT 203C, and one of LUT 203D.
Additional examples of fast feedback paths are shown, for example, in FIG. 13 of U.S. Pat. No. 5,963,050, referenced above.
Fast feedback paths provide a fast interconnection between two LUTs. Therefore, fast feedback paths can be used to reduce path delays for signal paths traversing more than one LUT between registers. FIGS. 3 and 4 illustrate a known method by which LUTs can be “packed” into slices, such that path delays are reduced by using fast feedback paths. This known packing method addresses only fast feedback paths between the two LUTs of each slice, and does not consider fast feedback paths between slices. Therefore, the packing method illustrated in FIGS. 3 and 4 assumes that the longest available fast feedback path interconnects only two LUTs.
FIGS. 3 and 4 show seven LUTs A-G, and the interconnections between these LUTs in an exemplary design to be implemented in a PLD having two LUTs per slice. The structures shown in FIGS. 3 and 4 are known as “graphs”. Each node A-G of a graph represents a LUT, and the arrows represent “edges” in the graph, in this case interconnections between the LUTs.
In the implementation shown in FIG. 3, LUT pairs A&B, C&D, and E&F are interconnected using fast feedback paths, or “fast paths” (solid arrows). LUT pairs B&D, D&E, and F&G are interconnected using feedback paths other than fast feedback paths, or “slow paths” (dashed arrows). Clearly, the longest signal path in this example traverses LUTs A, B, D, E, F, and G, and includes two fast paths and three slow paths. If the delay of a fast path is “f”, and the delay of a slow path is “s”, the longest path delay in this exemplary graph is “3s+2f” (3 times s plus 2 times f).
FIG. 4 shows an alternative implementation of the same circuit, in which improved packing reduces the overall delay of the longest signal path. In the implementation of FIG. 4, LUT pairs A&B, D&E, and F&G are interconnected using fast paths, and LUT pairs C&D, B&D, and E&F are interconnected using slow paths. Thus, the longest path delay in the exemplary graph of FIG. 4 is “2s+3f”. The delay of the longest path is less than that of FIG. 3 by a delay of “s−f” (s minus f).
This example illustrates that in the delay-based packing problem, a locally best solution (e.g., packing LUTs C&D into the same slice, as in FIG. 3) can result in a sub-optimal global solution (e.g., compared to the solution of FIG. 4).
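The comparison between FIGS. 3 and 4 can be checked with a short Python sketch. It is illustrative only: the edge list is transcribed from the LUT pairs named above, the values s=2 and f=1 are arbitrary assumptions, LUT delays themselves are ignored (as in the arithmetic above), and the names `longest_path_delay`, `is_valid_packing`, and `best_packing` are hypothetical helpers, not part of any known tool. A packing is modeled as the set of edges that use fast paths, and it is assumed that each LUT occupies at most one two-LUT slice.

```python
from itertools import combinations

# Edges of the example graph of FIGS. 3 and 4 (source LUT, destination LUT).
EDGES = [("A", "B"), ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G")]
TOPO_ORDER = ["A", "C", "B", "D", "E", "F", "G"]  # one valid topological order

def longest_path_delay(fast_edges, s, f):
    """Largest sum of edge delays over all paths, given the set of fast edges."""
    arrival = {node: 0 for node in TOPO_ORDER}
    for node in TOPO_ORDER:
        for src, dst in EDGES:
            if src == node:
                delay = f if (src, dst) in fast_edges else s
                arrival[dst] = max(arrival[dst], arrival[src] + delay)
    return max(arrival.values())

def is_valid_packing(fast_edges):
    """Assumption: a LUT sits in at most one slice, so it may appear only once."""
    luts = [lut for edge in fast_edges for lut in edge]
    return len(luts) == len(set(luts))

def best_packing(s, f):
    """Exhaustively try every valid set of fast edges (tiny graphs only)."""
    best = None
    for k in range(len(EDGES) + 1):
        for subset in combinations(EDGES, k):
            if is_valid_packing(subset):
                delay = longest_path_delay(set(subset), s, f)
                if best is None or delay < best[0]:
                    best = (delay, set(subset))
    return best

FIG3 = {("A", "B"), ("C", "D"), ("E", "F")}
FIG4 = {("A", "B"), ("D", "E"), ("F", "G")}

s, f = 2, 1  # illustrative delays with s > f
print(longest_path_delay(FIG3, s, f))  # 3s + 2f
print(longest_path_delay(FIG4, s, f))  # 2s + 3f
print(best_packing(s, f))
```

For this small graph and these assumed delays, the exhaustive search returns the FIG. 4 packing. Exhaustive enumeration is used here only because the example has six edges; it is not suggested as a practical packing method.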
Clearly, it is desirable to provide a method of determining packing in a PLD that provides a globally desirable solution, and not just a locally desirable one. One such method currently in use utilizes linear scans of a topologically sorted Directed Acyclic Graph (DAG) (e.g., a graph such as those in FIGS. 3 and 4) to pack LUTs into slices based on maximizing the use of fast paths. A forward estimation of arrival times plus a reverse traversal yields slack values that identify the longest paths. The packing is then driven by visiting each node in reverse topological order, with the nodes ordered by worst slack. Note that this approach utilizes graphs similar to those shown in FIGS. 3 and 4, and approaches the packing problem from the viewpoint already presented, that of identifying the longest path and packing the nodes on the longest path to maximize the use of fast paths on this path.
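The slack computation described above can be sketched as follows. This is a minimal illustration, not the method actually in use: it assumes every edge initially carries the slow delay (here 2, an arbitrary value), takes the longest arrival time as the required time at every sink, and uses the hypothetical function name `compute_slack`. The graph is the example graph of FIGS. 3 and 4.

```python
# Example graph of FIGS. 3 and 4; every edge is assumed slow before packing.
SLOW = 2  # illustrative slow-path delay
EDGE_DELAY = {
    ("A", "B"): SLOW, ("B", "D"): SLOW, ("C", "D"): SLOW,
    ("D", "E"): SLOW, ("E", "F"): SLOW, ("F", "G"): SLOW,
}
TOPO_ORDER = ["A", "C", "B", "D", "E", "F", "G"]  # one valid topological order

def compute_slack(edge_delay, topo_order):
    """Forward pass for arrival times, reverse pass for required times."""
    # Forward scan: arrival time is the longest delay from any source.
    arrival = {node: 0 for node in topo_order}
    for node in topo_order:
        for (src, dst), delay in edge_delay.items():
            if src == node:
                arrival[dst] = max(arrival[dst], arrival[src] + delay)

    # Assumption: the required time everywhere starts at the longest arrival.
    target = max(arrival.values())
    required = {node: target for node in topo_order}

    # Reverse scan: tighten the required time of each predecessor.
    for node in reversed(topo_order):
        for (src, dst), delay in edge_delay.items():
            if dst == node:
                required[src] = min(required[src], required[dst] - delay)

    # Slack is zero exactly on the longest path(s).
    return {node: required[node] - arrival[node] for node in topo_order}

slack = compute_slack(EDGE_DELAY, TOPO_ORDER)
critical = [node for node in TOPO_ORDER if slack[node] == 0]
print(slack)
print(critical)  # nodes on the longest path
```

With these assumed delays, the zero-slack nodes are A, B, D, E, F, and G, which matches the longest path identified in FIG. 3, while node C has positive slack. A packer following the approach above would visit the nodes in reverse topological order, giving priority to those with the worst slack.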
However, identifying the longest path and addressing delays on that path might or might not address the needs of the PLD user. For example, the longest path is not necessarily the most critical path in a user design, because some paths have no effect on the maximum operating frequency of a design. Therefore, it is desirable to provide a method of packing a design into a PLD that addresses the overall packing problem, rather than simply addressing the longest paths in the design.