1. Field of the Invention
The invention pertains to the field of integrated circuits. More particularly, the invention pertains to field programmable gate array integrated circuit devices.
2. The Prior Art
Field Programmable Gate Array (FPGA) integrated circuit devices are known in the art. An FPGA comprises any number of initially uncommitted functional blocks arranged in an array along with an appropriate amount of initially uncommitted routing resources. Functional blocks are circuits which can be configured to perform a variety of logic functions like, for example, AND-gates, OR-gates, NAND-gates, NOR-gates, XOR-gates, XNOR-gates, inverters, multiplexers, adders, latches, and flip/flops. Routing resources can include a mix of components such as wires, switches, multiplexers, and buffers. Logic modules, routing resources, and other features like, for example, I/O buffers and memory blocks, are the programmable elements of the FPGA.
The programmable elements have associated control elements (sometimes known as programming bits or configuration bits) which determine their functionality. The control elements may be thought of as binary bits having values such as on/off, conductive/non-conductive, true/false, or logic-1/logic-0 depending on the context. The control elements vary according to the technology employed and their mode of data storage may be either volatile or non-volatile. Volatile control elements, such as SRAM bits, lose their programming data when the FPGA power supply is disconnected, disabled or turned off. Non-volatile control elements, such as antifuses and floating gate transistors, do not lose their programming data when the FPGA power supply is removed. Some control elements, such as antifuses, can be programmed only one time and cannot be erased. Other control elements, such as SRAM bits and floating gate transistors, can have their programming data erased and may be reprogrammed many times. The detailed circuit implementation of the functional blocks and routing resources can vary greatly and is appropriate for the type of control element used.
Typically a user creates a logic design inside manufacturer-supplied design software. The design software then takes the completed design and converts it into the appropriate mix of configured logic modules and other programmable elements, maps them into physical locations inside the FPGA, configures the interconnect to route the signals from one logic module to another, and generates the data structure necessary to assign values to the various control elements inside the FPGA.
Many FPGA architectures employing various different functional blocks and interconnect arrangements are known in the art. Some architectures are flat while others are clustered. In a flat architecture, the logic modules may or may not be grouped together with other logic modules, but all of the logic modules have similar or nearly equivalent access to the larger routing architecture. In a clustered architecture, the logic modules are grouped together into clusters, meaning that all of the logic modules in the cluster have some degree of exclusive routing interrelationship between themselves relative to logic modules in other clusters.
FIG. 1 illustrates a block diagram of a prior art logic cluster 100 which illustrates some basic principles of a clustered architecture. The illustrative logic cluster contains four functional blocks 102, though any number can be present. Typically a functional block 102 comprises a logic function generator circuit (or LFG) 104 and an associated sequential element 106 (designated SE in the diagram) which is typically a flip/flop that can also be configured to be a latch. Each LFG 104 in the diagram has four logic input interconnects since in practice a look-up table (LUT) with four inputs (LUT4) is the most common function generator in this sort of architecture. The output of each LFG is coupled to the data input of the associated sequential element. The output of each logic function generator and each sequential element is coupled to a functional block output. The output coupled to the function generator is a combinational output while the output coupled to the sequential element is a sequential output.
Functional block 102 is a very simple exemplary functional block. Many others are known in the art. Typically there are additional routing multiplexers present in the functional block 102 to allow one of the LFG 104 inputs to bypass the LFG 104 and enter the data input of the sequential element 106. Sometimes there is a fifth input that can bypass LFG 104 without stealing one of its inputs. Often the output of the sequential element 106 can be fed back into one of the LFG 104 inputs through another routing multiplexer. Support circuits for other functions like binary arithmetic or cascading LFGs into wide functions are present. While the place and route tools in the design software will utilize these features when available, these are generally special cases separate from the general cluster routing functionality.
Typically the sequential element is more complicated than the simple D-flip/flop shown in FIG. 1. Often there will be, for example, a set signal, a reset signal, an enable signal, a load signal, and a clock signal (shown but without its source) or some combination thereof present. Collectively these signals are called sequential control inputs or global inputs (since they are often driven by signals distributed globally around the FPGA) and they typically have their own associated routing resources that may or may not be coupled to the cluster routing resources shown in FIG. 1.
The box designated Cluster Internal Routing (CIR) 110 contains the cluster internal interconnects and the box designated External Horizontal & Vertical Routing (EHVR) 114 contains the external interconnects and other routing resources of the larger FPGA (not shown). Thus in FIG. 1, CIR 110 contains only routing wires, while EHVR 114 contains a variety of different elements like switches, multiplexers, and buffers in addition to routing wires. Functional block output interconnects are cluster output interconnects if they connect to EHVR 114. If a functional block output interconnect connects to the CIR 110, it is called a feedback interconnect since it allows cluster outputs to feed back to inputs of LFGs in the same cluster without leaving and reentering the cluster by means of cluster outputs and external interconnect and routing resources. A functional block output interconnect can be both a cluster output interconnect and a feedback interconnect if it connects to both the CIR 110 and EHVR 114.
In FIG. 1, the Logic Function Generator Input Multiplexers (some exemplary instances being labeled 108) are coupled between the Cluster Internal Routing 110 and the various logic inputs coupled to the functional blocks 102. Since there are four functional blocks each with four input interconnects, there are a total of sixteen Logic Function Generator Input Multiplexers in the exemplary cluster 100. Typically, the number of data channels (or data inputs) on each Logic Function Generator Input Multiplexer is less than the total number of lines in CIR 110, so each Logic Function Generator Input Multiplexer can only route a subset of the signals inside CIR 110 to its associated LFG input.
In the architecture 100 of FIG. 1, signals are transmitted from the EHVR 114 to the CIR 110 by ten Cluster Input Multiplexers, two exemplary ones being labeled 112. Various interconnects and resources from other parts of the FPGA are connected to the inputs of the Cluster Input Multiplexers by means of the External Horizontal & Vertical Routing 114. The lines internal to the Cluster Internal Routing box 110 come from a variety of sources: the outputs of the Cluster Input Multiplexers, the outputs of the cluster's LFGs and SEs, and possibly other sources such as clock networks and other special functions not shown in FIG. 1 to avoid overcomplicating the diagram.
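The multiplexer and wire counts of exemplary cluster 100 can be tallied in a short Python sketch. The class and method names are hypothetical and are not part of FIG. 1. The count of lines in CIR 110 additionally assumes that every LFG output and every SE output is fed back into CIR 110 and that no clock or special-function lines are present, neither of which is stated for FIG. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical summary of the exemplary cluster 100 of FIG. 1."""
    functional_blocks: int = 4      # functional blocks 102
    inputs_per_lfg: int = 4         # each LFG 104 has four logic inputs
    cluster_input_muxes: int = 10   # Cluster Input Multiplexers 112

    def lfg_input_muxes(self) -> int:
        # One LFG Input Multiplexer 108 per LFG input.
        return self.functional_blocks * self.inputs_per_lfg

    def cir_lines(self) -> int:
        # ASSUMPTION: all LFG and SE outputs feed back into CIR 110,
        # and no clock or special-function lines are counted.
        feedback_lines = 2 * self.functional_blocks
        return self.cluster_input_muxes + feedback_lines


shape = ClusterShape()
print(shape.lfg_input_muxes())  # 16
print(shape.cir_lines())        # 18 under the stated assumption
```

Under these assumptions an LFG Input Multiplexer with fewer than eighteen data channels can reach only a subset of the lines in CIR 110, which is the situation described above.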
As FPGAs get larger, clustered architectures get favored over completely flat ones, based on the ease of place and route, and how fast this task can be accomplished by the design software. There are many examples of clustered architectures in both the academic literature as well as in commercial products.
FIG. 2A shows an exemplary cluster 200 of a type known in the art. Present in FIG. 2A are functional blocks 202, Level 1 multiplexers 204a through 204j, EHVR 206, cluster input interconnect busses 208a through 208j, functional block output bus 210, and feedback interconnects 212. There can be any number of functional blocks of various types inside of box 202 as a matter of design choice. This is an abstraction intended to focus attention on the relationships between classes of interconnects inside cluster 200 rather than on the detailed connections of a specific circuit topology.
The inputs of the functional blocks 202 are coupled to the outputs of the first level (or level 1) multiplexers 204a through 204j. While ten level 1 multiplexers 204a through 204j are shown in FIG. 2A, the exact number present is equal to the total of all of the inputs of all of the functional blocks 202. The number of input channels of the multiplexers 204a through 204j need not be equal. Similarly, the portion of the input channels into each multiplexer from each of the sources shown need not be equal.
The external horizontal and vertical routing (EHVR) 206 contains routing interconnects and other routing resources such as, for example, multiplexers, buffers, and control elements for programming and enabling them. Placing the balance of the FPGA routing in box 206 is a deliberate abstraction to allow focusing on the relationships of classes of interconnects inside cluster 200.
The level 1 multiplexers 204a through 204j are coupled to EHVR 206 by cluster input interconnect busses 208a through 208j. While each cluster input interconnect bus 208a through 208j is shown with exactly five lines in FIG. 2A, this is an abstraction and any number of interconnects may be present as a matter of design choice.
The outputs of functional blocks 202 form functional block output bus 210, which is shown coupling functional blocks 202 to EHVR 206 and feedback interconnects 212. The portion of the interconnects in functional block output bus 210 that couple to either EHVR 206 or feedback interconnects 212 is a matter of design choice.
Feedback interconnects 212 are coupled to inputs on level 1 multiplexers 204a through 204j by interconnect busses 214a through 214j (though only 214j is labeled in FIG. 2A to avoid cluttering and obscuring the drawing figure). The feedback interconnects are functional block outputs that can be routed back to the functional block inputs without exiting and reentering cluster 200 through EHVR 206, giving the feedback interconnects a significantly faster return path. Level 1 multiplexer 204a is coupled to interconnect bus 214a, level 1 multiplexer 204b is coupled to interconnect bus 214b, and so on through Level 1 multiplexer 204j and interconnect bus 214j. Although five wires are shown in each of interconnect busses 214a through 214j the exact number of wires present can vary from one bus to another and is a matter of design choice.
While interconnect busses 208a through 208j couple EHVR 206 to the level 1 multiplexers 204a through 204j, they do not connect to the feedback interconnects 212. In FIG. 2A, they can be thought of as “passing under” them instead. One example is indicated by an oval designated “Bus Passes Under, Does Not Connect.” This convention will be used with respect to various interconnect representations throughout this application, since drawing such busses in the more conventional “passing over” style makes the drawing figures harder to read and obscures the concepts being illustrated.
In FIG. 2A, EHVR 206 contains the external interconnects and other routing resources of the larger FPGA (not shown) while feedback interconnects 212 contains only routing wires.
Cluster 200 can be described as an I1F1 type cluster. The “I1” signifies that inputs to the cluster enter the cluster internal routing resources at the first level of multiplexers while the “F1” signifies that the feedback interconnects also enter the cluster internal routing resources at the level 1 multiplexers. This type of shorthand description is useful for characterizing many types of clustered architectures. Examples of I1F1 clusters are found in a number of different commercial FPGA families offered by Xilinx, Inc., of San Jose, Calif.
An I2F1 cluster 220 of the prior art is shown in FIG. 2B. Present in the drawing figure are functional blocks 222, level 1 multiplexers 224a through 224j, level 1 interconnects 226, level 2 multiplexers 228a through 228j, EHVR 230, interconnect busses 232a through 232j, interconnect busses 234a through 234j (though only 234j is labeled in the figure), functional block output bus 236, and feedback interconnects 238.
Any number of functional blocks may be present inside box 222 as a matter of design choice. Although ten level 1 multiplexers are shown, the number of level 1 multiplexers is equal to the total number of functional block inputs and the number of data channels on the various level 1 multiplexers can vary from one to another as a matter of design choice. While each level 1 multiplexer is shown coupled to two busses of five interconnects each, the number of interconnects in each bus can vary as a matter of design choice. Similarly, while ten level 2 multiplexers 228a through 228j are shown, the exact number will vary from architecture to architecture as a matter of design choice.
The data flow for external signals is through interconnects originating in EHVR 230 that are coupled to the inputs of the second level multiplexers 228a through 228j. While seven interconnects are shown for each level 2 multiplexer in the drawing figure, the exact number can vary from multiplexer to multiplexer as a matter of design choice. The outputs of the level 2 multiplexers are coupled to the level 1 interconnects 226, which in turn are coupled to some of the data channels on the level 1 multiplexers, which in turn have their outputs coupled to the inputs of the functional blocks 222. Thus the cluster inputs enter the internal cluster routing resources (meaning multiplexers and interconnects) at the level 2 multiplexers (the I2 in I2F1). The box labeled level 1 interconnects 226 in FIG. 2B contains only routing wires and all of the routing decisions are implemented by control elements (not shown) governing the level 1 multiplexers 224a through 224j and level 2 multiplexers 228a through 228j.
The data flow for feedback signals starts from the outputs of the functional blocks 222 that form functional block output bus 236. Some of the interconnects in bus 236 are coupled to the feedback interconnects 238 which in turn are coupled to some of the data channels on the level 1 multiplexers. Thus the feedback interconnects enter the internal cluster routing resources at the level 1 multiplexers (the F1 in I2F1). The box labeled feedback interconnects 238 in the drawing figure contains only routing wires and routing decisions internal to cluster 220 with respect to these wires are made by the control elements governing the level 1 multiplexers 224a through 224j. Interconnects in functional block output bus 236 can be coupled to feedback interconnects 238, EHVR 230, or both. The number and destinations of the interconnects in functional block output bus 236 are a matter of design choice.
I2F1 clusters are well known in both academia and in commercial FPGAs. Such a cluster is described in the textbook Guy Lemieux and David Lewis, Design of Interconnection Networks for Programmable Logic, Kluwer Academic Publishers, 2004 (henceforth “Lemieux”), page 28, FIG. 3.4. Commercial products using this architecture can be found in a number of FPGA families offered by Altera Corporation of San Jose, Calif.
Other FPGA cluster types with two levels of routing multiplexers are possible. In U.S. Pat. No. 6,975,139 to Pani, et al, (henceforth “Pani”) another architecture is shown in FIG. 7. In that figure, one set of input wires 601-604 is shown coupled to first level multiplexers while another set of input wires 605-612 is shown coupled to second level multiplexers. Since no feedback interconnects are explicitly shown in conjunction with the figure, it is reasonable to assume they are included in the input wires 601-612. The input interconnects could reasonably be coupled exclusively to the second level multiplexers (I2) or to a combination of first and second level multiplexers (I12, pronounced “i-one-two,” not “i-twelve”). Similarly, the feedback interconnects could reasonably be coupled exclusively to first level multiplexers (F1), exclusively to second level multiplexers (F2), or to a combination of first and second level multiplexers (F12, pronounced “ef-one-two,” not “ef-twelve”). Thus the cluster combinations taught in Pani FIG. 7 could reasonably be inferred to be I12F1, I12F2, I12F12, I2F1 or I2F12 depending on the particular design choices made in assigning input and feedback connections in any given embodiment.
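The shorthand used above for cluster types such as I1F1, I2F1 and I12F1 can be decoded mechanically. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that each multiplexer level is written as a single decimal digit, which is consistent with reading I12 as “i-one-two” but limits the sketch to nine levels.

```python
import re

# ASSUMPTION: each digit after "I" or "F" names one multiplexer level (1-9).
_CLUSTER_TYPE = re.compile(r"I([1-9]+)F([1-9]+)")


def parse_cluster_type(name):
    """Return (input_levels, feedback_levels) for a shorthand such as 'I12F1'."""
    match = _CLUSTER_TYPE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a cluster type shorthand: {name!r}")
    input_levels = tuple(int(digit) for digit in match.group(1))
    feedback_levels = tuple(int(digit) for digit in match.group(2))
    return input_levels, feedback_levels


def deepest_entry_level(name):
    """Highest multiplexer level at which any signal enters the cluster routing."""
    input_levels, feedback_levels = parse_cluster_type(name)
    return max(input_levels + feedback_levels)


print(parse_cluster_type("I12F1"))   # ((1, 2), (1,))
print(deepest_entry_level("I2F1"))   # 2
```

In this representation I12F1 means that cluster inputs enter at both the level 1 and level 2 multiplexers while feedback interconnects enter only at level 1.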
FPGA cluster types of three or more stages are also known in the art. In Pani, FIGS. 9-12 an I5F5 cluster is disclosed. In Lemieux, Chapter 2, Section 2.1, pages 9-17, highly routable switching networks are discussed in general, including references to a number of well known switching networks such as Clos networks and Benes networks. These networks typically have at least three stages of switches and can often be optimized for decreased numbers of switches and improved routability by increasing the number of levels of switches that signals must pass through. Unfortunately, when such an approach is used in an FPGA cluster, the resulting performance degradation is undesirable to the point of being unacceptable in commercial FPGAs.
Since switches are typically constructed out of pass transistors or analogous floating gate structures in FPGAs, a series of switches and interconnects creates an RC network. As the number of series switches increases, both the resistance and capacitance also increase which greatly increases the propagation delay through the network. Adding buffers can help, but often the propagation delay through the buffers offsets the decrease in RC delays. At present, no three or more level clusters are used in commercially available products.
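The growth of delay with the number of series switches can be illustrated with the Elmore delay, a common first-order approximation for RC ladders. The Elmore model is brought in here for illustration and is not taken from the references discussed above. The sketch assumes a uniform ladder in which every switch has the same resistance and every intermediate node the same capacitance; the function and parameter names are hypothetical.

```python
def elmore_delay(n_switches, r_switch, c_node):
    """First-order (Elmore) delay of a uniform RC ladder of series switches.

    ASSUMPTION: each switch contributes resistance r_switch and each node
    after a switch contributes capacitance c_node. Node k is charged through
    the k switches upstream of it, so it adds k * r_switch * c_node.
    """
    if n_switches < 0:
        raise ValueError("n_switches must be non-negative")
    return sum(k * r_switch * c_node for k in range(1, n_switches + 1))


# Normalized units (r_switch = c_node = 1): the delay grows as n(n+1)/2.
for n in (1, 2, 3, 5):
    print(n, elmore_delay(n, 1.0, 1.0))  # 1.0, 3.0, 6.0, 15.0
```

Under this model, going from two series switches to three doubles the normalized delay from 3 to 6, which illustrates why additional multiplexer levels are costly unless buffers are inserted.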
FIG. 2C shows an I3F3 type FPGA cluster 250 known in the prior art. Present in the drawing figure are functional blocks 252, level 1 multiplexers 254a through 254j, level 1 interconnects 256, level 2 multiplexers 258a through 258j, level 2 interconnects 260, level 3 multiplexers 262a through 262j, input busses 264a through 264j, EHVR 266, functional block output bus 268, and feedback interconnects 270. Similar to the clusters shown in FIG. 2A and FIG. 2B, the numbers of functional blocks, the numbers of first, second and third level multiplexers, the numbers of interconnects in the various interconnect busses, and the number of input channels on the various multiplexers are all a matter of design choice.
The data flow for external signals is through interconnects originating in EHVR 266 that are coupled to some of the data channels of the third level multiplexers 262a through 262j. The outputs of the level 3 multiplexers are coupled to the level 2 interconnections 260 which in turn are coupled to the data channels on the level 2 multiplexers 258a through 258j. The outputs of the level 2 multiplexers 258a through 258j are coupled to the level 1 interconnects 256 which are coupled to the data channels of the level 1 multiplexers 254a through 254j, which in turn have their outputs coupled to the inputs of the functional blocks 252. Thus the cluster inputs enter the internal cluster routing resources at the level 3 multiplexers (the I3 in I3F3).
The data flow for feedback signals starts from the outputs of the functional blocks 252 that form functional block output bus 268. Some or all of the interconnects in bus 268 (the number and destinations of the interconnects in functional block output bus 268 is a matter of design choice) are coupled to the feedback interconnects 270 which in turn are coupled to some of the data channels on the level 3 multiplexers. The outputs of the level 3 multiplexers are coupled to the level 2 interconnections 260 which in turn are coupled to the data channels on the level 2 multiplexers 258a through 258j. The outputs of the level 2 multiplexers 258a through 258j are coupled to the level 1 interconnects 256 which are coupled to the data channels of the level 1 multiplexers 254a through 254j, which in turn have their outputs coupled to the inputs of the functional blocks 252. Thus the feedback interconnects reenter the internal cluster routing resources at the level 3 multiplexers (the F3 in I3F3).
Level 1 interconnects 256, level 2 interconnects 260 and feedback interconnects 270 contain only routing wires. All of the routing decisions are implemented by control elements (not shown) governing the level 1 multiplexers 254a through 254j, level 2 multiplexers 258a through 258j and level 3 multiplexers 262a through 262j. 
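The data flows described for FIG. 2B and FIG. 2C follow one pattern: a signal entering at the level N multiplexers passes through the level N-1 interconnects, the level N-1 multiplexers, and so on down to the level 1 multiplexers. A minimal Python sketch of that pattern follows; the function name and the string labels are hypothetical and are used only for illustration.

```python
def routing_path(entry_level):
    """List the routing resources traversed by a signal that enters the
    cluster internal routing at the given multiplexer level."""
    if entry_level < 1:
        raise ValueError("entry_level must be at least 1")
    path = []
    for level in range(entry_level, 0, -1):
        path.append(f"level {level} multiplexer")
        if level > 1:
            # The output of a level N multiplexer drives the level N-1 interconnects.
            path.append(f"level {level - 1} interconnects")
    path.append("functional block input")
    return path


# External and feedback signals in an I3F3 cluster both enter at level 3.
for stage in routing_path(3):
    print(stage)

# Feedback signals in an I2F1 cluster enter at level 1.
print(routing_path(1))  # ['level 1 multiplexer', 'functional block input']
```

The length of the returned list makes the cost of deeper entry points visible: every additional level adds one multiplexer and one set of interconnects to the path.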
Turning to FIG. 3, a typical lookup table (or LUT) functional block 300 of the prior art is shown. LUT 300 has four inputs A, B, C and D and an output OUT and when programmed is capable of generating any four variable Boolean function where OUT=f(A,B,C,D) where “f” is the programmed Boolean function. Such a four input lookup table has been used as the LFG in many academic studies and commercial FPGAs. Typically a buffer or a sense amplifier (not shown) is present at the output to provide output drive.
LUT 300 comprises one first stage 2:1 multiplexer 302, two second stage 2:1 multiplexers 304a through 304b, four third stage 2:1 multiplexers 306a through 306d, eight fourth stage 2:1 multiplexers 308a through 308h, and sixteen control elements 310a through 310p. The select inputs of the first, second, third and fourth stage 2:1 multiplexers are coupled to inputs A, B, C, and D respectively. The output of first stage 2:1 multiplexer 302 is coupled to OUT while its two data channels are each coupled to one of the outputs of the two second stage 2:1 multiplexers 304a through 304b. The four data channels of the two second stage 2:1 multiplexers 304a through 304b are each coupled to one of the four outputs of third stage 2:1 multiplexers 306a through 306d. The eight data channels of the four third stage 2:1 multiplexers 306a through 306d are each coupled to one of the eight outputs of fourth stage 2:1 multiplexers 308a through 308h. The sixteen data channels of the eight fourth stage 2:1 multiplexers 308a through 308h are each coupled to one of the sixteen outputs of the control elements 310a through 310p. Collectively, this arrangement of 2:1 multiplexers forms a 16:1 multiplexer that presents the output of one of the sixteen control elements 310a through 310p to the output OUT.
Control elements 310a through 310p produce either a logic-0 or a logic-1 depending on how they are programmed. Boolean function f(A,B,C,D) of four variables will have sixteen unique input states, each with a corresponding output state. The value of that output state is placed in the control elements 310a through 310p such that the correct value is gated to the output OUT for the logic values presented at the inputs A, B, C and D. Thus the circuit 300 is said to “look up” the logic function rather than calculate it.
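The behavior of LUT 300 can be modeled in a few lines of Python as a sketch. The function names are hypothetical. The sketch also assumes a particular ordering of the control elements (index 8A + 4B + 2C + D) and that a select value of 1 chooses the second data channel of each 2:1 multiplexer; neither choice is specified for FIG. 3.

```python
def mux2(select, d0, d1):
    """2:1 multiplexer. ASSUMPTION: select = 1 picks the second data channel."""
    return d1 if select else d0


def lut4(bits, a, b, c, d):
    """Evaluate a four-input LUT built as a tree of 2:1 multiplexers.

    bits holds the sixteen control element values (310a through 310p).
    """
    if len(bits) != 16:
        raise ValueError("a four-input LUT needs exactly 16 control bits")
    stage4 = [mux2(d, bits[2 * i], bits[2 * i + 1]) for i in range(8)]      # select D
    stage3 = [mux2(c, stage4[2 * i], stage4[2 * i + 1]) for i in range(4)]  # select C
    stage2 = [mux2(b, stage3[2 * i], stage3[2 * i + 1]) for i in range(2)]  # select B
    return mux2(a, stage2[0], stage2[1])                                    # select A


def program(f):
    """Build the sixteen control bits for Boolean function f(A, B, C, D)."""
    return [int(bool(f(a, b, c, d)))
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]


# Program the LUT as a four-input XOR and look up one input combination.
xor_bits = program(lambda a, b, c, d: a ^ b ^ c ^ d)
print(lut4(xor_bits, 1, 0, 1, 1))  # 1
```

No gate-level XOR is evaluated when `lut4` runs; the stored bits are simply steered to the output, which is the sense in which the circuit looks up the function rather than calculating it.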
The nature of the control elements 310a through 310p will vary with the technology employed in constructing the FPGA. For example, in an SRAM-based FPGA each of the control elements 310a through 310p will typically be one bit of SRAM, with or without a buffer. Or in an antifuse based FPGA, each of the control elements would typically comprise a first antifuse coupled between the control element output and VCC and a second antifuse coupled between the control element output and ground. Or in a flash based FPGA, each of the control elements would typically comprise a first floating gate transistor coupled between the control element output and VCC and a second floating gate transistor coupled between the control element output and ground. In addition, some programming signals and circuitry may also be present in each control element.
It is worth noting that the propagation delay through LUT 300 of FIG. 3 will be different for each of the four inputs A, B, C and D. Input A will be the fastest, having the least propagation delay, since a logic transition on input A will only need to propagate through first stage multiplexer 302 to reach the output OUT. Input B will be the second fastest, having the second least propagation delay, since a logic transition on input B will need to propagate through one of the second stage multiplexers 304a through 304b as well as through first stage multiplexer 302 to reach the output OUT. In a similar manner, input C will be the third fastest, having the third least propagation delay, since a logic transition on input C will need to propagate through one of the third stage multiplexers 306a through 306d, through one of the second stage multiplexers 304a through 304b, as well as through first stage multiplexer 302 to reach the output OUT. Similarly, input D will be the slowest input, since an input transition on D will need to propagate through a multiplexer in each of the four stages.
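The ordering of the input delays can be expressed as a count of multiplexer stages between each input and OUT. In the following Python sketch the names are hypothetical, and the delay estimate assumes that every 2:1 multiplexer stage contributes the same delay, which ignores loading and select-to-output versus data-to-output differences in a real circuit.

```python
# Number of multiplexer stages a transition must traverse to reach OUT,
# as described for LUT 300: A drives stage 1, D drives stage 4.
STAGES_TO_OUTPUT = {"A": 1, "B": 2, "C": 3, "D": 4}


def input_delay(input_name, t_mux):
    """Estimated delay from an input to OUT.

    ASSUMPTION: each multiplexer stage adds the same delay t_mux.
    """
    return STAGES_TO_OUTPUT[input_name] * t_mux


fastest_first = sorted(STAGES_TO_OUTPUT, key=lambda name: input_delay(name, 1))
print(fastest_first)  # ['A', 'B', 'C', 'D']
```

A place and route tool could use such a ranking to assign timing-critical signals to the faster inputs.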
Using circuit design techniques well known in the art such as adding buffers and inverters, using different styles of multiplexer, sizing transistors, decoding the select inputs, using level restorers, etc., LUT 300 can be optimized in many different ways. For example, LUT 300 can be designed for minimum area or minimum power or minimum propagation delay as a matter of design choice. Optimizing for minimum propagation delay can be done in a number of different ways. For example, LUT 300 can be optimized so that the average of the propagation delays from each of the four inputs A, B, C and D is minimized. Alternatively, the propagation delay of the fastest input can be minimized, even if this causes the average propagation delay of the four inputs A, B, C and D to be greater than the minimum for any given area and power budget.
In U.S. Pat. No. 7,443,198 to McCollum, et al, (henceforth “McCollum”) a second look up table functional block is disclosed in FIG. 8A, FIG. 8B and FIG. 9. Since multiplexer 312 shown in FIG. 9 of McCollum is constructed in a manner similar to the first three multiplexer stages of LUT 300 in FIG. 3, the propagation delay through the McCollum look up table will be different for each of the four inputs A, B, C and D.