1. Field of the Invention
The present invention relates to integrated circuit devices. In particular, it relates to field programmable gate array integrated circuit devices.
2. The Prior Art
Field Programmable Gate Array (FPGA) integrated circuit devices are known in the art. An FPGA comprises any number of initially uncommitted logic modules arranged in an array along with an appropriate amount of initially uncommitted routing resources. Logic modules are circuits which can be configured to perform a variety of logic functions like, for example, AND-gates, OR-gates, NAND-gates, NOR-gates, XOR-gates, XNOR-gates, inverters, multiplexers, adders, latches, and flip-flops. Routing resources can include a mix of components such as wires, switches, multiplexers, and buffers. Logic modules, routing resources, and other features like, for example, I/O buffers and memory blocks, are the programmable elements of the FPGA.
The programmable elements have associated control elements (sometimes known as programming bits or configuration bits) which determine their functionality. The control elements may be thought of as binary bits having values such as on/off, conductive/non-conductive, true/false, or logic-1/logic-0 depending on the context. The control elements vary according to the technology employed and their mode of data storage may be either volatile or non-volatile. Volatile control elements, such as SRAM bits, lose their programming data when the FPGA power supply is disconnected, disabled or turned off. Non-volatile control elements, such as antifuses and floating gate transistors, do not lose their programming data when the FPGA power supply is removed. Some control elements, such as antifuses, can be programmed only one time and cannot be erased. Other control elements, such as SRAM bits and floating gate transistors, can have their programming data erased and may be reprogrammed many times. The detailed circuit implementations of the logic modules and routing resources can vary greatly and are appropriate for the type of control element used.
Typically a user creates a logic design inside manufacturer-supplied design software. The design software then takes the completed design and converts it into the appropriate mix of configured logic modules and other programmable elements, maps them into physical locations inside the FPGA, configures the interconnect to route the signals from one logic module to another, generates the data structure necessary to assign values to the various control elements inside the FPGA, and programs the FPGA if a programming head interfaced to an FPGA is present in the computer system.
The design software typically manipulates the user design in a variety of ways. For example, the Boolean functions can be manipulated to convert the design to programmable elements while optimizing for maximum performance, minimum area, or minimum power. If logic modules and programmable routing elements have asymmetrical propagation delays for rising and falling transitions, the logic polarity on a given signal can be adjusted to exploit this and the inverted polarity compensated for elsewhere. Similarly, if programmable elements have different static power in different logic states, the Boolean functions can be manipulated so that the circuit will spend most of its time in the lower power state with the manipulations required to do this being compensated for elsewhere.
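By way of illustration only, one way such a polarity change might be compensated for in a look-up-table based logic module is to permute the table contents so that the module computes the same function from the inverted signal. The following Python sketch assumes a table indexed with the first input as the least significant address bit; the function names and the two-input AND example are hypothetical and are not taken from any particular design software.

```python
def absorb_input_inversion(config_bits, input_index):
    """Return LUT bits that compute the same function when one input arrives inverted."""
    # Flipping address bit `input_index` swaps each pair of truth-table rows
    # that differ only in that input.
    return [config_bits[address ^ (1 << input_index)]
            for address in range(len(config_bits))]


def lut_eval(config_bits, inputs):
    """Evaluate a LUT; inputs[0] is assumed to be the least significant address bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[address]


# Hypothetical 2-input example: AND gate, truth table indexed by (b << 1) | a.
and2 = [0, 0, 0, 1]
# Suppose signal `a` is now delivered as NOT a to exploit a faster transition.
and2_compensated = absorb_input_inversion(and2, input_index=0)
```

In this sketch the compensated table, driven by the inverted signal, yields the same output as the original table driven by the true signal for every input combination.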
Many FPGA architectures employing various different logic modules and interconnect arrangements are known in the art. Some architectures are flat while others are clustered. In a flat architecture, the logic modules may or may not be grouped together with other logic modules, but all of the logic modules have free access to the larger routing architecture.
In a clustered architecture, the logic modules are grouped together into clusters which typically have a two level hierarchy of routing resources associated with them. The first level typically makes interconnections internal to the cluster while the second level typically allows interconnections between clusters. FIG. 1 illustrates a block diagram of a prior art logic cluster which illustrates the basic principles of a clustered architecture. The logic cluster contains four logic modules each comprising a logic function generator circuit of a type sometimes called a look-up table (or LUT) each having four inputs which are designated LUT4 in the diagram. Each LUT4 has an associated flip-flop designated FF. Each flip-flop is a one-bit data storage element that has a data input, a data output, and a clock input. Data is transferred from the logic function generator coupled to the data input to the data output in response to the signal received at the clock input. The output of each LUT4 is coupled to the data input of the associated flip-flop. The output of each LUT4 and each flip-flop is coupled to the block designated Cluster Internal Routing Lines which is the first level of the routing hierarchy. The output of each LUT4 and each flip-flop is also coupled to the block designated External Horizontal & Vertical Routing Lines which is the second level of the routing hierarchy. The cluster input multiplexers, the LUT4 input multiplexers, the cluster internal routing lines, and the external horizontal and vertical routing lines are programmable routing elements. The data channel of each multiplexer selected and the use of each routing line is determined by control elements whose value is determined during the routing process by the design software and whose values typically do not change during normal operation.
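The behavior of a four-input look-up table of the kind designated LUT4 can be sketched in software as a sixteen-entry table addressed by the four data inputs, with each entry corresponding to one control element. The Python model below is a minimal illustration under assumed conventions (the class name, the helper function, and the choice of X1 as the least significant address bit are all hypothetical); it does not represent the circuit of any particular device.

```python
class Lut4:
    """4-input look-up table: 16 configuration bits, one per input combination."""

    def __init__(self, config_bits):
        if len(config_bits) != 16 or any(b not in (0, 1) for b in config_bits):
            raise ValueError("a LUT4 needs exactly 16 binary configuration bits")
        self.config_bits = list(config_bits)

    def evaluate(self, x1, x2, x3, x4):
        # Assumed bit ordering: X1 is the least significant address bit.
        address = x1 | (x2 << 1) | (x3 << 2) | (x4 << 3)
        return self.config_bits[address]


def lut4_from_function(fn):
    """Build the 16 configuration bits by enumerating a Boolean function."""
    bits = []
    for address in range(16):
        x1, x2, x3, x4 = [(address >> i) & 1 for i in range(4)]
        bits.append(int(bool(fn(x1, x2, x3, x4))))
    return Lut4(bits)


# The same structure realizes different functions purely by its stored bits.
and4 = lut4_from_function(lambda a, b, c, d: a and b and c and d)
xor4 = lut4_from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
```

The example shows why a single logic module can serve as an AND-gate, an XOR-gate, or any other function of its inputs: only the stored configuration bits differ.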
As exemplified in FIG. 1, in many modern FPGAs functionality is provided by logic modules and flip-flops. Logic modules can be n-input look-up tables or any other kind of function generators with n inputs, where n>1. The flip-flops can be simple D-type flip-flops, or they can have additional functionality such as CLEAR, RESET, LOAD, and ENABLE. These additional functions (with the exception of ENABLE) can be synchronous with the clock or asynchronous (or both).
Logic modules and flip-flops are often grouped into clusters that may typically vary in size from four to more than twenty. The clustering provides no additional functionality; it is done for routing convenience. In addition to the functionality provided by the logic modules and flip-flops, the FPGAs may include other types of functional blocks such as multipliers, RAMs, FIFOs, etc.
The most common arrangement of logic modules and flip-flops is shown in FIG. 2. In this kind of arrangement, the Y output of logic module 10 directly drives the Di input of the flip-flop 12. Note that D is used to denote the “external” version of the data input and that Di is used to denote the “internal” version of the data input. In FIG. 2, these are the same circuit node, but that will not be the case in some of the subsequent drawing figures.
The X1, X2, X3 and X4 data inputs of the logic module 10 are each driven by a multiplexer; multiplexer 14 drives data input X1, multiplexer 16 drives data input X2, multiplexer 18 drives data input X3, and multiplexer 20 drives data input X4. Each of multiplexers 14, 16, 18 and 20 has a plurality of data inputs that are driven from routing tracks as is known in the art. Multiplexer 22 allows the Q output of flip-flop 12 to be used as an additional input to the X4 data input of logic module 10. The clock (CK) input of flip-flop 12 is driven by the output of multiplexer 24, which allows selection between the various clock resources at its data inputs. Multiplexers 14, 16, 18, 20, 22 and 24 are programmable routing elements. The data channel selected is determined by control elements whose value is determined during the routing process by the design software and whose values typically do not change during normal operation. The output Y of the logic module 10 and the output Q of the flip-flop 12 are coupled to other programmable routing elements not shown in the drawing figure.
The arrangement shown in FIG. 2 has been used in many different commercial products. This is an economical arrangement in terms of routing fabric usage, but it is also the most limited in terms of flexibly packing logic functions and flip-flops together. Unless the flip-flop is packed with the logic that drives it, the logic block functionality must be used as a feed-through buffer and is thus wasted. In typical FPGA designs, this limitation causes a large number of isolated flip-flops to be present that are not packed together with logic modules.
The packing limitations of the arrangement shown in FIG. 2 can be improved significantly by allowing configurable connections between the logic modules and the flip-flops, as shown in FIG. 3. An additional multiplexer 26 (also a programmable routing element) permits selection of the source of the D input to flip-flop 12 between the Y output of logic module 10 and the output of multiplexer 20 that drives the X4 input to the logic module 10 indirectly through multiplexer 22.
The arrangement shown in FIG. 3 is very commonly used in commercial products. As will be appreciated by persons of ordinary skill in the art, the logic module 10 in the arrangement of FIG. 3 is no longer wasted if the D-input of the flip-flop 12 is not driven from its output. On average, this improves the packing efficiency by packing 20% more flip-flops with logic modules. However, even this arrangement has limitations when the logic module 10 does not drive the flip-flop 12. The total number of combined data inputs to the logic module 10 and to the flip-flop 12 must be “n”, the same as the maximum number of inputs to the logic module. This either means that the logic module is used in a limited role by computing a logic function of n−1 inputs, or that one of the inputs of the logic module must be driven from the Q output of the flip-flop.
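The connectivity of the FIG. 3 arrangement and the resulting input-sharing limitation can be illustrated with a short combinational model. The Python sketch below is an assumption-laden simplification: the function and parameter names are hypothetical, the logic module is passed in as an ordinary function of four inputs, and the wide routing multiplexers 14 through 20 are represented only by the signal values they deliver.

```python
def fig3_cell(lut, x1, x2, x3, mux20_out, q, x4_from_q, ff_bypasses_lut):
    """Combinational view of the FIG. 3 cell; returns (Y, D)."""
    # Multiplexer 22: X4 comes from the flip-flop output Q or from multiplexer 20.
    x4 = q if x4_from_q else mux20_out
    y = lut(x1, x2, x3, x4)
    # Multiplexer 26: the flip-flop D input is either Y or the multiplexer 20 output.
    d = mux20_out if ff_bypasses_lut else y
    return y, d


# Hypothetical 3-input majority function; it ignores X4, so multiplexer 20
# is free to carry an unrelated signal to the flip-flop.
def majority3(x1, x2, x3, x4):
    return int(x1 + x2 + x3 >= 2)


y, d = fig3_cell(majority3, 1, 1, 0, mux20_out=0, q=1,
                 x4_from_q=False, ff_bypasses_lut=True)
```

Because the signal bypassing the logic module and the X4 input share multiplexer 20, the model makes the stated limitation visible: when the flip-flop is fed around the logic module, X4 can only see that same signal or Q, so the logic module is left with n−1 independent routed inputs.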
Even though the arrangement shown in FIG. 3 improves the packing density, the improvement comes with a small performance penalty due to the delay through the multiplexer 26 between the logic module 10 and the flip-flop 12. This is typically a small delay that is well worth the increase in packing density, as long as the multiplexer 26 remains a single-level multiplexer.
The flip-flop 12 shown in FIG. 2 and FIG. 3 is a simple D-type flip-flop. While many different additional functions like SET, RESET, etc., can be added to the flip-flop 12, of particular interest is the addition of an enable function because of its effects on the circuit topology. FIG. 4 shows the addition of the enable function by the addition of multiplexer 28 between multiplexer 26 and the Di input of flip-flop 12. The select line of multiplexer 28 is the enable (EN) signal. It is driven by multiplexer 30 which allows the selection of various routing resources at its data inputs. In the circuit of FIG. 4, the new complex flip-flop 32 with an enable is indicated by the rectangle drawn with a broken line. In this case, the Di input of the basic D-type flip-flop 12 and the D input of the complex flip-flop 32 are two different nodes in the circuit.
FIG. 5 shows some exemplary circuit detail of the complex flip-flop 32. Many different designs with flip-flops with an enable function are known in the art. In the complex flip-flop 32 are shown circuit details for multiplexer 28 and flip-flop 12. Multiplexer 28 comprises transmission gates 34 and 36 and inverter 38. The connections of the two transmission gates ensure that when one is open the other is closed. Thus depending on the value of the EN signal, either the output Q or the input D will be presented to the Di input of flip-flop 12.
Flip-flop 12 comprises two latches. The first latch comprises transmission gates 40 and 46 and inverters 42 and 44 while the second latch comprises transmission gates 50 and 56 and inverters 52 and 54. Inverter 48 is shared between the two latches.
In this configuration, data is transferred from the first latch to the second latch (and thus the output Q) on the rising edge of CK. When CK is at logic-1, transmission gates 40 and 56 are closed and transmission gates 46 and 50 are open. In this case, the first latch is isolated from multiplexer 28 and the feedback path comprising inverters 42 and 44 and transmission gate 46 is active, while the second latch receives the data stored in the first latch without conflict since the feedback path comprising inverters 52 and 54 and transmission gate 56 is inactive.
When CK is at logic-0, transmission gates 40 and 56 are open and transmission gates 46 and 50 are closed. In this case, the first latch receives the data output from multiplexer 28 and the feedback path comprising inverters 42 and 44 and transmission gate 46 is inactive, while the second latch is isolated from the first latch and the feedback path comprising inverters 52 and 54 and transmission gate 56 is active.
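The two clock phases described above can be summarized by a behavioral model of the two latches. The following Python sketch is illustrative only: the class and method names are hypothetical, and the signal inversions introduced by the inverters of FIG. 5 are deliberately ignored so that only the open/closed sequencing of the transmission gates is modeled.

```python
class MasterSlaveFlipFlop:
    """Two-latch model: the first latch is transparent while CK is 0,
    the second latch is transparent while CK is 1."""

    def __init__(self):
        self.first_latch = 0
        self.second_latch = 0

    def apply(self, ck, di):
        if ck == 0:
            # Gate 40 passes Di into the first latch; the second latch holds
            # its value through its own feedback path.
            self.first_latch = di
        else:
            # The first latch is isolated from Di and holds; gate 50 passes
            # its stored value into the second latch and on to Q.
            self.second_latch = self.first_latch
        return self.second_latch


ff = MasterSlaveFlipFlop()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (CK, Di) pairs
trace = [ff.apply(ck, di) for ck, di in stimulus]
```

In this model the output changes only when CK goes from logic-0 to logic-1, and a change on Di while CK is at logic-1 has no effect, which is the edge-triggered behavior described for the circuit.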
The enable function allows new data to be clocked into flip-flop 12 on the rising edge of CK when the EN signal is at logic-1, and causes the previous data to be held by feeding it back to the flip-flop 12 when the EN signal is at logic-0. Thus when the flip-flop is “enabled” it can receive new data and when “disabled” it holds the old data.
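The enable behavior can be expressed compactly as a multiplexer in front of an edge-triggered storage element. The Python sketch below is a minimal behavioral illustration, not a circuit description; the class name, the method name, and the sampled-clock style of simulation are assumptions made for the example.

```python
class EnableFlipFlop:
    """Behavioral model of a D-type flip-flop with an enable input."""

    def __init__(self, q=0):
        self.q = q
        self._last_ck = 0

    def step(self, ck, d, en):
        # Multiplexer 28: pass D when enabled, otherwise recirculate Q.
        di = d if en else self.q
        # Basic flip-flop 12: capture Di on the rising edge of CK only.
        if ck == 1 and self._last_ck == 0:
            self.q = di
        self._last_ck = ck
        return self.q


ff = EnableFlipFlop()
stimulus = [(0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1)]
trace = [ff.step(ck, d, en) for ck, d, en in stimulus]  # (CK, D, EN) triples
```

In the stimulus above, the second rising edge arrives with EN at logic-0, so the stored logic-1 is retained even though D is at logic-0; the third rising edge arrives with EN at logic-1 and the new value is captured.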
One circuit issue present in the configuration in FIG. 4 is the long series connection of multiplexers 20, 26, 28, and the multiplexer formed by transmission gates 40 and 46 and inverter 48 that a signal must pass through before being buffered by inverter 42. Typically these multiplexers are constructed out of CMOS transmission gates (of the sort shown in FIG. 5), NMOS pass transistors, or floating gate flash transistors in flash-based FPGAs (which function much like NMOS pass transistors in this context). Multiplexer 20 can be particularly problematic, since it is part of the routing fabric. Typically it can be very wide (i.e., having many data inputs) and may even comprise two series stages of pass transistors or transmission gates. That means that a signal coming in through multiplexer 20 which is routed through multiplexers 26 and 28 to flip-flop 12 can potentially pass through four or five stages of pass transistors or transmission gates without buffering. This can cause a significant performance degradation in the circuit which can be unacceptable in a commercial product.
One possible solution is to insert a buffer somewhere in the path. Unfortunately, CMOS buffers require two inverting gain stages. While this will reduce RC delay through the pass transistors or transmission gates and the accompanying metal lines, it can introduce an unacceptable delay of its own.
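The trade-off between an unbuffered series path and a buffered one can be estimated with the well-known Elmore delay approximation for an RC ladder, in which each node capacitance is multiplied by the total resistance between the driver and that node. The Python sketch below is a rough illustration only: the per-stage resistance, node capacitance, and buffer delay are hypothetical values chosen for the example, and driver resistance and load capacitance are ignored.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: each node capacitance times the
    total resistance between the driver and that node."""
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += upstream_r * c
    return delay


# Hypothetical per-stage values, chosen only for illustration.
R_STAGE = 1_000.0   # ohms per pass transistor or transmission gate
C_STAGE = 2e-15     # farads per internal node
T_BUFFER = 20e-12   # seconds, assumed delay of a two-inverter buffer


def path_delay(stage_counts, buffer_delay=T_BUFFER):
    """Delay of a path split into unbuffered segments by buffers."""
    segments = sum(elmore_delay([R_STAGE] * n, [C_STAGE] * n)
                   for n in stage_counts)
    return segments + buffer_delay * (len(stage_counts) - 1)


unbuffered = path_delay([5])    # five series stages, no buffer
buffered = path_delay([2, 3])   # buffer inserted after the second stage
```

With these assumed values the five-stage unbuffered path comes to about 30 ps, while splitting it into two and three stages reduces the RC portion to about 18 ps but adds the 20 ps buffer, for about 38 ps in total. The unbuffered delay grows roughly with the square of the number of series stages, yet a buffer helps only if its own delay is smaller than the RC delay it removes.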