1. Field of the Invention
The invention is generally directed to integrated circuits, more specifically to architectural and physical layouts for Programmable Logic Devices (PLD's), and even more specifically to a subclass of PLD's known as Field Programmable Gate Arrays (FPGA's).
2a. Cross Reference to Related Applications
The following copending U.S. patent application(s) is/are assigned to the assignee of the present application, and its/their disclosures is/are incorporated herein by reference:
(A) Ser. No. 08/828,520 filed Apr. 1, 1997 by Bradley A. Sharpe-Geisler and originally entitled, "MEMORY BITS USED TO COUPLE LOOK UP TABLE INPUTS TO FACILITATE INCREASED AVAILABILITY TO ROUTING RESOURCES PARTICULARLY FOR VARIABLE SIZED LOOK UP TABLES FOR A FIELD PROGRAMMABLE GATE ARRAY (FPGA)";
(B) Ser. No. 08/931,798 filed Sep. 16, 1997 by Bradley A. Sharpe-Geisler and originally entitled, "CIRCUITRY TO PROVIDE FAST CARRY"; and
(C) Ser. No. 08/700,616 filed Aug. 16, 1996 by Om Agrawal et al. (as a continuing divisional with chained cross-referencing back to Ser. No. 07/394,221 filed Aug. 15, 1989).
2b. Cross Reference to Related Patents
The following U.S. patent(s) are assigned to the assignee of the present application, and their disclosures are incorporated herein by reference:
(A) U.S. Pat. No. 5,212,652 issued May 18, 1993 to Om Agrawal et al. (filed as Ser. No. 07/394,221 on Aug. 15, 1989) and entitled, PROGRAMMABLE GATE ARRAY WITH IMPROVED INTERCONNECT STRUCTURE;
(B) U.S. Pat. No. 5,621,650 issued Apr. 15, 1997 to Om Agrawal et al. and entitled, PROGRAMMABLE LOGIC DEVICE WITH INTERNAL TIME-CONSTANT MULTIPLEXING OF SIGNALS FROM EXTERNAL INTERCONNECT BUSES; and
(C) U.S. Pat. No. 5,185,706 issued Feb. 9, 1993 to Om Agrawal et al.
3. Description of Related Art
Field-Programmable Logic Devices (FPLD's) have continuously evolved to better serve the unique needs of different end-users. Since the introduction of simple PLD's such as the Advanced Micro Devices 22V10(trademark) Programmable Array Logic device (PAL), the art has branched out in several different directions.
One evolutionary branch of FPLD's has grown along a paradigm known as Complex PLD's or CPLD's. This paradigm is characterized by devices such as the Advanced Micro Devices MACH(trademark) family. Examples of CPLD circuitry are seen in U.S. Pat. No. 5,015,884 (issued May 14, 1991 to Om P. Agrawal et al.) and U.S. Pat. No. 5,151,623 (issued Sep. 29, 1992 to Om P. Agrawal et al.).
Another evolutionary chain in the art of field programmable logic has branched out along a paradigm known as Field Programmable Gate Arrays or FPGA's. Examples of such devices include the XC2000(trademark) and XC3000(trademark) families of FPGA devices introduced by Xilinx, Inc. of San Jose, Calif. The architectures of these devices are exemplified in U.S. Pat. Nos. 4,642,487; 4,706,216; 4,713,557; and 4,758,985; each of which is originally assigned to Xilinx, Inc.
An FPGA device can be characterized as an integrated circuit that has four major features as follows.
(1) A user-accessible, configuration-defining memory means, such as SRAM, EPROM, EEPROM, anti-fused, fused, or other, is provided in the FPGA device so as to be at least once-programmable by device users for defining user-provided configuration instructions. Static Random Access Memory or SRAM is, of course, a form of reprogrammable memory that can be differently programmed many times. Electrically Erasable and reprogrammable ROM or EEPROM is an example of nonvolatile reprogrammable memory. The configuration-defining memory of an FPGA device can be formed of a mixture of different kinds of memory elements if desired (e.g., SRAM and EEPROM).
(2) Input/Output Blocks (IOB's) are provided for interconnecting other internal circuit components of the FPGA device with external circuitry. The IOB's may have fixed configurations or they may be configurable in accordance with user-provided configuration instructions stored in the configuration-defining memory means.
(3) Configurable Logic Blocks (CLB's) are provided for carrying out user-programmed logic functions as defined by user-provided configuration instructions stored in the configuration-defining memory means. Typically, each of the many CLB's of an FPGA has at least one lookup table (LUT) that is user-configurable to define any desired truth table, to the extent allowed by the address space of the LUT. Each CLB may have other resources such as LUT input signal pre-processing resources and LUT output signal post-processing resources. Although the term 'CLB' was adopted by early pioneers of FPGA technology, it is not uncommon to see other names being given to the repeated portion of the FPGA that carries out user-programmed logic functions. The term 'LAB' is used, for example, in U.S. Pat. No. 5,260,611 to refer to a repeated unit having a 4-input LUT.
(4) An interconnect network is provided for carrying signal traffic within the FPGA device between various CLB's and/or between various IOB's and/or between various IOB's and CLB's. At least part of the interconnect network is typically configurable so as to allow for programmably-defined routing of signals between various CLB's and/or IOB's in accordance with user-defined routing instructions stored in the configuration-defining memory means. Another part of the interconnect network may be hard-wired or nonconfigurable such that it does not allow for programmed definition of the path to be taken by respective signals traveling along such hard-wired interconnect. A version of hard-wired interconnect in which a given conductor is dedicated to being always driven by a particular output driver is sometimes referred to as 'direct connect'.
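The truth-table behavior of the LUT in item (3) can be pictured as a small memory whose address is formed by the input terms. The Python below is a minimal sketch under that assumption: a k-input LUT is modeled as 2^k configuration bits, with input 0 as the least significant address bit. The class and helper names are hypothetical, and no circuit-level detail of any actual CLB is modeled.

```python
from typing import Callable, List, Sequence


class LookupTable:
    """Hypothetical model of a k-input LUT as 2**k configuration bits."""

    def __init__(self, num_inputs: int, config_bits: Sequence[int]) -> None:
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = [int(b) & 1 for b in config_bits]

    def evaluate(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of input terms")
        # The input terms form the memory address; input 0 is the least significant bit.
        address = 0
        for position, bit in enumerate(inputs):
            address |= (int(bit) & 1) << position
        return self.config_bits[address]


def truth_table(func: Callable[..., int], num_inputs: int) -> List[int]:
    """Compute configuration bits by enumerating every input combination."""
    bits = []
    for address in range(2 ** num_inputs):
        inputs = [(address >> i) & 1 for i in range(num_inputs)]
        bits.append(int(func(*inputs)) & 1)
    return bits


# Configure a 3-input LUT as a majority-of-three function.
majority = LookupTable(3, truth_table(lambda a, b, c: a + b + c >= 2, 3))
print(majority.evaluate([1, 0, 1]))  # 1
print(majority.evaluate([0, 0, 1]))  # 0
```

Reconfiguring the same LUT for a different function only changes the eight stored bits; the addressing path stays the same, which is why any function of up to k inputs fits.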
Modern FPGA's tend to be fairly complex. They typically offer a large spectrum of user-configurable options with respect to how each of many CLB's should be configured, how each of many interconnect resources should be configured, and how each of many IOB's should be configured. Rather than determining with pencil and paper how each of the configurable resources of an FPGA device should be programmed, it is common practice to employ a computer and appropriate FPGA-configuring software to automatically generate the configuration instruction signals that will be supplied to an unprogrammed FPGA and that will cause it to implement a specific design.
FPGA-configuring software typically cycles through a series of phases, commonly referred to as 'partitioning', 'placement', and 'routing'. This software is sometimes referred to as a 'place and route' program. Alternate names may include 'synthesis, mapping and optimization tools'.
In the partitioning phase, an original circuit design (which is usually relatively large and complex) is divided into smaller chunks, where each chunk is made sufficiently small to be implemented by a single CLB, the single CLB being a yet-unspecified one of the many CLB's that are available in the yet-unprogrammed FPGA device. Differently designed FPGA's can have differently designed CLB's with respective logic-implementing resources. As such, the maximum size of a partitioned chunk can vary in accordance with the specific FPGA device that is designated to implement the original circuit design. The original circuit design can be specified in terms of a gate level description, in Hardware Description Language (HDL) form, or in other suitable form.
After the partitioning phase is carried out, each resulting chunk is virtually positioned into a specific, chunk-implementing CLB of the designated FPGA during a subsequent placement phase.
In the ensuing routing phase, an attempt is made to algorithmically establish connections between the various chunk-implementing CLB's of the FPGA device, using the interconnect resources of the designated FPGA device. The goal is to reconstruct the original circuit design by reconnecting all the partitioned and placed chunks.
If all goes well in the partitioning, placement, and routing phases, the FPGA configuring software will find a workable 'solution' comprised of a specific partitioning of the original circuit, a specific set of CLB placements and a specific set of interconnect usage decisions (routings). It can then deem its mission to be complete and it can use the placement and routing results to generate the configuring code that will be used to correspondingly configure the designated FPGA.
In various instances, however, the FPGA configuring software may find that it cannot complete its mission successfully on a first try. It may find, for example, that the initially-chosen placement strategy prevents the routing phase from completing successfully. This might occur because signal routing resources have been exhausted in one or more congested parts of the designated FPGA device, so that some necessary interconnections could not be completed through those congested parts. Alternatively, all necessary interconnections may have been completed, but the FPGA configuring software may find that simulation-predicted performance of the resulting circuit (the so-configured FPGA) is below an acceptable threshold. For example, signal propagation time may be too large in a speed-critical part of the FPGA-implemented circuit.
In either case, if the initial partitioning, placement and routing phases do not provide an acceptable solution, the FPGA configuring software will try to modify its initial place and route choices so as to remedy the problem. Typically, the software will make iterative modifications to its initial choices until at least a functional place-and-route strategy is found (one where all necessary connections are completed), and more preferably until a place-and-route strategy is found that brings performance of the FPGA-implemented circuit to a near-optimum point. The latter step is at times referred to as 'optimization'. Modifications attempted by the software may include re-partitionings of the original circuit design as well as repeated iterations of the place and route phases.
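The iterate-until-acceptable behavior described above can be illustrated with a deliberately simple, generic technique. The sketch below is not the method of any particular FPGA configuring program. It assumes a toy cost model (the total Manhattan wirelength of two-terminal nets between placed chunks) and refines a placement by greedy pairwise swaps, keeping a swap only when it lowers that cost. All names and the example netlist are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

Position = Tuple[int, int]
Net = Tuple[str, str]


def wirelength(placement: Dict[str, Position], nets: List[Net]) -> int:
    """Total Manhattan length of two-terminal nets: a crude routability proxy."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


def improve_placement(placement: Dict[str, Position], nets: List[Net],
                      max_passes: int = 10) -> Tuple[Dict[str, Position], int]:
    """Greedy pairwise-swap refinement: keep a swap only if it lowers the cost."""
    placement = dict(placement)
    cost = wirelength(placement, nets)
    for _ in range(max_passes):
        improved = False
        for a, b in itertools.combinations(sorted(placement), 2):
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            if new_cost < cost:
                cost, improved = new_cost, True
            else:
                # Undo the swap; it did not help.
                placement[a], placement[b] = placement[b], placement[a]
        if not improved:
            break  # a full pass found no better swap: a local optimum
    return placement, cost


# Four chunks on a 2x2 grid of CLB sites, connected as a chain A-B-C-D.
initial = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
nets = [("A", "B"), ("B", "C"), ("C", "D")]
final, cost = improve_placement(initial, nets)
print(wirelength(initial, nets), cost)  # 5 3
```

Greedy swapping stops at the first local optimum it reaches. Practical placers commonly add mechanisms such as simulated annealing to escape such optima, which is one reason run times can grow long.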
There is usually a very large number of possible choices in each of the partitioning, placement, and routing phases. FPGA configuring programs typically try to explore a multitude of promising avenues within a finite amount of time to see what effects each partitioning, placement, and routing move may have on the ultimate outcome. This is somewhat analogous to how chess-playing machines explore the ramifications of each move of each chess piece on the end-game. Even when relatively powerful, high-speed computers are used, it may take the FPGA configuring software a significant amount of time to find a workable solution. Turnaround time can exceed 8 hours.
In some instances, even after having spent a large amount of time trying to find a solution for a given FPGA-implementation problem, the FPGA configuring software may fail to come up with a workable solution, and the time spent becomes lost turnaround time. It may be that, because of packing inefficiencies, the user has chosen too small an FPGA device for implementing too large an original circuit.
Another possibility is that the internal architecture of the designated FPGA device does not mesh well with the organization and/or timing requirements of the original circuit design.
Organizations of original circuit designs can include portions that may be described as 'random logic' (because they have no generally repeating pattern). The organizations can additionally or alternatively include portions that may be described as 'bus oriented' (because they carry out nibble-wide, byte-wide, or word-wide parallel operations). The organizations can yet further include portions that may be described as 'matrix oriented' (because they carry out matrix-like operations such as multiplying two multidimensional vectors). These are just examples of taxonomical descriptions that may be applied to various design organizations. Another example is 'control logic', which is less random than fully 'random logic' but less regular than 'bus oriented' designs. There may be many more taxonomical descriptions. The point is that some FPGA structures may be better suited for implementing random logic while others may be better suited for implementing bus oriented designs or other kinds of designs.
If the FPGA configuring software fails in a first run, the user may choose to try again with a differently-structured FPGA device. The user may alternatively choose to spread the problem out over a larger number of FPGA devices, or even to switch to another circuit implementing strategy such as CPLD or ASIC (where the latter is an Application Specific hardwired design of an IC). Each of these options invariably consumes extra time and can incur more costs than originally planned for.
FPGA device users usually do not want to suffer through such problems. Instead, they typically want to see a fast turnaround time of no more than, say, 4 hours, between the time they complete their original circuit design and the time a first-run FPGA is available to implement and physically test that design. More preferably, they would want to see a fast turnaround time of no more than, say, 30 minutes, for successful completion of the FPGA configuring software when executing on an 80486-80686 PC platform (that is, a so-commercially specified, IBM compatible personal computer) and implementing a design of 25,000 gates or less in a target FPGA device.
FPGA users also usually want the circuit implemented by the FPGA to provide an optimal emulation of the original design in terms of function packing density, cost, speed, power usage, and so forth irrespective of whether the original design is taxonomically describable generally as 'random logic', or as 'bus oriented', or as a combination of these, or otherwise.
When multiple FPGA's are required to implement a very large original design, high function packing density and efficient use of FPGA internal resources are desired so that implementation costs can be minimized in terms of both the number of FPGA's that will have to be purchased and the amount of printed circuit board space that will be consumed.
Even when only one FPGA is needed to implement a given design, a relatively high function packing density is still desirable because it usually means that performance speed is being optimized due to reduced wire length. It also usually means that a lower cost member of a family of differently sized FPGA's can be selected or that unused resources of the one FPGA can be reserved for future expansion needs.
In summary, end users want the FPGA configuring software to complete its task quickly and to provide an efficiently-packed, high-speed compilation of the functionalities provided by an original circuit design irrespective of the taxonomic organization of the original design.
In the past, it was thought that attainment of these goals was primarily the responsibility of the computer programmers who designed the FPGA configuring software. It has been shown, however, that the architecture or topology of the unprogrammed FPGA can play a significant role in determining how well and how quickly the FPGA configuring software completes the partitioning, placement, and routing tasks.
An improved FPGA architecture that helps FPGA configuring software to better reach its goals was disclosed in U.S. Pat. No. 5,212,652, issued May 18, 1993 to Agrawal et al. The improvement provided a symmetrically balanced distribution of logic function resources and routing resources in both horizontal and vertical directions so that placement and routing were not directionally constrained to, for example, a left-to-right signal flow orientation. Balanced availability of logic function-implementing resources and signal-routing resources was provided to give the FPGA configuring software more degrees of freedom in each of the partitioning, placement, and routing phases. This increased the likelihood that congestion would be avoided during placement and routing because circuit implementation could be more uniformly distributed instead of being concentrated along a particular direction. It also increased the probability that more efficient solutions would be found in the iterative optimization phases because optimization attempts would not be constrained by pre-existing congestions.
A further improvement was provided in U.S. application Ser. No. 08/080,658, filed Jun. 18, 1993 by Agrawal et al. This further improvement provided a constant-delay, 'floating-pins' architecture which provided symmetrical choice among a subset of package pinout options without change in performance (without change in signal propagation time).
Further advances in integrated circuit manufacturing technologies have now enabled higher densities of logic function-implementing circuits and higher densities of signal routing resources. This presents opportunities for further improvements.
An improved FPGA layout architecture in accordance with the invention features a repeating pattern of logic-implementing Variable Grain Blocks, or 'VGB's'.
Each VGB has a plurality of internal resources that can be operated separately to provide elemental levels of functionality but which resources are capable of being merged, cascaded and/or operated in parallel to provide relatively higher levels of functionality as appropriate for a given taxonomic organization of a circuit design originally supplied to the FPGA configuring software.
For example, in one embodiment, the internal resources of each VGB can be merged to implement any Boolean function {f(6T)} of up to 6 independent input terms, or they can be cascaded to implement one of a more limited subset of Boolean functions {f'(16T)}, each being a function of up to 16 independent input terms. In the same embodiment, each VGB can be partitioned to instead provide 8 Boolean functions, each being any desired function {f(3T)} of up to 3 independent input terms.
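A standard way to see why eight f(3T) resources suffice for any f(6T) is Shannon expansion: fix three of the six inputs to each of their eight values, which yields eight 3-input sub-functions, and let those three inputs select among the sub-function outputs. The Python sketch below checks this exhaustively for a random 6-input truth table. It assumes an idealized 8:1 selector and does not describe how the VGB's own merging circuitry is built; the function names are hypothetical.

```python
import random
from typing import List


def split_into_3input_luts(table6: List[int]) -> List[List[int]]:
    """Split a 64-entry truth table f(x0..x5) into eight 8-entry tables.

    Table k holds f with (x3, x4, x5) fixed to the bits of k, as a
    function of the remaining inputs (x0, x1, x2).
    """
    if len(table6) != 64:
        raise ValueError("expected a 64-entry truth table")
    return [table6[k * 8:(k + 1) * 8] for k in range(8)]


def evaluate_merged(luts: List[List[int]], inputs: List[int]) -> int:
    """Evaluate the merged structure: eight f(3T) LUTs feed an 8:1 selector."""
    low = inputs[0] | (inputs[1] << 1) | (inputs[2] << 2)
    high = inputs[3] | (inputs[4] << 1) | (inputs[5] << 2)
    first_level = [lut[low] for lut in luts]  # all eight LUTs see x0..x2 in parallel
    return first_level[high]                  # x3..x5 pick one first-level output


# Any 6-input function, here a random one, is reproduced exactly.
rng = random.Random(1997)
table6 = [rng.randint(0, 1) for _ in range(64)]
luts = split_into_3input_luts(table6)
mismatches = 0
for address in range(64):
    inputs = [(address >> i) & 1 for i in range(6)]
    if evaluate_merged(luts, inputs) != table6[address]:
        mismatches += 1
print(mismatches)  # 0
```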
In the same embodiment, input-acquiring resources of small-grained elements (so-called CBE's) can be strapped together so that such elements operate in parallel on a common or semi-common set of input term signals. This enables efficient implementations of dynamic multiplexer circuits and adding/subtracting circuits as will be seen.
Such merging or cascading or parallel-operating of VGB elemental resources can be carried out over a relatively wide spectrum of granularities and along different directions. This spectrum of options enables FPGA configuring software to make efficient use of available resources within each VGB and to find more optimal solutions for a wide variety of circuit-implementation problems, including those that are taxonomically describable as bus oriented, or matrix oriented, or as random logic.
In a preferred class of embodiments, plural VGB's are symmetrically arranged and wedged together in a manner similar to slices of a symmetrically-cut pie. The congregated or 'wedged-together' VGB's form a super-VGB structure. Each such super-VGB includes centralized means for merging together the resources of its respective VGB's so that the super-VGB can offer even higher levels of functionality than are provided by each of its constituent VGB's. In the example where each VGB can provide a limited set of Boolean functions each of up to 16 independent input terms, the corresponding super-VGB can merge 2 or 4 VGB's together to correspondingly provide a limited set of Boolean functions each of up to 32 or 64 independent input terms. FPGA configuring software is therefore given the option of merging together the VGB's of a given super-VGB to implement a smaller number of more complex functions, or of using the VGB's individually to implement a larger number of less complex functions.
In further accordance with the invention, plural super-VGB's are distributed in a matrix across an FPGA device. VGB-to-VGB interconnect lines extend along the sides of the super-VGB's. In a preferred embodiment, there are at least four VGB's in each super-VGB. Each of these four VGB's preferably has an L-shaped (or V-shaped) internal organization that lies adjacent to, or forms a peripheral part of, the super-VGB.
Within each such L-shaped internal organization, there is provided a symmetrical distribution of function-spawning units. These function-spawning units, which are also referred to herein as 'Configurable Building Elements' or CBE's, may be used to acquire input signals and to initiate the synthesis of a spectrum of functions of increasing complexity within the corresponding VGB. Function complexity generally increases as more and more CBE's are compounded or 'folded together' to synthesize larger, function-implementing entities. Synthesis can be carried out with CBE's of a particular row or column or with CBE's that lie along crossing rows and columns.
In one embodiment, there is the same, even number of CBE's along each leg (each primary typographic stroke) of the L-shaped internal organization of each VGB. Input decoder means are provided for linking together the input-term-acquiring resources of neighboring CBE's and allowing such CBE's to share acquired input term signals so that such neighboring CBE's can process the same signals in parallel. This sharing of acquired input term signals allows for efficient folding together or compounding of elemental resources as will be detailed below.
Each function-spawning unit (CBE) has a user-configurable signal-acquiring means (CIE) for acquiring a subset of LUT input terms from adjacent interconnect lines. A user-configurable lookup table (LUT) is further provided within each of the function-spawning units (CBE's) for processing corresponding ones of the acquired LUT input terms. A decoding section (which is part of the above-mentioned input decoder means) is additionally provided between the CIE and LUT of each CBE for supporting the function synthesis process wherein plural CBE's (Configurable Building Elements) are compounded to define higher levels of functionality.
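A rough behavioral sketch of a single CBE follows. It assumes the 3-input-term CIE and the 56-line adjacent channel of the embodiment described later in this section, and it models the CIE simply as a configured choice of which adjacent lines feed the LUT. The class name and configuration layout are hypothetical, and the decoding section that lets neighboring CBE's share acquired terms is deliberately left out.

```python
from typing import List, Sequence

NUM_ADJACENT_LINES = 56  # channel width taken from one embodiment; an assumption here


class ConfigurableBuildingElement:
    """Hypothetical CBE model: a CIE picks 3 lines, a 3-input LUT processes them."""

    def __init__(self, selected_lines: Sequence[int], lut_bits: Sequence[int]) -> None:
        if len(selected_lines) != 3:
            raise ValueError("this CIE model acquires exactly 3 input terms")
        if not all(0 <= line < NUM_ADJACENT_LINES for line in selected_lines):
            raise ValueError("selected line is outside the adjacent channel")
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs 8 configuration bits")
        self.selected_lines = list(selected_lines)  # CIE configuration
        self.lut_bits = list(lut_bits)              # LUT configuration

    def acquire(self, line_values: Sequence[int]) -> List[int]:
        """CIE: copy the configured lines' values onto the LUT inputs."""
        return [line_values[line] for line in self.selected_lines]

    def evaluate(self, line_values: Sequence[int]) -> int:
        a, b, c = self.acquire(line_values)
        # The decoding section between CIE and LUT is bypassed in this sketch.
        return self.lut_bits[a | (b << 1) | (c << 2)]


# Majority-of-three LUT fed from lines 5, 12 and 40 of the channel.
cbe = ConfigurableBuildingElement([5, 12, 40], [0, 0, 0, 1, 0, 1, 1, 1])
lines = [0] * NUM_ADJACENT_LINES
lines[5] = lines[40] = 1
print(cbe.acquire(lines), cbe.evaluate(lines))  # [1, 0, 1] 1
```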
In one embodiment, each super-VGB is surrounded by a diversified set of interconnect resources. These diversified interconnect resources may include: general bidirectional interconnect lines of varying lengths; switch boxes that provide programmable interconnection between the general bidirectional interconnect lines; and unidirectional direct connect lines. The combination of each super-VGB and its immediately surrounding set of diversified interconnect resources defines a core-tile. A set of core-tiles is tiled across a core portion of the FPGA device to define an FPGA core matrix. The FPGA core matrix is then surrounded by and coupled to a complementary array of input/output blocks (IOB's).
In one particular embodiment, each super-VGB is a square structure having four mirror-opposed VGB's respectively defining the four corners of the square. Each such square-organized super-VGB may be characterized as having mirror symmetry of resources not only about its horizontal and vertical center lines, but also substantial mirror symmetry of programmable resources about its diagonals.
In the same one embodiment, each square-organized super-VGB includes at least 8 CBE's (Configurable Building Elements) symmetrically distributed about its periphery. As explained above, a 'CBE' is an elemental structure that may be used to acquire input signals and responsively spawn synthesis of higher level functions. Pairs of CBE's are incorporated into an encompassing second structure, referred to herein as a 'Configurable Building Block' (or CBB). In addition to its two CBE's, each CBB of the one embodiment contains a function-combining multiplexer and a Configurable Sequential Element (CSE). The function-combining multiplexer may be used in combination with the decoding sections of the two CBE's to fold together the LUT resources of the two CBE's. The function-combining multiplexer may additionally be used in combination with the decoding sections of the two CBE's to emulate large-sized dynamic multiplexers (e.g., 4:1). The CSE contains data storage resources and data output resources.
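The 4:1 multiplexer emulation mentioned above can be sketched behaviorally. This sketch assumes that each CBE's LUT is configured as a 2:1 multiplexer, that the two CBE's share one select term (s0) through their decoding sections, and that the function-combining multiplexer is steered by the second select term (s1). These role assignments and all names are illustrative assumptions, not the documented wiring of the circuit.

```python
from itertools import product
from typing import Sequence


def lut3(config: Sequence[int], a: int, b: int, c: int) -> int:
    """Read a 3-input LUT; input a is the least significant address bit."""
    return config[a | (b << 1) | (c << 2)]


# Truth table of a 2:1 multiplexer: output = b when c is 1, else a.
MUX2_CONFIG = [b if c else a for c in (0, 1) for b in (0, 1) for a in (0, 1)]


def cbb_mux4(d: Sequence[int], s0: int, s1: int) -> int:
    """Hypothetical CBB emulating a 4:1 dynamic multiplexer."""
    out_a = lut3(MUX2_CONFIG, d[0], d[1], s0)  # first CBE: chooses d0 or d1
    out_b = lut3(MUX2_CONFIG, d[2], d[3], s0)  # second CBE: shares select s0
    return out_b if s1 else out_a              # function-combining multiplexer


# Exhaustive check against direct indexing over all 64 input combinations.
errors = 0
for bits in product((0, 1), repeat=6):
    d, s0, s1 = bits[:4], bits[4], bits[5]
    if cbb_mux4(d, s0, s1) != d[s0 + 2 * s1]:
        errors += 1
print(errors)  # 0
```

Sharing s0 between the two CBE's is what keeps each LUT within its 3-input budget; without that sharing the two LUTs would together need one more acquired term.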
In one embodiment, there are at least 16 CBB's symmetrically distributed about the periphery of each super-VGB. Pairs of CBB's (Configurable Building Blocks) are programmably combinable to provide more functionally-rich entities. Such combined entities are each referred to herein as a 'set of paired-CBB's'. Two sets of paired-CBB's are programmably combinable to provide even more functionally-rich entities. Such further combined entities are each referred to herein as a 'set of quadrupled-CBB's'. In the one embodiment, each set of quadrupled-CBB's may be contained within and consume the function-spawning capabilities of a single VGB (Variable Grain Block) such that no further programmable combining of this type is provided for within the VGB proper. However, pairs of VGB's are further combinable to provide yet more functionally-rich entities within the encompassing super-VGB. It is within the contemplation of the invention to allow for larger numbers of CBE's or CBB's within each VGB, to allow for larger numbers of VGB's within each super-VGB if desired, and to allow for programmable formation of octupled-CBB's and so forth.
As mentioned, each CBE (Configurable Building Element) of one embodiment has its own Configurable Input Element (CIE) for programmably acquiring, from a first set of neighboring signals, a smaller first subset that defines input terms for the CBE's LUT. The first set of neighboring signals is carried by a respective first set of interconnect lines that are immediately adjacent to the CIE. The encompassing CBB of each pair of CBE's may be viewed as having the combined input-acquiring resources of the two CIE's found in the corresponding CBE's. Such combining of input-acquiring resources increases the likelihood that the FPGA configuring software will find an unconsumed one of the resources for bringing into the CBB an input term signal riding on a particular one of the immediately adjacent interconnect lines (AIL's).
Each CIE may optionally include control acquiring means that are user-configurable to select and acquire, from a second set of neighboring signals, a second subset that defines control signals for the corresponding VGB. The second set of neighboring signals is carried by respective interconnect lines that are immediately adjacent to the CIE. The sets of interconnect lines that carry control signals may overlap fully or partially with the set that carries input term signals. Control signals selected by the CIE may optionally be used by the Configurable Sequential Element (CSE) associated with the respective CBE. In one embodiment, control signals acquired by all CIE's of a given VGB (Variable Grain Block) may be shared by all the CSE's (Configurable Sequential Elements) of that given VGB. Control signals acquired by all VGB's of a given super-VGB may also be shared within the given super-VGB.
In addition to its plurality of wedged-together VGB's, each of the super-VGB's preferably further includes shared resources that are centrally placed within the super-VGB and made programmably available for shared use by the peripheral CBB's of that super-VGB. An example of such centrally-shared resources is a set of longline drive amplifiers and associated shared logic, which is discussed in more detail below.
The combinable CBB's (Configurable Building Blocks) of each VGB are not the only resources within each such Variable Grain Block. Each of the VGB's additionally has common resources placed diagonally relative to its L-shaped internal organization for shared use by the L-organized resources (by the CBE's or CBB's) of that VGB. Examples of such VGB-common resources include: a common controls developing section, a wide-gating section, and a carry propagating section, each of which is discussed in more detail below.
Aside from being combinable to form higher levels of functionality, the function-implementing resources of adjacent VGB's can be efficiently chained together to define high-speed, chained functions. An example of such chaining is a string of VGB's that are programmably linked together to function as a relatively long binary adding or subtracting circuit. Carry bits ripple through the carry propagating sections of the linked-together VGB's. The mirror-opposed, L-organized structures of the VGB's can support either zigzag or linear propagation of carry bits. This will be discussed in more detail below.
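The carry rippling described above is the behavior of an ordinary ripple-carry adder. The generic sketch below models each stage as a full adder and passes the carry from stage to stage, and it performs subtraction with the usual two's-complement method (invert the subtrahend, force a carry-in of 1). How stages map onto VGB's, and whether the physical carry path zigzags or runs linearly, is not modeled. All names are hypothetical.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One stage: returns (sum bit, carry out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def ripple_add(a_bits: List[int], b_bits: List[int],
               carry_in: int = 0) -> Tuple[List[int], int]:
    """Add little-endian bit vectors; the carry ripples from stage to stage."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def ripple_subtract(a_bits: List[int], b_bits: List[int]) -> Tuple[List[int], int]:
    """a - b as a + (not b) + 1; a final carry of 1 means no borrow occurred."""
    return ripple_add(a_bits, [1 - b for b in b_bits], carry_in=1)


def to_bits(value: int, width: int) -> List[int]:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


s, c = ripple_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s), c)  # 44 1  (300 wraps modulo 256 with a carry out)
d, c = ripple_subtract(to_bits(50, 8), to_bits(20, 8))
print(from_bits(d), c)  # 30 1
```

Because each stage must wait for the carry from the previous one, the delay of such a chain grows with its length, which is why a dedicated carry propagating section matters for long adders.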
As indicated above, each CBB includes its own Configurable Sequential Element (CSE), which is shared by the incorporated CBE's of that CBB. Each CSE contains at least one data storage element, such as a flip flop, for providing clock-sequenced operations. Each CSE further contains at least three differently-powered (differently-tuned) line drivers. The differently-powered line drivers are used to drive output signals of the CSE onto adjacent but differently-loaded interconnect lines. Examples of differently-loaded interconnect lines include: quad-length, bidirectional interconnect lines (4xL lines); octal-length, bidirectional interconnect lines (8xL lines); VGB-local feedback lines (FBL's); and unidirectional direct connect lines (DCL's); which lines are discussed in more detail below.
In one embodiment that has 32 CBE's inside each square-shaped super-VGB, there are 2 generally equivalent CBB's (W and Y, or X and Z) provided along each leg of the L-shaped peripheral portion of each VGB. The L-shaped peripheral portion of each VGB neighbors a crossing of orthogonally-extending interconnect resources (e.g., interconnect channels extending in x and y directions). The configurable input element (CIE) of each CBE cross-couples with a sub-population of the immediately neighboring interconnect lines for selectively acquiring, from such immediately neighboring interconnect lines, respective subsets of function input-term signals and output-control signals.
The programmable lookup table (LUT) of each such CBE may be coupled through an input decoding section to receive the CIE-acquired input-term signals of that CBE. The CBE's LUT then responsively generates a first-level function signal from the respectively acquired input-term signals of that CBE. Alternatively, the input decoding section (which is detailed below) may couple the programmable lookup table of each such CBE to receive one or more of the acquired input-term signals of adjacent CBE's and to responsively generate the first-level function signal from those signals instead.
In one embodiment, the configurable input-acquiring element (CIE) of each CBE can acquire up to 3 function input-term signals and one output-control signal from an immediately neighboring interconnect channel having 56 signal-carrying lines plus 2 or 3 dedicated control lines. Each encompassing CBB can therefore acquire up to 6 function-term input signals and 2 control input signals in that embodiment. Each VGB that forms from a combined set of 4 such CBB's can therefore acquire 24 function-term input signals and 8 control input signals. In a variant of that one embodiment, each VGB can further acquire 4 clock signals and a global reset (GR) signal from its neighboring interconnect resources. Each super-VGB that has 4 such VGB's can therefore acquire 96 function-term input signals and 32 control input signals (not counting the global signals, GR and CLK0-CLK3) from its surrounding interconnect lines.
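The per-level counts above follow from multiplying up the CBE, CBB, VGB and super-VGB hierarchy. The short Python sketch below reproduces them using only the figures stated for this embodiment (3 term inputs and 1 control input per CBE, 2 CBE's per CBB, 4 CBB's per VGB, 4 VGB's per super-VGB). The class name is hypothetical, and the global GR and CLK0-CLK3 signals are left out, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputBudget:
    """Signals a unit can acquire from its adjacent interconnect."""
    term_inputs: int
    control_inputs: int

    def times(self, count: int) -> "InputBudget":
        """Budget of `count` identical child units taken together."""
        return InputBudget(self.term_inputs * count, self.control_inputs * count)


CBE = InputBudget(term_inputs=3, control_inputs=1)
CBB = CBE.times(2)        # 2 CBE's per CBB
VGB = CBB.times(4)        # 4 CBB's per VGB
SUPER_VGB = VGB.times(4)  # 4 VGB's per super-VGB (GR, CLK0-CLK3 not counted)

for name, budget in [("CBB", CBB), ("VGB", VGB), ("super-VGB", SUPER_VGB)]:
    print(name, budget.term_inputs, budget.control_inputs)
# CBB 6 2
# VGB 24 8
# super-VGB 96 32
print("CBE's per super-VGB:", 2 * 4 * 4)  # 32
```

The 32 CBE's per super-VGB obtained this way agree with the 32-CBE embodiment described earlier.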
In briefer summation therefore, the Variable Grain Architecture (VGA) described herein includes granularizable function-implementing resources and a diversified assortment of interconnect capabilities, arranged in a packing-wise efficient manner that provides FPGA configuring software with symmetrically balanced choices of different resources in multiple directions.
Consequences of the Variable Grain Architecture
The Variable Grain Architecture (VGA) described herein enables a plurality of advantageous cooperations and consequences.
First, function input-term signals can be symmetrically and equivalently routed by interconnect resources to any one of the plural CBE's or CBB's that symmetrically line each neighboring leg of each L-organized VGB. This assists the place-and-route software by providing directionally unconstrained, balanced access from the neighboring interconnect to the distributed resources of each VGB. Such omnidirectional access is particularly useful when implementing random logic.
Second, the local feed conductors included in each Configurable Building Element (CBE) for feeding its LUT with input-term signals can be made of minimal length, because each CBB is placed along the periphery of the super-VGB, immediately adjacent to the neighboring interconnect lines. The minimized length of such feed conductors (MIL's plus some decode length) advantageously reduces delay time and increases packing density. Unlike in prior designs, input signals do not all have to travel to a function-synthesizing core for processing. Instead, input processing and the return of result signals may occur in a peripheral layer of the VGB, near the neighboring interconnect lines.
A third advantageous cooperation and/or consequence of the described architecture is that the L-organized, and symmetrically granularizable (partitionable) structure of each VGB helps the FPGA configuring software to perform each of the partitioning, placement and routing operations with relatively wide degrees of freedom and few directional constraints.
During placement, for example, the FPGA configuring software may equivalently choose any CBE of the VGB to receive a like-sized circuit chunk. If the circuit chunk turns out to be too complex for a single CBE (e.g., because it has too many input terms), the FPGA configuring software may choose to implement the chunk with two CBE's in folded-together combination (within a CBB). In so doing, the FPGA configuring software may equivalently choose among differently located pairs of CBE's within a given VGB. Any not-yet-consumed CBB may be used to receive such a comparably-sized circuit chunk during the placement phase. This is so because of the symmetrical positioning and basic interchangeability of the CBB's along the legs of each VGB's L-shaped internal portion.
In one embodiment, the CBB's that line each leg of each VGB's L-shaped portion are made essentially (but not necessarily fully) identical to one another, such that a partitioned chunk from an original circuit design, provided it can fit in a single Configurable Building Block, can be equally placed in, and implemented by, any one of the plural CBB's (X or Z or W or Y) of the L-organized VGB.
Note that placement interchangeability is provided within each leg of the L-shape (in other words, linearly along each of the x and y directions). Such placement interchangeability along-a-leg may be advantageous in cases where placement on a particularly directed leg (one extending horizontally or vertically) is desired. For example, it may be desirable to place circuitry chunks on vertically-directed legs, adjacent to a vertical interconnect channel, when bus-oriented systems or like parallel-operating systems are being implemented. In general, placement along particularly-directed legs may be helpful and placement interchangeability along such directed-legs may be additionally advantageous.
Note that placement interchangeability is also provided around the combined length of both legs of the L-shaped structure (in other words, irrespective of x and y directional orientations). Such placement interchangeability can give the FPGA configuring software wide degrees of freedom and hence a greater chance of finding an optimal solution for partitioning, placement and routing problems.
More specifically, bus-oriented designs may be more efficiently placed and packed using the interchangeability of collinearly positioned CBB's along the collinear first legs of multiple L-organized structures. Random-logic-oriented designs may be more efficiently placed and packed using the interchangeability of CBB's distributed about both legs of each of multiple L-organized structures.
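A placer could express the two modes of interchangeability as a candidate filter: restrict to one leg for bus-oriented chunks, or consider both legs for random logic. The sketch below is a hypothetical illustration. The text pairs W with Y and X with Z along the legs but does not say which leg is horizontal, so the `CBB_LEG` assignment is an assumption, as are all the names.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class Leg(Enum):
    HORIZONTAL = "x"
    VERTICAL = "y"


# Assumed leg assignment for one L-organized VGB (two CBBs per leg).
CBB_LEG: Dict[str, Leg] = {
    "X": Leg.HORIZONTAL, "Z": Leg.HORIZONTAL,
    "W": Leg.VERTICAL, "Y": Leg.VERTICAL,
}


def candidate_cbbs(consumed: Set[str], leg: Optional[Leg] = None) -> List[str]:
    """Return the free, interchangeable CBBs of the VGB.

    leg=None allows placement anywhere around both legs (random logic);
    a specific Leg restricts placement along that leg (bus-oriented logic).
    """
    return sorted(name for name, cbb_leg in CBB_LEG.items()
                  if name not in consumed and (leg is None or cbb_leg is leg))
```

With W already consumed, `candidate_cbbs({"W"}, Leg.VERTICAL)` leaves only `["Y"]`, while `candidate_cbbs({"W"})` offers `["X", "Y", "Z"]`. Because the CBB's are essentially identical, any returned candidate is an equally valid placement.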
A further advantageous consequence of the described architecture arises from the ability to combine, or fold together equivalent functional resources along each leg of each L-organized structure (within each VGB) and to then fractally combine, or further fold together the combined resources of both legs, as needed. The latter combining of on-the-leg resources can be viewed as a folding of peripheral x and y resources into a shared diagonal of the L-organization.
A spectrum of selectable granulations of functionality is provided by this ability to equivalently fold resources together along either of the x and y directed legs, or to alternatively fold together resources along the diagonal. This spectrum of selectable granulations provides a wide range of choices during the partitioning and placement phases.
For example, if a partitioned chunk is too large to fit into a single 'CBB', the FPGA configuring software has the option of combining, or folding together, the resources of two adjacent CBB's to in effect produce a higher-capacity 'set of paired-CBB's'. If the partitioned chunk is still too large to fit into a single set of paired-CBB's, the FPGA configuring software has the further option of folding together the resources of two adjacent sets of paired-CBB's to produce an even higher-capacity implementing structure, namely the 'set of quadrupled-CBB's'. If the partitioned chunk is still too large to fit into a set of quad-CBB's, the FPGA configuring software has the further option of folding together the resources of two VGB's within a super-VGB to produce an even higher-capacity implementing structure, namely a 'set of paired-VGB's'.
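The escalation just described amounts to picking the smallest granule that fits. The sketch below uses the number of function input terms as the only fit criterion, which is a simplifying assumption: real fitting also depends on the logic function and control needs. The CBE, CBB and quad-CBB (one VGB) capacities follow the 3/6/24 counts given earlier. The paired-CBB and paired-VGB capacities are assumed to be the sum of their halves.

```python
from typing import List, Optional, Tuple

# (granularity, function input terms it can acquire), finest first.
GRANULES: List[Tuple[str, int]] = [
    ("CBE", 3),
    ("CBB", 6),
    ("paired-CBB", 12),   # assumed: 2 x CBB
    ("quad-CBB", 24),     # one VGB, per the stated embodiment
    ("paired-VGB", 48),   # assumed: 2 x VGB within a super-VGB
]


def smallest_fit(input_terms: int) -> Optional[str]:
    """Return the finest granule that can absorb a chunk, or None if even a
    set of paired-VGBs is too small (the chunk must be repartitioned)."""
    for name, capacity in GRANULES:
        if input_terms <= capacity:
            return name
    return None
```

Choosing the finest adequate granule leaves the largest contiguous free resources for later chunks.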
On the other hand, suppose the FPGA configuring software has carried out one run of partitioning, placement, and routing and has not yet found an acceptable solution. Suppose that during a subsequent, iterative repartitioning, a particular circuit chunk is made smaller than it was before, so that the newly downsized chunk now fits into a set of paired-CBB's whereas before it needed a set of quadrupled-CBB's. In such a case, the FPGA configuring software has the option of splitting the previously 'consumed' set of quadrupled-CBB's into one consumed set of paired-CBB's and one free (not-yet-consumed) set of paired-CBB's. This makes more efficient use of FPGA resources and frees the excess resources (the not-consumed set of paired-CBB's) for other uses.
Placement can proceed in either of two ways during each downsizing repartition, because the split of functional resources between the consumed and not-consumed sets of paired-CBB's is symmetric. As such, the post-repartitioning placement choice can be made so as to reduce congestion or increase speed in a subsequent routing selection.
While the example given above involves downsizing from the level of a set of quadrupled-CBB's to the level of a set of paired-CBB's, similar downsizing and freedom of placement can occur at lower levels, where a set of paired-CBB's is split into individual CBB's and even where individual CBB's are split into CBE's.
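The halving behavior described above resembles a buddy memory allocator: each split keeps one half consumed and frees its symmetric twin. The sketch below is an analogy under that assumption, not the configuring software's actual data structure. `LADDER` and `downsize` are hypothetical names.

```python
from typing import List, Tuple

# Granularity levels from coarsest to finest; each level splits into two equal halves.
LADDER = ["quad-CBB", "paired-CBB", "CBB", "CBE"]


def downsize(level: str, target: str) -> Tuple[str, List[str]]:
    """Split a consumed block down to `target`.

    Returns (consumed_level, freed_blocks): every halving step keeps one half
    consumed and frees the other, as in a buddy allocator.
    """
    start, stop = LADDER.index(level), LADDER.index(target)
    if stop < start:
        raise ValueError("target must be the same as or finer than the current level")
    freed = [LADDER[i] for i in range(start + 1, stop + 1)]
    return target, freed
```

Downsizing a quad-CBB set to a single CBB frees one paired-CBB set and one CBB. Because the halves are symmetric, the software may keep whichever half routes better.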
Yet another advantageous consequence of the architecture described herein arises from the ability to combine the control-acquiring resources (CIE's) provided along each leg of each L-organized VGB and to use the combined control-acquiring resources as needed to define common control signals for each VGB (and for each super-VGB) from signals made available along the legs of each respective VGB. The VGB-common control signals may be used to control functions such as clock (CLK), clock-enable (CLKEN), flip-flop reset (RST), flip-flop set (SET), or other like controllable features of each CBB.
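A minimal sketch of this pooling, assuming each CIE contributes one acquired control signal (8 per VGB in the earlier embodiment) and that a configuration selects which pooled signal drives each VGB-common control function. The `selection` mapping and function name are hypothetical stand-ins for whatever configuration memory the device actually uses.

```python
from typing import Dict, Sequence

CONTROL_FUNCTIONS = ("CLK", "CLKEN", "RST", "SET")


def vgb_common_controls(acquired: Sequence[str],
                        selection: Dict[str, int]) -> Dict[str, str]:
    """Route pooled control signals to VGB-common control functions.

    acquired:  one control signal per CIE, gathered from both legs of the VGB.
    selection: hypothetical configuration mapping a control function to the
               index of the CIE whose acquired signal should drive it.
    """
    unknown = set(selection) - set(CONTROL_FUNCTIONS)
    if unknown:
        raise ValueError(f"unsupported control functions: {sorted(unknown)}")
    return {function: acquired[index] for function, index in selection.items()}
```

For instance, a clock acquired on one leg and a reset acquired on the other can both become VGB-wide controls, which neither leg could supply alone.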
Still another advantageous consequence of the described super-VGB organization has to do with efficient area utilization within the integrated circuit. Wedged-together VGB's may be packed tightly in mirror-opposed fashion within each super-VGB, such that essentially no space is provided between the in-gathered VGB's for through-running interconnect channels. Instead, interconnect channels are provided around the periphery of the corresponding super-VGB, in close proximity to the peripherally-provided configurable input elements (CIE's). Input-term acquisition and function synthesis begin at the periphery of the super-VGB. Function synthesis proceeds inwardly toward the core of the super-VGB structure in a progressive, graduated manner as functions of higher complexity are synthesized. The most complex functions are preferably synthesized at, or close to, the core of the super-VGB structure.
As will be seen, some space is preferably provided at the core of each super-VGB for shared, high-powered line-driving amplifiers. These high-powered amplifiers are located centrally within each super-VGB and are shared by the constituent VGB's of that super-VGB. They are used for driving output signals onto heavily loaded (e.g., high-capacitance) interconnect lines such as the maximum-length interconnect lines (MaxL lines) of the FPGA device.
At the same time, less-powerful line-driving amplifiers are distributed on a dedicated, per-CBB basis (in each CSE) and are used to drive less heavily loaded interconnect resources (e.g., so-called 'double-length short-haul' lines and direct connect lines, as will be detailed below). The less-powerful amplifiers include those tuned for driving a first load of direct connect lines and those differently tuned for driving a different, second load of bidirectional interconnect lines (2xL, 4xL, 8xL).
The higher-powered line-driving amplifiers at the core of each super-VGB provide relatively high slew rates during switching as needed for the MaxL lines. This compensates for the higher electrical capacitance that such long lines tend to have. The less-powerful line-driving amplifiers provide relatively lower slew rates during switching as is acceptable for their corresponding less-heavily loaded (shorter) interconnect lines.
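To a first approximation, treating a line driver as a constant current source $I_{\mathrm{drive}}$ charging a lumped line capacitance $C_{\mathrm{line}}$ (and ignoring wire resistance), the slew rate and the time to swing the line by $\Delta V$ are as follows. The symbols are introduced here for illustration only.

```latex
\[
  \mathrm{SR} \;=\; \frac{dV}{dt} \;\approx\; \frac{I_{\mathrm{drive}}}{C_{\mathrm{line}}},
  \qquad
  t_{\mathrm{switch}} \;\approx\; \frac{C_{\mathrm{line}}\,\Delta V}{I_{\mathrm{drive}}}
\]
```

Holding $t_{\mathrm{switch}}$ roughly constant while $C_{\mathrm{line}}$ grows, as it does for the long MaxL lines, requires $I_{\mathrm{drive}}$ to grow in proportion. That is why the larger amplifiers are reserved for the most heavily loaded lines.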
As is known, each high-powered amplifier tends to consume more integrated-circuit area than each comparatively less-powerful drive amplifier. In accordance with the invention, therefore, a trade-off is made between the area consumed by line-driving amplifiers and the number of function-implementing circuits they service. The larger, more powerful amplifiers are placed in sharing regions in the core of each super-VGB for shared use by all the VGB's of that super-VGB. In contrast, the smaller, less-powerful amplifiers are distributed about the periphery of each super-VGB and dedicated to servicing each respective VGB (or each respective CBB of each VGB). In one embodiment, pairs of CBE's share the drive-amplifier resources of a shared CSE (Configurable Sequential Element).
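The area benefit of sharing can be seen by counting amplifier banks. The sketch below compares one high-powered bank per super-VGB core (the described arrangement) against a hypothetical alternative with a dedicated bank in every VGB. `amps_per_bank` and `amp_area` are placeholder parameters, not values from the text.

```python
def high_power_amp_area(num_super_vgbs: int, amps_per_bank: int, amp_area: float,
                        shared: bool = True, vgbs_per_super_vgb: int = 4) -> float:
    """Total area spent on high-powered line-driving amplifier banks.

    shared=True:  one bank sits in the core of each super-VGB and serves all of its VGBs.
    shared=False: hypothetical alternative with a dedicated bank in every VGB.
    """
    banks = num_super_vgbs if shared else num_super_vgbs * vgbs_per_super_vgb
    return banks * amps_per_bank * amp_area
```

With 4 VGB's per super-VGB, dedicating the large amplifiers would cost four times the area of sharing them. The smaller amplifiers are cheap enough to dedicate per CBB.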
Complementing the placement of the larger, higher-powered amplifiers within the core of each super-VGB, the input signal-acquiring and logic function-implementing circuits of each super-VGB (namely, the CIE's and LUT's) are preferentially packed densely around the periphery of each of the partitionable logic blocks (VGB's). The CIE's and LUT's can be made relatively small because they do not have high-powered line-driving outputs. This provides a more scalable architecture than was seen in prior designs.
A further feature in accordance with the invention is that super-VGB's are arranged along interconnect channels in symmetrical fashion. Horizontally-extending interconnect channels (HIC's) and vertically-extending interconnect channels (VIC's) are provided with essentially the same, symmetrically balanced interconnect resources for their respective horizontal (x) and vertical (y) directions. These interconnect resources include a diversified and granulated assortment of MaxL lines, 2xL lines, 4xL lines and 8xL lines, as well as corresponding 2xL, 4xL and 8xL switch boxes. In one embodiment, most 2xL lines span a distance corresponding to four CBB's (or 8 CBE's).
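If the "x" in 2xL is read as a length unit of two CBB's (inferred from the statement that most 2xL lines span four CBB's), nominal spans for the other segmented lines follow by scaling. The sketch below encodes that inference. The 4xL and 8xL spans are extrapolations rather than figures from the text, and MaxL lines are left unmodeled because their length is device-dependent.

```python
from typing import Dict, Optional, Tuple

CBES_PER_CBB = 2
# Inferred: "most 2xL lines span ... four CBB's" implies one length unit = 2 CBBs.
CBBS_PER_LENGTH_UNIT = 2

LINE_MULTIPLES: Dict[str, Optional[int]] = {"2xL": 2, "4xL": 4, "8xL": 8, "MaxL": None}


def nominal_span(line_type: str) -> Optional[Tuple[int, int]]:
    """Return (CBBs spanned, CBEs spanned) for a segmented line type, or None
    for MaxL lines, whose length is device-dependent and not modeled here."""
    multiple = LINE_MULTIPLES[line_type]
    if multiple is None:
        return None
    cbbs = multiple * CBBS_PER_LENGTH_UNIT
    return cbbs, cbbs * CBES_PER_CBB
```

Under this reading, a 2xL line covers (4, 8) CBB's and CBE's, and an 8xL line would cover (16, 32).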
Other aspects of the invention will become apparent from the below detailed description.