1. Field of Invention
The invention is generally directed to monolithic integrated circuits, and more specifically to a scalable architecture for use within Programmable Logic Devices (PLD's). It is even more specifically directed to a subclass of PLD's known as High-Density Complex Programmable Logic Devices (HCPLD's).
2. Cross Reference to Related Patents
The disclosures of the following U.S. patents are incorporated herein by reference:
(A) U.S. Pat. No. 5,764,078 issued Jun. 9, 1998 to Om Agrawal et al, and entitled, FAMILY OF MULTIPLE SEGMENTED PROGRAMMABLE LOGIC BLOCKS INTERCONNECTED BY A HIGH SPEED CENTRALIZED SWITCH MATRIX; PA1 (B) U.S. Pat. No. 5,811,986 issued Sep. 22, 1998 to Om Agrawal et al, and entitled, FLEXIBLE SYNCHRONOUS/ASYNCHRONOUS CELL STRUCTURE FOR HIGH DENSITY PROGRAMMABLE LOGIC DEVICE; PA1 (C) U.S. Pat. No. 5,818,254 issued Oct. 6, 1998 to Om Agrawal et al, and entitled, MULTI-TIERED HIERARCHICAL HIGH SPEED SWITCH MATRIX STRUCTURE FOR VERY HIGH DENSITY COMPLEX PROGRAMMABLE LOGIC DEVICES; PA1 (D) U.S. Pat. No. 5,789,939 issued Aug. 4, 1998 to Om Agrawal et al, and entitled, METHOD FOR PROVIDING A PLURALITY OF HIERARCHICAL SIGNAL PATHS IN A VERY HIGH DENSITY PROGRAMMABLE LOGIC DEVICE; PA1 (E) U.S. Pat. No. 5,621,650 issued Apr. 15, 1997 to Om Agrawal et al, and entitled, PROGRAMMABLE LOGIC DEVICE WITH INTERNAL TIME-CONSTANT MULTIPLEXING OF SIGNALS FROM EXTERNAL INTERCONNECT BUSES; and PA1 (F) U.S. Pat. No. 5,185,706 issued Feb. 9, 1993 to Om Agrawal et al. PA1 (1) A user-accessible, configuration-defining memory means, such as EPROM, EEPROM, anti-fused, fused, SRAM, or other, is provided in the CPLD device so as to be at least once-programmable by device users for defining user-provided configuration instructions. Static Random Access Memory or SRAM is of course, a form of reprogrammable memory that can be differently programmed many times. Electrically Erasable and reProgrammable ROM or EEPROM is an example of nonvolatile reprogrammable memory. The configuration-defining memory of a CPLD device can be formed of a mixture of different kinds of memory elements if desired (e.g., SRAM and EEPROM). Typically it is of the nonvolatile, In-System reprogrammable (ISP) kind such as EEPROM. PA1 (2) Input/Output means (IO's) are provided for interconnecting internal circuit components of the CPLD device with external circuitry. The IO's may have fixed configurations or they may include configurable features such as variable slew-output drivers whose characteristics may be fine tuned in accordance with user-provided configuration instructions stored in the configuration-defining memory means. PA1 (3) Programmable Logic Blocks (PLB's) are provided for carrying out user-programmed logic functions as defined by user-provided configuration instructions stored in the configuration-defining memory means. Typically, each of the many PLB's of a CPLD has at least a Boolean sum-of-products generating circuit (e.g., and AND/OR array) or a Boolean product-of-sums generating circuit (e.g., and OR/AND array) that is user-configurable to define a desired Boolean function,--to the extent allowed by the number of product terms (PT's) or sum terms that are combinable by that circuit. PA1 Each PLB may have other resources such as input signal pre-processing resources and output signal post-processing resources. The output signal post-processing resources may include result storing and/or timing adjustment resources such as clock-synchronized registers. Although the term `PLB` was adopted by early pioneers of CPLD technology, it is not uncommon to see other names being given to the repeated portion of the CPLD that carries out user-programmed logic functions and timing adjustments to the resultant function signals. PA1 (4) An interconnect network is generally provided for carrying signal traffic within the CPLD between various PLB's and/or between various IO's and/or between various IO's and PLB's. At least part of the interconnect network is typically configurable so as to allow for programmably-defined routing of signals between various PLB's and/or IO's in accordance with user-defined routing instructions stored in the configuration-defining memory means. Another part of the interconnect network may be hard wired or nonconfigurable such that it does not allow for programmed definition of the path to be taken by respective signals traveling along such hard wired interconnect.
3. Description of Related Art
Field-Programmable Logic Devices (FPLD's) have continuously evolved to better serve the unique needs of different end-users. From the time of introduction of simple PLD's such as the Advanced Micro Devices 22V10.TM. Programmable Array Logic device (PAL), the art has branched out in several different directions.
One evolutionary branch of FPLD's has grown along a paradigm known as Field Programmable Gate Arrays or FPGA's. Examples of such devices include the XC2000.TM. and XC3000.TM. families of FPGA devices introduced by Xilinx, Inc. of San Jose, Calif. The architectures of these devices are exemplified in U.S. Pat. Nos. 4,642,487; 4,706,216; 4,713,557; and 4,758,985; each of which is originally assigned to Xilinx, Inc.
An FPGA may be generally characterized as a monolithic, integrated circuit that has an array of user-programmable, lookup tables (LUT's) that can each implement any Boolean function to the extent allowed by the address space of the LUT. User-programmable interconnect is typically provided for interconnecting primitive, LUT-implemented functions and for thereby defining more complex functions.
Because LUT-based function implementation tends to be functionally more exhaustive (broader) but speed-wise slower than gate-based (e.g., AND/OR-based) function implementation, FPGA's are generally recognized in the art as having a relatively expansive capability of implementing a wide variety of functions (broad functionality) but at relatively slow speed. Also, because length of signal routings through the programmable interconnect of an FPGA can vary significantly, FPGA's are generally recognized as providing relatively inconsistent signal delays whose values can vary substantially depending on how partitioning, placement and routing software configures the FPGA.
A second evolutionary chain in the art of field programmable logic has branched out along a paradigm known as Complex PLD's or CPLD's. This paradigm is characterized by devices such as the Vantis (subsidiary of Advanced Micro Devices Inc.) MACH.TM. family. Examples of CPLD circuitry are seen in U.S. Pat. No. 5,015,884 (issued May 14, 1991 to Om P. Agrawal et al.) and U.S. Pat. No. 5,151,623 (issued Sep. 29, 1992 to Om P. Agrawal et al.) as well as in other CPLD patents cited above.
A CPLD device can be characterized as a monolithic, integrated circuit (IC) that has four major features as follows.
In contrast to LUT-based FPGA's, gate-based CPLD's are generally recognized in the art as having a relatively less-expansive capability of implementing a wide variety of functions (in other words, not being able to implement all Boolean functions for a given input space) but being able to do so at relatively higher speeds. In other words, very wide functionality is sacrificed to obtain shorter, pin-to-pin signal delays. Also, because length of signal routings through the programmable interconnect of a CPLD is often arranged so it will not vary significantly despite different signal routings, CPLD's are generally recognized as being able to provide relatively consistent signal delays whose values do not vary substantially based on how partitioning, placement and routing software configures the CPLD. Many devices in the Vantis MACH.TM. family provide such a consistent signal delay characteristic under the Vantis trade name of SpeedLocking.TM.. The more generic term, Speed-Consistency will be used interchangeably herein with the term, SpeedLocking.TM..
A newly evolving sub-branch of the growing families of CPLD devices is known as High-Density Complex Programmable Logic Devices (HCPLD's). This sub-branch may be generally characterized as monolithic IC's that have large numbers of I/O terminals (e.g., Input/Output pins) in the range of about 64 or more (e.g., 96, 128, 192, 256, 320, etc.) and/or have large numbers of result-storing macrocells in the range of about 256 or more (e.g., 320, 512, 1024, etc.). The process of concentrating large numbers of I/O pins and/or large numbers of macrocells into a single CPLD IC raises new challenges for achieving relatively broad functionality, high speed, and Speed-Consistency (SpeedLocking.TM.) in the face of wide varieties of configuration software.
Configuration software can produce different results, good or bad, depending in part on what broadness of functionalities, what routing flexibilities and what timing flexibilities are provided by the architecture of the target CPLD. Modern CPLD's typically offer a large spectrum of user-configurable options with respect to how each of many PLB's can be configured, how each of many interconnect resources can be configured, and how each of many IO's can be used and/or configured. Rather than determining with pencil and paper how each of the configurable resources of a CPLD should be programmed, it is common practice to employ a computer and appropriate CPLD-configuring software to automatically generate the configuration instruction signals that will be supplied to, and that will cause an unprogrammed CPLD to implement a specific design.
CPLD-configuring software typically cycles through a series of phases that are referred to commonly as `fitting`. These phases may also be referred to as `partitioning`, `placement`, and `routing`. The fitting software is sometimes referred to as a `place and route` program. Alternate names of software tools that operate at a more global level may include, `synthesis, mapping and optimization tools`.
In the partitioning phase, an original circuit design (which is usually relatively large and complex) is divided into smaller chunks, where each chunk is made sufficiently small to be implemented within a single PLB (where such a PLB typically includes one or more AND/OR arrays). During the partitioning phase, the resources of each PLB remain as those of a yet-unspecified one of the many PLB's that are available in the yet-unprogrammed CPLD device. It is during placement that physical locations and groupings are assigned to partitioned chunks.
Differently designed CPLD's can have differently designed PLB's with respectively different, logic-implementing capabilities and timing capabilities. As such, the maximum size that will be allowed for partitioned chunk can vary in accordance with the specific CPLD device that is designated to implement the original circuit design.
By way of example, each PLB of a given, first CPLD architecture may be able to generate in one pass (where the one pass does not include the use of a feedback loop) a sum-of-products (SoP) function signal of the expressive form: EQU f.sub.SoP =.SIGMA..sup.N (PTi.sup.Ki/Kmax/L) {Exp. A}.
In this sum-of-products expression (Exp. A), the N factor represents a maximum number of product terms (PT's) that can be generated and thereafter summed by a respective PLB for defining the one sum-of-products function signal, f.sub.SoP. The Kmax factor represents in the same expression, Exp. A, a maximum number of independent, PLB input signals that can be acquired from a set of L available lines. Ki is the number of actual signals that are used as a subset of Kmax for defining a corresponding, i-th product term, PTi. The acquired subset of Ki signals are ANDed together in the respective PLB to define each respective, i-th product term (PTi). If Ki=0, then PTi=0 and that PTi therefore does not contribute to the Boolean sum.
By way of a more concrete example, consider a PLB of a given first CPLD architecture where each sum-of-products can have a maximum of 3 PT's, with each PT being a product of no more than 16 input terms, where the input terms are sampled from 64 nearby lines. Such a PLB may therefore be able to generate in one pass, a first SoP function of the expressive form: EQU f.sub.SoP1 =.SIGMA..sup.3 (PTi.sup.Ki/16.sup..sub.-- .sup.max/fm.sup..sub.-- .sup.64.sup..sub.-- .sup.Lines) {Exp. A1}
Consider also, for purposes of contrast, a PLB of a given second, and differently designed, CPLD architecture where each sum-of-products can have a maximum of 4 PT's, with each PT being a product of no more than 32 input terms, where the input terms are sampled from 96 nearby lines. Such a PLB may therefore be able to generate in one pass, a second SoP function of the expressive form: EQU f.sub.SoP2 =.SIGMA..sup.4 (PTi.sup.Ki/32.sup..sub.-- .sup.max/fm.sup..sub.-- .sup.96.sup..sub.-- .sup.Lines) {Exp. A2}.
In other words, due to architectural constraints, it is possible that the one-pass, sum-of-products result (f.sub.SoP1 =PT.sub.1 +PT.sub.2 +PT.sub.3, see Exp. A1) of a PLB in the first CPLD architecture can be no more complex than a sum of three independent product terms (3 PT's), where each such PTi is no more complex than a product of no more than sixteen (16) independent, PLB term input signals that are sampled out of an available and larger set of sixty-four (64) independent signals.
In contrast, and again due to architectural variations, the one-pass, sum-of-products result (f.sub.SoP2 =PT.sub.1 +PT.sub.2 +PT.sub.3 +PT.sub.4, see Exp. A2) of a PLB in the second CPLD architecture can be as complex as a sum of four independent product terms (4 PT's) where each such PTi is as complex as a product of up to 32 independent, PLB term input signals that are sampled by multiplexing from an available and nearby set of 96 independent signals. The 1-out-of-3 sampling ratio, 32 max/96 that is implied in expression Exp. A2 is an input multiplexing factor of the PLB. It shows that the PLB has only 32 input lines whose maximum of 32 input signals are sampled from a nearby array of 96, signal broadcasting lines. At least three such PLB's would be needed to sample all 96 of the broadcast signals. But no one PLB can, in a single pass (a single time slice that does not use feedback), generate a function signal that represents a function of as many as all 96 of the broadcast signals.
The significance of factors such as the above N, Kmax and L and the significance of the ratio Kmax/L will become more apparent later. For now it should be understood that choice of the N, Kmax and L factors for a PLB is a matter of delicate design balance.
On the one hand, by choosing to use larger absolute values for N, Kmax and L plus larger values of the ratio, Kmax/L a CPLD designer can advantageously provide greater flexibility to the number of options that CPLD configuring software will have as it performs partitioning, placement and routing. On the other hand, if the CPLD designer arbitrarily chooses to increase the values of N, Kmax and L and to increase the ratio, Kmax/L, the designer may find that such modifications have led to excessive electrical capacitance on routing lines and excessive signal processing delays.
The reason why, is because Kmax times L defines a number of crosspoints that will be created for each PLB when the Kmax number of lines of each PLB cross with the L number of adjacent, signal broadcasting lines. The reciprocal of Kmax/L indicates the minimum number of PLB's that will be needed to fully sample all L of the adjacent signals. (L/Kmax times Kmax equals L.) Typically, the CPLD designer will want the CPLD to be able to process all L signals simultaneously (in parallel) so the designer will provide at least a L/Kmax number of PLB's. The same reciprocal ratio, L/Kmax also gives a rough indication of the extent to which the L signal broadcasting lines of the CPLD architecture will be loaded by PIP's (programmable interconnect points). The exact value of loading will depend on the extent to which each set of L times Kmax crosspoints is fully or partially-populated by PIP's.
One previous patent (U.S. Pat. No. 5,818,254 issued Oct. 6, 1998) suggests that the number of input lines per PLB (Kmax) should be kept relatively small (e.g., about 32 PLB input lines or less) and that a 3-level hierarchical switch matrix should be employed to avoid excessive signal processing delays in HCPLD's. This approach has benefits and drawbacks. On the one hand, capacitive loading is reduced for global interconnect. On the other hand, a 3-level hierarchy in the switch matrix architecture can make SpeedLocking.TM. problematic as one tries to migrate to higher density devices.
With the above factors in mind, we will now continue our discussion about the basics of CPLD configuring software.
The design partitioning phase needs to account for the PLB architectural factors, L, N, and Kmax because those values respectively define: (a) how many signals can be processed in parallel by the available plurality of PLB's, (b) how many product terms can be incorporated into each sum-of-products signal, and (c) what portion of the L available signals each product term can encompass in just one pass.
Of course, for purposes of entering into the design partitioning phase, the original circuit design can be originally specified in terms of a Hardware Descriptor Language (HDL) and/or as a gate level description, or in other suitable form. Ultimately, and in one way or another, the original circuit design will have to be re-mapped into terms of sums-of-products, where such sums-of-products (SoP's) are to be implemented in respective time slots (passes) by respective ones of available PLB's and then fed forward in parallel by L lines for subsequent processing.
In order to fit its results inside the limited f.sub.SoP capabilities of each PLB, the design partitioning phase will have to cast its primitive sums-of-products such that they are each equal to or less than the N-defined and Kmax-defined limits of the f.sub.SoP results that can be produced by respective PLB's of the targeted CPLD.
If the architecture of the targeted CPLD is such that each of the above-described factors, N, Kmax and L (Exp. A) is relatively large, then the maximal fsoP results per PLB will tend to be relatively large and the design partitioning phase will be advantageously allowed to work with larger-sized, partition chunks. However signal delay may become excessive if N, Kmax and L are too large.
On the other hand, if the architecture of the targeted CPLD is such that each of the above-described factors, N, Kmax and L (Exp. A) is relatively small, then the maximal f.sub.SoP results per PLB will tend to be relatively small and the design partitioning phase will be disadvantageously forced to work with comparably, smaller-sized partition chunks and a larger number of interconnect lines. Signal delay may be more or less of a problem because of this. However, one thing will be generally true. As the partitioning phase is forced to produce larger and larger numbers of decreasing-in-size chunks, more work is disadvantageously created for the next-described, placement and routing phases of the CPLD configuring software because they have to process more data objects.
After the partitioning phase is carried out, each resulting chunk is virtually positioned (`placed`) into a specific, chunk-implementing PLB of the designated CPLD during a subsequent placement phase.
In the ensuing routing phase, an attempt is made to algorithmically establish connections between the various chunk-implementing PLB's of the CPLD device, using the interconnect resources (the L lines) of the designated CPLD device. The goal is to reconstruct the functionality of the original circuit design by appropriately connecting all the partitioned and placed chunks.
If all goes well in the partitioning, placement, and routing phases, the CPLD configuring software will find a workable `solution` comprised of a specific partitioning of the original circuit, a specific set of primitive placements in specific PLB's, and a specific set of interconnect usage decisions (routings). The software can then deem its mission to be complete and it can use the placement and routing results to generate the configuring code that will be used to correspondingly configure the designated CPLD.
In various instances, however, the CPLD configuring software may find that it cannot complete its mission successfully on a first try. It may find, for example that the initially-chosen placement strategy prevents the routing phase from completing successfully. This might occur because signal routing resources have been exhausted in one or more congested parts of the designated CPLD device. Some necessary interconnections may have not been completed through those congested parts. Alternatively, all necessary interconnections may have been completed, but the CPLD configuring software may find that simulation-predicted performance of the resulting circuit (the so-configured CPLD) is below an acceptable threshold. For example, signal propagation time may be too large in a speed-critical part of the CPLD-implemented circuit.
In either case, if the initial partitioning, placement and routing phases do not provide an acceptable solution, the CPLD configuring software will try to modify its initial place and route choices so as to remedy the problem. Typically, the software will make iterative modifications to its initial choices until at least a functional place-and-route strategy is found (one where all necessary connections are completed), and more preferably until a place-and-route strategy is found that brings performance of the CPLD-implemented circuit to a near-optimum point. The latter step is at times referred to as `optimization`. Modifications attempted by the software may include re-partitionings of the original circuit design as well as repeated iterations of the place and route phases.
There are usually a very large number of possible choices in each of the partitioning, placement, and routing phases. CPLD configuring programs typically try to explore a multitude of promising avenues within a finite amount of time to see what effects each partitioning, placement, and routing move may have on the ultimate outcome. This in a way is analogous to how chess-playing machines explore ramifications of each move of each chess piece on the end-game. Even when relatively powerful, high-speed computers are used, it may take the CPLD configuring software a significant amount of time to find a workable solution.
In some instances, even after having spent a large amount of time trying to find a solution for a given CPLD-implementation problem, the CPLD configuring software may fail to come up with a workable solution and the time spent becomes lost turn-around time. It may be that, because of packing inefficiencies, the user has chosen too small a CPLD device for implementing too large of an original circuit.
Another possibility is that the internal architecture of the designated CPLD device does not mesh well with the organization and/or timing requirements of the original circuit design.
Organizations of original circuit designs can include portions that may be described as `random logic` (because they have no generally repeating pattern). The organizations can additionally or alternatively include portions that may be described as `bus oriented` (because they carry out nibble-wide, byte-wide, or word-wide, parallel operations). The organizations can yet further include portions that may be described as `matrix oriented` (because they carry out matrix-like operations such as multiplying two, multidimensional vectors). These are just examples of taxonomical descriptions that may be applied to various design organizations. Another example is `control logic` which is less random than fully `random logic` but less regular than `bus oriented` designs. There may be many more taxonomical descriptions. The point is that some CPLD structures may be better suited for implementing random logic while others may be better suited for implementing bus oriented designs or other kinds of designs.
Even where a CPLD architecture is specifically designed to mesh with bus oriented designs, the bit width of the bus oriented design may present a problem. More on this later. We first continue describing the usage of CPLD configuring software.
If the CPLD configuring software fails in a first run, the user may choose to try again with a differently-structured CPLD device. The user may alternatively choose to spread the problem out over a larger number of CPLD devices, or even to switch to another circuit implementing strategy such as FPGA or ASIC (where the latter is an Application Specific hardwired design of an IC). Each of these options invariably consumes extra time and can incur more costs than originally planned for.
CPLD device users usually do not want to suffer through such problems. Instead, they typically want to see a fast turnaround time of no more than, say a few hours, between the time they complete their original circuit design and the time a first-run CPLD is available to implement and physically test that design.
Aside from merely being able to implement a specific set of Boolean functions within a given CPLD IC, users of CPLD's also usually insist that the circuit implemented by the CPLD perform according to specified timing requirements. Speed is often as important an attribute as full Boolean correctness. That is why the user chose to use a CPLD instead of an FPGA.
Aside from speed and full function implementation, users of CPLD's also usually want a certain degree of re-design agility (flexibility). Even after an initial design is successfully implemented by a CPLD, users may wish to make slight tweaks or other changes to the original design. The re-design agility of a given CPLD architecture may include the ability to re-design certain internal circuits without changing I/O timings. Re-design agility may also include the ability to re-design certain internal circuits without changing the placement of various I/O terminals (e.g., pins). Such re-design agilities are sometimes referred to respectively as re-design Speed-Locking.TM. and Pin-Retention (the former term is a trademark of Vantis Corp., headquartered in Sunnyvale, Calif.). The more generic terms of: `re-design Speed-Consistency` and `re-design PinOut-Consistency` will be respectively used herein interchangeably with `re-design Speed-Locking.TM.` and `re-design Pin-Retention`.
In addition to speed, re-design agility, and full Boolean correctness, users of CPLD's typically ask for optimal emulation of an original design or a re-design in terms of good function packing density, low cost, low power usage, and so forth.
When multiple CPLD's are required to implement a very large original design, high function packing density and efficient use of CPLD internal resources are desired so that implementation costs can be minimized in terms of both the number of CPLD's that will have to be purchased and the amount of printed circuit board space that will be consumed.
Even when only one CPLD is needed to implement a given design, a relatively high function packing density is still desirable because it usually means that a lower cost member of a family of differently sized CPLD's can be selected or that unused resources of the one CPLD can be reserved for future expansion needs or In-System Configuration re-design (ISC redesign).
In summary, end users want the CPLD configuring software to complete its task quickly and to provide an efficiently-packed, high-speed compilation of the functionalities provided by an original circuit design, or by a design tweak, irrespective of the taxonomic organization of the original design.
Some previous CPLD architectures meshed well with specific taxonomic organizations. However, preferences among taxonomic organizations tend to change over time. Industry standards may, at first, favor designs where address and data words have a size in the range of 8 to 16 bits. Later, industry standards may migrate towards larger-sized organizations of signals such as address and data words having sizes in the range of 32 to 64 bits. A CPLD having an architecture that is optimized for bus-oriented word sizes of 8 to 16 bits may not be able to efficiently accommodate designs where word sizes increase into a range of say, 32 to 64 bits. What is needed is a scalable architecture that can accommodate designs having word sizes in the range of 32 to 64 bits without losing speed and re-design agility.