1. Field of Invention
The invention is generally directed to monolithic integrated circuits, and more specifically to a repeated macrocell module design for use within Programmable Logic Devices (PLD's). It is even more specifically directed to a macrocell module design as applied to a subclass of PLD's known as High-Density Complex Programmable Logic Devices (HCPLD's).
2a. Cross Reference to Related Applications
The following copending U.S. patent application is owned by the owner of the present application, and its disclosure is incorporated herein by reference:
(A) Ser. No. 09/326,940 filed Jun. 6, 1999 by Om P. Agrawal et al. and originally entitled, "SCALABLE ARCHITECTURE FOR HIGH DENSITY CPLD's HAVING TWO-LEVEL HIERARCHY OF ROUTING RESOURCES".
(a) at least five product term generators (A0-A4) for producing respective and local product term signals (PT0-PT4), each representing up to 80 independent input terms or more;
(b) an SoP-producing gate that can produce a Boolean sum-of-products of one or more of the respective and local product term signals (PT0-PT4);
(c) at least five PT steerers, each of which may be programmably configured either to direct a respective one of the local product term signals (PT0-PT4) to the local SoP-producing gate or to steer it away;
(d) an SoS-producing gate that can produce a sums-of-sums signal (SoS) that represents a Boolean sum of one or more of the local SoP signals, and/or of SoP signals of neighboring macrocell modules, and/or of SoS signals of neighboring macrocell modules;
(e) a post-SoP steerer that may be programmably configured to direct the local SoP signal either to the local SoS-producing gate or away to SoS-producing gates of neighboring macrocell modules;
(f) a first XOR gate having an output coupled to an IN terminal of a local storage and/or combinatorial-pass-through element;
(g) a post-SoS steerer that may be programmably configured to direct the local SoS signal either to a first input of the first XOR gate or away to SoS-producing gates of neighboring macrocell modules; and
(h) an XOR-feeding multiplexer that may be programmably configured to direct at least an adjustable local polarity signal (LP') to a second input of the first XOR gate, where the LP' signal may be programmably defined as a constant logic `0` or as a constant logic `1`.
2b. Cross Reference to Related Patents
The disclosures of the following U.S. patents are incorporated herein by reference:
(A) U.S. Pat. No. 5,811,986 issued Sep. 22, 1998 to Om Agrawal et al, and entitled, FLEXIBLE SYNCHRONOUS/ASYNCHRONOUS CELL STRUCTURE FOR HIGH DENSITY PROGRAMMABLE LOGIC DEVICE;
(B) U.S. Pat. No. 5,764,078 issued Jun. 9, 1998 to Om Agrawal et al, and entitled, FAMILY OF MULTIPLE SEGMENTED PROGRAMMABLE LOGIC BLOCKS INTERCONNECTED BY A HIGH SPEED CENTRALIZED SWITCH MATRIX;
(C) U.S. Pat. No. 5,818,254 issued Oct. 6, 1998 to Om Agrawal et al, and entitled, MULTI-TIERED HIERARCHICAL HIGH SPEED SWITCH MATRIX STRUCTURE FOR VERY HIGH DENSITY COMPLEX PROGRAMMABLE LOGIC DEVICES;
(D) U.S. Pat. No. 5,789,939 issued Aug. 4, 1998 to Om Agrawal et al, and entitled, METHOD FOR PROVIDING A PLURALITY OF HIERARCHICAL SIGNAL PATHS IN A VERY HIGH DENSITY PROGRAMMABLE LOGIC DEVICE;
(E) U.S. Pat. No. 5,621,650 issued Apr. 15, 1997 to Om Agrawal et al, and entitled, PROGRAMMABLE LOGIC DEVICE WITH INTERNAL TIME-CONSTANT MULTIPLEXING OF SIGNALS FROM EXTERNAL INTERCONNECT BUSES; and
(F) U.S. Pat. No. 5,185,706 issued Feb. 9, 1993 to Om Agrawal et al.
3. Description of Related Art
Field-Programmable Logic Devices (FPLD's) have continuously evolved to better serve the unique needs of different end-users. From the time of introduction of simple PLD's such as the Advanced Micro Devices 22V10™ Programmable Array Logic device (PAL), the art has branched out in several different directions.
One evolutionary branch of FPLD's has grown along a paradigm known as Field Programmable Gate Arrays or FPGA's. Examples of such devices include the XC2000™ and XC3000™ families of FPGA devices introduced by Xilinx, Inc. of San Jose, Calif. The architectures of these devices are exemplified in U.S. Pat. Nos. 4,642,487; 4,706,216; 4,713,557; and 4,758,985; each of which was originally assigned to Xilinx, Inc.
An FPGA may be generally characterized as a monolithic, integrated circuit that has an array of user-programmable lookup tables (LUT's) that can each implement any Boolean function to the extent allowed by the address space of the LUT. User-programmable interconnect is typically provided for interconnecting primitive, LUT-implemented functions and for thereby defining more complex functions.
Because LUT-based function implementation tends to be functionally more exhaustive (broader) but speed-wise slower than gate-based (e.g., AND/OR-based) function implementation, FPGA's are generally recognized in the art as having a relatively more expansive capability of implementing a wide variety of functions (broad functionality) but at relatively slower speed. Also, because the length of signal routings through the programmable interconnect of an FPGA can vary significantly, FPGA's are generally recognized as providing relatively inconsistent signal delays whose values can vary substantially depending on how partitioning, placement and routing software configures the FPGA.
A second evolutionary chain in the art has branched out along a paradigm known as Complex PLD's or CPLD's. This paradigm is characterized by devices such as the Vantis (subsidiary of Advanced Micro Devices Inc.) MACH™ family. Examples of CPLD circuitry are seen in U.S. Pat. No. 5,015,884 (issued May 14, 1991 to Om P. Agrawal et al.) and U.S. Pat. No. 5,151,623 (issued Sep. 29, 1992 to Om P. Agrawal et al.) as well as in other CPLD patents cited above, including U.S. Pat. No. 5,811,986, which will be specifically addressed herein.
A CPLD device can be characterized as a monolithic, integrated circuit (IC) that has four major features as follows:
(1) A user-accessible, configuration-defining memory means, such as EPROM, EEPROM, anti-fused, fused, SRAM, or other, is provided in the CPLD device so as to be at least once-programmable by device users for defining user-provided configuration instructions. Static Random Access Memory or SRAM is, of course, a form of reprogrammable memory that can be differently programmed many times. Electrically Erasable and reprogrammable ROM or EEPROM is an example of nonvolatile reprogrammable memory. The configuration-defining memory of a CPLD device can be formed of a mixture of different kinds of memory elements if desired (e.g., SRAM and EEPROM). Typically it is of the nonvolatile, In-System reprogrammable (ISP) kind such as EEPROM.
(2) Input/Output means (IO's) are provided for interconnecting internal circuit components of the CPLD device with external circuitry. The IO's may have fixed configurations or they may include configurable features such as variable slew-output drivers whose characteristics may be fine tuned in accordance with user-provided configuration instructions stored in the configuration-defining memory means.
(3) Programmable Logic Blocks (PLB's) are provided for carrying out user-programmed logic functions as defined by user-provided configuration instructions stored in the configuration-defining memory means. Typically, each of the many PLB's of a CPLD has at least a Boolean sum-of-products generating circuit (e.g., an AND/OR array) or a Boolean product-of-sums generating circuit (e.g., an OR/AND array) that is user-configurable to define a desired Boolean function, to the extent allowed by the number of product terms (PT's) or sum terms (ST's) that are combinable by that circuit.
(4) An interconnect network is generally provided for carrying signal traffic within the CPLD between various PLB's and/or between various IO's and/or between various IO's and PLB's. At least part of the interconnect network is typically configurable so as to allow for programmably-defined routing of signals between various PLB's and/or IO's in accordance with user-defined routing instructions stored in the configuration-defining memory means.
Each PLB may have other resources such as input signal pre-processing resources and output signal post-processing resources. The output signal post-processing resources may include result storing and/or timing adjustment resources such as clock-synchronized registers. Although the term `PLB` was adopted by early pioneers of CPLD technology, it is not uncommon to see other names being given to the repeated portion of the CPLD that carries out user-programmed logic functions and timing adjustments to the resultant function signals.
In contrast to LUT-based FPGA's, gate-based CPLD's are generally recognized in the art as having a relatively less expansive capability of implementing a wide variety of functions (in other words, they cannot implement all Boolean functions for a given input space), but as being able to implement the functions they do support at relatively higher speeds. Wide functionality is sacrificed to obtain shorter pin-to-pin signal delays. Also, because the length of signal routings through the programmable interconnect of a CPLD is often arranged so that it will not vary significantly despite different signal routings, CPLD's are generally recognized as being able to provide relatively consistent signal delays whose values do not vary substantially based on how partitioning, placement and routing software configures the CPLD. Many devices in the Vantis MACH™ family provide such a consistent signal delay characteristic under the Vantis trade name of SpeedLocking™. The more generic term, Speed-Consistency, will be used interchangeably herein with the term SpeedLocking™.
A newly evolving sub-branch of the growing families of CPLD devices is known as High-Density Complex Programmable Logic Devices (HCPLD's). This sub-branch may be generally characterized as monolithic IC's that have large numbers of I/O terminals (e.g., Input/Output pins) in the range of about 64 or more (e.g., 96, 128, 192, 256, 320, etc.) and/or have large numbers of result-storing macrocells in the range of about 256 or more (e.g., 320, 512, 1024, etc.). The process of concentrating large numbers of I/O pins and/or large numbers of macrocells into a single CPLD device raises new challenges for achieving relatively broad functionality, high speed, and Speed-Consistency (SpeedLocking™) in the face of wide varieties of configuration software.
A more detailed discussion of the various operations performed by CPLD configuring software is provided in the above-cited and copending U.S. application Ser. No. 09/xxx,xxx [Attorney Docket No. VANT 1021]. Those operations will therefore not be repeated here, except to briefly note the following.
Configuration software can produce different results, good or bad, depending in part on what broadness of functionalities, what routing flexibilities and what timing flexibilities are provided by the architecture of a target CPLD. The present disclosure focuses on the broadness of functionalities and timing flexibilities that are provided by repeated structures referred to herein as macrocell modules.
When confronted with a given design problem, CPLD-configuring software typically cycles through a series of phases, referred to commonly as `partitioning`, `placement`, and `routing`. Differently designed CPLD's can have differently designed PLB's with respectively different, logic-implementing capabilities, and/or timing capabilities. Partitioning software has to account for the maximum size and speed of circuitry that each PLB is able to implement within the specific CPLD device that has been designated to implement the original and whole circuit design.
By way of example, each PLB of a given first CPLD architecture may be able to generate in one pass (where the one pass does not include the use of a feedback loop) a sum-of-products (SoP) function signal of the expressive form:
$$f_{SoP} = \sum_{i=1}^{N} PT_i^{\,K_i/K_{max}/L} \tag{Exp. A}$$
In this sum-of-products expression (Exp. A), the N factor represents the maximum number of product terms (PT's) that can be generated and thereafter summed by a respective PLB for defining the one sum-of-products function signal, $f_{SoP}$. In the same Exp. A, the Kmax factor represents the maximum number of independent PLB input signals that can be acquired from a set of L available lines. Ki is the number of signals actually used, as a subset of Kmax, for defining a corresponding i-th product term, PTi. The acquired subset of Ki signals is ANDed together in the respective PLB to define each respective i-th product term (PTi). If Ki=0, then PTi=0 and that PTi does not contribute to the Boolean sum.
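A minimal behavioral sketch of the one-pass evaluation in Exp. A follows, in Python and for illustration only. The function name `eval_sop`, the encoding of each product term as a tuple of line indices, and the omission of complemented (inverted) inputs are assumptions made here, not details taken from the source. Consistent with the definitions above, the distinct inputs the PLB acquires (the union over all PTi) are limited to Kmax, and a PTi with Ki=0 contributes nothing to the sum.

```python
from typing import Sequence


def eval_sop(lines: Sequence[int], product_terms: Sequence[Sequence[int]],
             n_max: int, k_max: int) -> int:
    """Evaluate f_SoP = PT_1 + ... + PT_N (Boolean OR of ANDs), per Exp. A.

    lines         -- values (0/1) on the L available lines
    product_terms -- for each PT_i, the indices of the K_i lines it ANDs together
    n_max, k_max  -- the architectural limits N and Kmax (hypothetical names)
    """
    if len(product_terms) > n_max:
        raise ValueError(f"{len(product_terms)} PTs exceed N={n_max}")
    acquired = set().union(*product_terms)  # distinct PLB inputs actually used
    if len(acquired) > k_max:
        raise ValueError(f"{len(acquired)} inputs exceed Kmax={k_max}")
    if any(not 0 <= j < len(lines) for j in acquired):
        raise ValueError("a PT references a line outside the L available lines")

    result = 0
    for selected in product_terms:
        if not selected:
            continue  # K_i = 0: PT_i is taken as 0 and does not contribute
        pt = int(all(lines[j] for j in selected))  # AND of the K_i acquired signals
        result |= pt  # Boolean sum (OR) of the product terms
    return result


lines = [1, 0, 1, 1]  # L = 4 lines, for illustration
print(eval_sop(lines, [(0, 2), (1, 3)], n_max=3, k_max=4))  # PT_1 = 1, so f_SoP = 1
print(eval_sop(lines, [(0, 1), ()], n_max=3, k_max=4))      # PT_1 = 0, PT_2 empty, so 0
```

Raising an error, rather than silently truncating, mirrors the idea that a function exceeding N or Kmax simply does not fit in one pass.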
For a more concrete example, consider a PLB of a given first CPLD architecture where each sum-of-products can have a maximum of 3 PT's, with each PT being a product of no more than 16 input terms, where the input terms are sampled from 64 nearby lines. Such a PLB may therefore be able to generate in one pass a first SoP function of the expressive form:
$$f_{SoP1} = \sum_{i=1}^{3} PT_i^{\,K_i/16_{max}/64_{Lines}} \tag{Exp. A1}$$
Consider also, for purposes of contrast, a PLB of a second, differently designed CPLD architecture where each sum-of-products can have a maximum of 5 PT's, with each PT being a product of no more than 33 input terms, where the input terms are sampled from 174 nearby lines. Such a PLB may therefore be able to generate in one pass a second SoP function of the expressive form:
$$f_{SoP2} = \sum_{i=1}^{5} PT_i^{\,K_i/33_{max}/174_{Lines}} \tag{Exp. A2}$$
In other words, due to architectural constraints, the one-pass sum-of-products result ($f_{SoP1} = PT_1 + PT_2 + PT_3$, see Exp. A1) of a PLB in the first CPLD architecture can be no more complex than a sum of three independent product terms (3 PT's), where each such PTi is a product of no more than sixteen (16) independent PLB input signals that are sampled out of an available and larger set of sixty-four (64) independent signals.
In contrast, and again due to architectural variations, the one-pass sum-of-products result ($f_{SoP2} = PT_1 + PT_2 + PT_3 + PT_4 + PT_5$, see Exp. A2) of a PLB in the second CPLD architecture can be as complex as a sum of five independent product terms (5 PT's), where each such PTi can be a product of up to 33 independent PLB input signals that are sampled by multiplexing from an available and nearby set of 174 independent signals. A yet more concrete example of a CPLD architecture that can be viewed as conforming with Exp. A2 may be found in FIGS. 2A-2B of the above-cited U.S. Pat. No. 5,811,986.
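The contrast between the two example architectures can be made concrete with a one-pass fit check. This Python sketch is illustrative only: the `PlbArch` record, the `fits_one_pass` helper, and the encoding of each product term as a frozenset of line indices are hypothetical. It treats N as the limit on summed PT's, Kmax as the limit on distinct inputs the PLB acquires, and L as the range from which those inputs are sampled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class PlbArch:
    """Hypothetical summary of one PLB's one-pass limits (Exp. A)."""
    n: int          # N: maximum number of product terms summed in one pass
    k_max: int      # Kmax: maximum number of distinct input signals acquired
    num_lines: int  # L: number of nearby lines the inputs are sampled from


ARCH_A1 = PlbArch(n=3, k_max=16, num_lines=64)   # first example architecture, Exp. A1
ARCH_A2 = PlbArch(n=5, k_max=33, num_lines=174)  # second example architecture, Exp. A2


def fits_one_pass(arch: PlbArch, pts: Sequence[FrozenSet[int]]) -> bool:
    """True if an SoP (each PT given as a set of line indices) fits one PLB pass."""
    acquired = frozenset().union(*pts)  # distinct inputs the PLB must acquire
    return (len(pts) <= arch.n
            and len(acquired) <= arch.k_max
            and all(0 <= j < arch.num_lines for j in acquired))


# A 4-PT function whose PTs each AND 8 distinct lines (32 distinct inputs in total).
sop = [frozenset(range(8 * i, 8 * i + 8)) for i in range(4)]
print(fits_one_pass(ARCH_A1, sop))  # False: 4 PTs > N = 3 (and 32 inputs > Kmax = 16)
print(fits_one_pass(ARCH_A2, sop))  # True: 4 <= 5 PTs and 32 <= 33 inputs
```

A function that fails the check on the first architecture would have to be split across several PLB's or use a feedback pass, whereas the second architecture absorbs it in one pass.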
CPLD architectural factors such as the above N, Kmax and L factors can play significant roles in device performance and are matters of delicate design balance. On the one hand, by choosing larger absolute values for N, Kmax and L, a CPLD designer can advantageously give CPLD configuring software more options as it performs partitioning, placement and routing. On the other hand, if the CPLD designer arbitrarily increases the values of N, Kmax and L too much, the designer may find that such modifications have led to excessive electrical capacitance or resistance on routing lines and excessive signal processing delays.
One reason is that Kmax times L defines the number of crosspoints that will be created for each PLB when the Kmax input lines of each PLB cross the L adjacent signal-broadcasting lines. The reciprocal of Kmax/L, namely L/Kmax, indicates the minimum number of PLB's that will be needed to fully sample all L of the adjacent signals (L/Kmax times Kmax equals L; when L is not an exact multiple of Kmax, the count rounds up to the next whole PLB). Typically, the CPLD designer will want the CPLD to be able to process all L signals simultaneously (in parallel), so the designer will provide at least L/Kmax PLB's. The same reciprocal ratio, L/Kmax, also gives a rough indication of the extent to which the L signal-broadcasting lines of the CPLD architecture will be loaded by PIP's (programmable interconnect points). The exact loading will depend on the extent to which each set of L times Kmax crosspoints is fully or partially populated by PIP's.
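The crosspoint and PLB-count arithmetic can be checked for the two example architectures (Exp. A1 and Exp. A2) with a short Python sketch. The helper name `crosspoint_stats` is hypothetical, and the integer ceiling covers the case in which L is not an exact multiple of Kmax.

```python
from typing import Tuple


def crosspoint_stats(k_max: int, num_lines: int) -> Tuple[int, int]:
    """Return (crosspoints per PLB, minimum PLB count to sample all L lines).

    Each of the Kmax PLB input lines crosses each of the L broadcast lines,
    giving Kmax * L crosspoints per PLB. Sampling all L signals in parallel
    needs at least L / Kmax PLB's, rounded up to a whole PLB.
    """
    crosspoints = k_max * num_lines
    min_plbs = (num_lines + k_max - 1) // k_max  # integer ceiling of L / Kmax
    return crosspoints, min_plbs


print(crosspoint_stats(16, 64))   # (1024, 4) for the Exp. A1 example
print(crosspoint_stats(33, 174))  # (5742, 6) for the Exp. A2 example
```

In the second case, 174/33 is about 5.3, so six PLB's are needed; how many of the 5,742 crosspoints per PLB actually carry PIP's depends on how fully each crosspoint matrix is populated.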
U.S. Pat. No. 5,811,986, issued Sep. 22, 1998, suggests that the number of input lines per PLB (Kmax) should be kept relatively small (e.g., about 32 input lines per PLB) and that a centralized switch matrix should be employed. This approach has benefits and drawbacks. On the one hand, routing decisions for CPLD configuring software are simplified. On the other hand, the complexity of functions that can be generated in each PLB in one pass is limited.
In order to fit partitioning results inside the maximal $f_{SoP}$ capabilities of each PLB, the partitioning part of CPLD configuring software has to cast its primitive sums-of-products such that each one is within the N-defined and Kmax-defined limits of the $f_{SoP}$ results that can be produced by respective PLB's of the targeted CPLD. If the architecture of the targeted CPLD is such that each of the above-described factors, N, Kmax and L (Exp. A), is relatively large, then the maximal $f_{SoP}$ results per PLB will tend to be relatively large and the design partitioning phase will advantageously be allowed to work with larger partition chunks. However, silicon resources may be wasted if the original design to be partitioned calls only for small chunks.
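To illustrate how PLB size limits interact with partition chunk size, here is a hypothetical first-fit packing of product terms into PLB-sized chunks, in Python. It is a simple heuristic chosen for illustration, not the partitioning method of any particular configuring software; the helper names `greedy_partition` and `pt_slot_utilization` are invented, and combining the chunks' partial sums (which would need additional logic or a feedback pass) is not modeled.

```python
from typing import FrozenSet, List, Sequence, Set


def greedy_partition(pts: Sequence[FrozenSet[int]], n: int,
                     k_max: int) -> List[List[FrozenSet[int]]]:
    """Pack product terms into chunks of at most n PTs and at most k_max inputs.

    Hypothetical first-fit heuristic: each PT goes into the first chunk that
    still has a free PT slot and whose distinct-input count stays <= k_max.
    """
    chunks: List[List[FrozenSet[int]]] = []
    inputs: List[Set[int]] = []  # distinct inputs already acquired by each chunk
    for pt in pts:
        if len(pt) > k_max:
            raise ValueError("a single PT exceeds Kmax and cannot be placed")
        for chunk, used in zip(chunks, inputs):
            if len(chunk) < n and len(used | pt) <= k_max:
                chunk.append(pt)
                used |= pt  # updates the set stored in `inputs` in place
                break
        else:  # no existing chunk could take this PT: open a new one
            chunks.append([pt])
            inputs.append(set(pt))
    return chunks


def pt_slot_utilization(chunks: Sequence[Sequence[FrozenSet[int]]], n: int) -> float:
    """Fraction of the N product-term slots actually used across all chunks."""
    return sum(len(c) for c in chunks) / (n * len(chunks)) if chunks else 0.0


# Five product terms, each ANDing 4 distinct lines, with no inputs shared.
pts = [frozenset(range(4 * i, 4 * i + 4)) for i in range(5)]
small = greedy_partition(pts, n=3, k_max=16)  # Exp. A1-style PLB
large = greedy_partition(pts, n=5, k_max=33)  # Exp. A2-style PLB
print(len(small), round(pt_slot_utilization(small, 3), 3))  # 2 0.833
print(len(large), pt_slot_utilization(large, 5))            # 1 1.0
tiny = greedy_partition([frozenset({0, 1})], n=5, k_max=33)
print(pt_slot_utilization(tiny, 5))  # 0.2: most PT slots of the larger PLB go unused
```

The last line reflects the trade-off described above: a larger PLB absorbs big chunks in one piece, but a design made of small chunks leaves much of its capacity idle.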
In FIG. 9B of the above-cited U.S. Pat. No. 5,811,986, provisions are made for steering some unused product terms, on a per-macrocell basis, away from some but not all inputs of the OR gate that initially generates a sum-of-products result, and towards macrocell control terminals. The latter macrocell control terminals couple to an XOR gate, to a clock selector and to the selection control of an asynchronous control selector. A certain number of configuration memory bits are consumed in providing such selective steering-away of product terms (PT's). Such consumption can reduce the effective function depth (e.g., 5 PT's each of up to 32 input signals) of each PLB and reduce the number of partitioning, placement, and routing options that CPLD configuring software has.
If all goes well in its partitioning, placement, and routing phases, the CPLD configuring software finds a workable `solution` consisting of a specific partitioning of the original circuit into placeable chunks, a specific set of primitive placements of the chunks into specific PLB's, and a specific set of interconnect usage decisions (routings). The software can then deem its mission complete and can use the placement and routing results to generate the configuring code that will be used to correspondingly configure the designated CPLD.
In various instances, the CPLD configuring software may find that it cannot complete its mission successfully on a first try. It may find, for example, that the initially chosen placement strategy prevents the routing phase from completing successfully. This might occur because signal routing resources have been exhausted in one or more congested parts of the designated CPLD device, so that some necessary interconnections could not be completed through those congested parts. Alternatively, all necessary interconnections may have been completed, but the CPLD configuring software may find that the simulation-predicted performance of the resulting circuit (the so-configured CPLD) is below an acceptable threshold. For example, signal propagation time may be too large in a speed-critical part of the CPLD-implemented circuit, or a given function may use too many passes through feedback paths to generate its result.
In either case, if the initial partitioning, placement and routing phases do not provide an acceptable solution, the CPLD configuring software will try to modify its initial place and route choices so as to remedy the problem. Typically, the software will make iterative modifications to its initial choices until at least a functional place-and-route strategy is found (one where all necessary connections are completed), and more preferably until a place-and-route strategy is found that brings performance of the CPLD-implemented circuit to a near-optimum point. The latter step is at times referred to as `optimization`. Modifications attempted by the software may include re-partitionings of the original circuit design as well as repeated iterations of the place and route phases.
CPLD device users usually do not want to deal with the specifics of place-and-route problems. Instead, they simply want a fast turnaround time of no more than, say, an hour between the time they complete their original circuit design and the time a first-run CPLD is available to implement and physically test that design.
Beyond merely wanting to implement a specific set of Boolean functions within a given CPLD IC, users of CPLD's also usually insist that the circuit implemented by the CPLD perform according to specified timing requirements. Speed is often as important an attribute as full Boolean correctness and completeness; that is often why the user chose a CPLD instead of an FPGA in the first place.
Aside from speed and full function implementation, users of CPLD's also usually want a certain degree of re-design agility (flexibility). Even after an initial design is successfully implemented by a CPLD, users may wish to make slight tweaks or other changes to their original design. The re-design agility of a given CPLD architecture may include the ability to re-design certain internal circuits without changing I/O timings. Re-design agility may also include the ability to re-design certain internal circuits without changing the placement of various I/O terminals (e.g., pins). Such re-design agilities are sometimes referred to respectively as re-design Speed-Locking™ and Pin-Retention (the former term is a trademark of Vantis Corp., headquartered in Sunnyvale, Calif.). The more generic terms `re-design Speed-Consistency` and `re-design PinOut-Consistency` will be used herein interchangeably with `re-design Speed-Locking™` and `re-design Pin-Retention`, respectively.
In addition to speed, re-design agility, and full Boolean correctness, users of CPLD's typically ask for optimal emulation of an original design or a re-design in terms of good function packing density, low cost, low power usage, and so forth.
Some previous CPLD architectures meshed well with specific bus sizes of specific design problems. However, preferences tend to change over time. Industry standards may, at first, favor designs where address and data words have a size in the range of 8 to 16 bits. Industry standards may later migrate towards larger-sized organizations of signals such as address and data words having sizes in the range of 32 to 64 bits each.
A CPLD that has an architecture optimized for bus-oriented word sizes of 8 to 16 bits may not be able to efficiently accommodate designs where word sizes increase into a range of, say, 32 to 64 bits. What is needed is an architecture that can efficiently accommodate dense design problems having word sizes in the range of 32 to 64 bits or more without losing speed and re-design agility. At the same time, if word sizes drop to a lower range for some supplied design problems, and workable solutions can be arrived at with relatively simpler circuit chunks, the flexible CPLD architecture should be able to make efficient use of resources that might otherwise go unused because of the drop to smaller word sizes and/or to simpler partition chunks.