In designing complex digital circuits and systems, the design usually moves through layers of abstraction from the most generally defined design to the final physical implementation, as illustrated in FIG. 1. The designer usually begins by giving a behavioral description of the overall function of the desired part. Languages used for behavioral description are often similar to computer programming languages such as Pascal or C. The behavioral description is then simulated to check for accuracy, and is modified into a structural circuit description, known as a netlist, which represents connections between functional cells in a device that will implement the design. Hardware description languages, such as VERILOG and VHDL, may be used to recite the structural connections. (Some designers prefer to begin the design process at the structural level, using schematics, register transfer schemes or Structural VHDL, thereby bypassing the behavioral level entirely.) Next, the designer develops the physically implementable description of the design. In programmable logic, such implementable files are known as configuration data.
Because many useful digital circuit designs are extremely complex, most designers generate their designs using high level functions that are combinations of hierarchically related subcomponents to facilitate coding and understanding. The design tools then decompose the designer's description into the hierarchically related subcomponents for placing in the chip that will implement the design. The behavioral subcomponents do not necessarily correspond to different parts of a chip or architecture that will implement the functions (e.g., shift registers in one section, addressing circuitry in another), and, in fact, are usually unrelated to physical components that implement a circuit design. Instead, behavioral subcomponents are grouped according to function. It is advantageous to organize a design into hierarchical components, thinking at one time about a high level design, and at other times about details of each portion of the high level design. This technique, often referred to as "divide and conquer," advantageously reduces the possibility of design errors, makes errors easier to correct and accelerates the design process.
The divide and conquer hierarchical design technique provides an additional advantage: dividing a design into hierarchical subcomponents introduces the possibility of reusing part of the design in other designs or in other components of the same design. Using hierarchical subsystems and subcomponents works well with libraries of self-contained modules for accomplishing specified functions and provides some of the advantages found in Object Oriented Programming, such as standardizing subcomponent interfaces and simplifying the editing and substitution processes. It is therefore desirable to do hierarchical design wherever possible.
Existing CAD and EDA software tools exploit hierarchy to a limited extent in the behavioral and structural stages of the design process. For example, schematic-capture-type EDA tools use graphical representations of reusable component symbols, each symbol representing an underlying circuit schematic. The schematic, in turn, comprises other symbols and their underlying circuits. Decomposition may continue through several levels until the most basic and primitive circuit element in the system is encountered.
Similarly, Hardware Description Language (HDL) tools incorporate hierarchy in a manner similar to a software program using subroutines. To represent a desired design, calls are made to predefined procedures which in themselves describe subsystems, and so on.
With both schematic-based and HDL tools, subsystems of a design can be reused multiple times within a single design. Such reuse can be hierarchical as well. For example, a first subsystem cell A may contain two instances of a second cell B, and the cell B may contain three instances of a third cell C. The whole design would therefore contain the equivalent of six instances of cell C. FIG. 2 provides an illustration of this hierarchical design. In such hierarchical designs, each instance of a particular subsystem must be functionally identical to all other instances of that subsystem. If any changes are made to the definition of subsystem B (i.e., the function of the subsystem is changed) then all instances of that defined subsystem are changed.
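The instance arithmetic described above can be sketched in a few lines of Python. This is only an illustrative model: the dictionary encoding of FIG. 2 and the names `DESIGN` and `count_instances` are assumptions made for the example, not part of any described tool.

```python
# Hypothetical encoding of the FIG. 2 hierarchy: each cell definition maps
# the child definitions it instantiates to the number of instances.
DESIGN = {
    "A": {"B": 2},  # A contains two instances of B
    "B": {"C": 3},  # B contains three instances of C
    "C": {},        # C is a primitive cell with no children
}


def count_instances(design, top, target):
    """Return how many instances of `target` exist under `top` once the
    hierarchy is fully expanded."""
    if top == target:
        return 1
    return sum(
        count * count_instances(design, child, target)
        for child, count in design[top].items()
    )


if __name__ == "__main__":
    print(count_instances(DESIGN, "A", "C"))
```

Because every instance refers back to a single definition, editing the entry for `"B"` in this model changes all instances of B at once, which mirrors the requirement that all instances of a subsystem remain functionally identical.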
In the early digital circuit design process, shown as the behavioral and structural stages in FIG. 1, there is a tendency for designers to disregard physical limitations on circuit structure and leave accommodation of such limitations for a later phase in the design process. Indeed, the early design steps are simpler where no account is taken of the constraints imposed by the hardware available for implementation. But such disregard can lead to problems later in the design process. Although hierarchy is easily used during design, fitting the design into a chosen architecture while maintaining a user's design hierarchy is a major challenge. The present invention addresses this challenge.
FIG. 2 illustrates a simple hierarchical design. System A includes two instances of subsystem B. Subsystem B includes three instances of subsystem C. All instances of a subsystem definition are identical. The logical design of FIG. 2 is to be implemented in a physical programmable device such as an FPGA. As the designer moves towards the physical implementation, the idiosyncrasies of the hardware may make satisfying the fundamental requirement for uniformity among different instances of subsystems difficult to achieve. For example, the device area available to implement A may not be able to accommodate six uniform implementations of C placed uniformly within two implementations of B. Of course, using a larger physical device and placing subsystems C less densely makes it possible to implement the six instances of C identically, but this leads to increased cost and decreased efficiency. Such inefficiency leads to a considerable increase in configuration costs in terms of memory, processing power and time requirements.
There is therefore a need in the art for a method of physically implementing a hierarchical design so that the implementation is dense and all instances have identical characteristics such as timing. Such a combination of requirements has not been easy to meet, however.
We illustrate the challenge of implementing a hierarchical design in a reprogrammable Field Programmable Gate Array (FPGA). Reprogrammability allows the designer to effect design changes even after implementation. However, FPGAs have limited numbers of routing lines between configurable blocks. A design which has a large number of identical repeated instances of a cell may not fit efficiently into the resources provided by an available FPGA architecture if the hierarchy is maintained, and instances of the same design element may not have identical characteristics if the hierarchy is not maintained.
Conventional design tools and methods fail to address this problem of mismatch that occurs when a design structure is implemented in a physical environment. These tools flatten and remove the structural hierarchy from the design, thereby allowing total design placement without regard for natural boundaries in a hierarchical design or the presence of repeated cells. Flattening a design consists of replacing symbols which represent groups of objects with the objects themselves. A flattened representation of a design requires much more time and processing power to place and route for two reasons. First, instead of finding one location for a group of objects, the software must find locations for each object in the group. Second, there are more suitable locations for small objects, and thus the software must consider more possibilities for each of the small objects. Flattening produces a design with a very large number of design elements and therefore makes the job of placement much more difficult.
FIG. 3 illustrates the disadvantage of flattening (removing) one level of the hierarchy shown in FIG. 2. In this example, the two B nodes have been flattened. That is, the software is told to consider each instance of C separately rather than to consider each group comprising B and its connected cells C as a single unit. It now appears that A contains six instances of C rather than two instances of B. Consequently, structural information from the original design has been lost (although circuit function remains unchanged).
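Flattening one level, as in the step from FIG. 2 to FIG. 3, can be sketched as follows. The dictionary encoding and the function name `flatten_cell` are hypothetical and serve only to make the operation concrete.

```python
# Hypothetical encoding of FIG. 2: definition -> {child definition: count}.
DESIGN = {
    "A": {"B": 2},
    "B": {"C": 3},
    "C": {},
}


def flatten_cell(design, top, cell):
    """Return a copy of `design` in which every instance of `cell` inside
    `top` is replaced by the instances that `cell` itself contains."""
    # Copy each definition so the original design is left untouched.
    flat = {name: dict(children) for name, children in design.items()}
    # Remove the grouping symbol from the parent ...
    removed = flat[top].pop(cell, 0)
    # ... and place its contents directly in the parent instead.
    for child, count in design[cell].items():
        flat[top][child] = flat[top].get(child, 0) + removed * count
    return flat


if __name__ == "__main__":
    print(flatten_cell(DESIGN, "A", "B")["A"])
```

After flattening, the entry for A lists six instances of C directly and no longer mentions B. The circuit function is the same, but the information that the C cells formed two identical groups is no longer visible from A.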
The run time of placement algorithms does not scale linearly with the number of objects to be placed. A placement algorithm which considered all possible placements of n items into m slots, where m >= n, would consider m!/(m-n)! different placements. For example, if there are 2 objects to place in any of four slots, then there are 4!/2! different placements, or 12 placements, to be considered in an exhaustive search algorithm. For complex devices, exhaustive search algorithms are not feasible. For example, in a Xilinx XC6216 device there are 4096 function blocks. If a design has 2048 items to be placed in the device, an exhaustive search algorithm would have to consider 4096!/2048! placements, a totally infeasible number of possible placements to evaluate.
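The placement count m!/(m-n)! can be checked numerically. The sketch below uses Python's standard `math.perm` for the small case and a log-gamma estimate for the large case; the function names are illustrative assumptions.

```python
import math


def exhaustive_placements(m, n):
    """Number of ways to place n distinct items into m slots (m >= n),
    i.e. m! / (m - n)!."""
    return math.perm(m, n)


def log10_exhaustive_placements(m, n):
    """Base-10 logarithm of m! / (m - n)!, computed with log-gamma so the
    huge integer never has to be converted to text."""
    return (math.lgamma(m + 1) - math.lgamma(m - n + 1)) / math.log(10)


if __name__ == "__main__":
    # Two objects in four slots: 4!/2! = 12.
    print(exhaustive_placements(4, 2))
    # 2048 items in 4096 function blocks: report the order of magnitude only.
    print(f"about 10^{log10_exhaustive_placements(4096, 2048):.0f}")
```

For the 4096-block example the count has more than 7,000 decimal digits, which makes concrete why exhaustive search is out of the question.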
Placement algorithms have been designed which try to find as near optimal a placement as possible within a more reasonable time. Known algorithms have reduced the number of placements from factorial complexity to quadratic complexity. Thus only a subset of the total number of placements is considered, but the workings of the algorithm try to ensure that those placements most likely to contain the optimal result are considered. Any given placement algorithm will have a slightly different complexity associated with it. An exact formula for the number of placements associated with a given algorithm is hard to determine. Also, most placement algorithms have a sequence of stages, each of which has a different complexity. Most such algorithms are dominated by one or more phases in which the complexity has an m squared component. So if a device with 100 function blocks takes 1 minute in that phase, then a device with 4096 blocks might take 34 hours (100^2 is roughly 2^13, 4096^2 is 2^24, so the calculation will take about 2^11 minutes or 2048 minutes). While this is better than a factorial, it is clearly still a very time-consuming calculation. The effect of maintaining the hierarchy is to reduce the number of items in each placement stage. If a placement involving 4096 blocks can be divided into 32 placements of 128 blocks then placement would take approximately 40 minutes, clearly preferable to 34 hours.
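The scaling argument can be reproduced with a simple cost model. The sketch below assumes that the dominant phase is purely quadratic in the number of blocks and is calibrated at 1 minute for 100 blocks; the function name and parameters are hypothetical. With exact arithmetic this model gives somewhat different figures from the rounded ones quoted above (roughly 28 hours flat and a little under an hour partitioned), but the gap between the two approaches is of the same order.

```python
def phase_minutes(blocks, ref_blocks=100, ref_minutes=1.0):
    """Estimated run time of a purely quadratic placement phase, scaled
    from a reference point (assumed: 100 blocks take 1 minute)."""
    return ref_minutes * (blocks / ref_blocks) ** 2


# One flat placement of all 4096 blocks.
flat_minutes = phase_minutes(4096)

# The same blocks handled as 32 independent placements of 128 blocks each.
partitioned_minutes = 32 * phase_minutes(128)

if __name__ == "__main__":
    print(f"flat:        {flat_minutes / 60:.1f} hours")
    print(f"partitioned: {partitioned_minutes:.1f} minutes")
    print(f"speed-up:    {flat_minutes / partitioned_minutes:.0f}x")
```

Under a quadratic model, splitting m blocks into k equal groups reduces the cost by a factor of k, which is why the partitioned run is 32 times faster here.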
Similarly, the difference in the number of nodes that an implementation algorithm would have to visit in the hierarchical graph of FIG. 2 versus the flattened graph of FIG. 3 is considerable. Thus, the high number of cells (C) contained within A in FIG. 3 will lead to a long runtime and increased memory requirements for the steps of placing and routing the design in an FPGA. Losing the hierarchy in a design structure leads to a drastic increase in the computer processing time and memory required to implement (or, in the FPGA environment, map, place, and route) the design.
In implementing the graph of FIG. 2, an available recursive place and route algorithm would begin with the lowest hierarchical level, at the C cells, and would place together three instances of C within the definition of B. Moving up the graph hierarchy, the algorithm would then have to visit and place only two instances of B within the definition of A. In contrast, the same algorithm placing the flattened graph of FIG. 3 would have to visit and place separately six instances of cell C while processing each instance of cell A. As designs become more complex and architectures more dense, the implementation process becomes prohibitively inefficient, time consuming and demanding of memory resources. It is therefore advantageous to provide an implementation tool which can maintain device hierarchy during design placement.
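The difference in work between the two traversals can be illustrated with a small counting model. It assumes that a recursive placer lays out each cell definition once and reuses that layout for every instance, which is a simplification; the encoding and function names are hypothetical.

```python
def hierarchical_placements(design, top):
    """Count child placements when each definition is placed only once and
    the result is reused for all of its instances."""
    seen = set()

    def visit(cell):
        if cell in seen:
            return 0  # this definition has already been laid out
        seen.add(cell)
        total = sum(design[cell].values())  # place children inside this cell
        for child in design[cell]:
            total += visit(child)
        return total

    return visit(top)


def flat_placements(design, top):
    """Count primitive cells that must be placed individually once the
    hierarchy under `top` has been fully flattened."""
    children = design[top]
    if not children:
        return 1
    return sum(n * flat_placements(design, c) for c, n in children.items())


# FIG. 2 as a dictionary, plus a larger design of the same shape.
FIG2 = {"A": {"B": 2}, "B": {"C": 3}, "C": {}}
LARGE = {"A": {"B": 32}, "B": {"C": 128}, "C": {}}

if __name__ == "__main__":
    for name, design in (("FIG2", FIG2), ("LARGE", LARGE)):
        print(name, hierarchical_placements(design, "A"),
              flat_placements(design, "A"))
```

For FIG. 2 the saving is small (five placements against six), but for the larger design it is 160 against 4096, and the gap widens with every additional level of reuse.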
One available system attempts to address the mismatch encountered between design structure and device architecture by providing a library of implementations of design elements customized for a particular target architecture. U.S. Pat. No. 4,918,440 to Furtek discloses such a system. This approach, often referred to as the "hard macro" approach, is not general in purpose, being constrained by the extent of the supplied library. Also, each structural instance of a cell will be identical to all other instances of that cell, thereby leading to the problems discussed above such as incompatibility with device limitations.
Other available design systems allow the generation of parameterized macros wherein a physical structure is generated by repeating structure and layout information for various basic, repeated cells. Such macros may use hierarchy in their structural or physical description. These systems suffer from the significant disadvantage of requiring each repeating element to have the same physical implementation. It would therefore be advantageous, even in a design system which accommodates parameterized macros, to allow repeating cell definitions that can be densely packed but retain common characteristics.
For additional detailed background materials and discussion of available tools for placing, partitioning and routing digital logic designs in configurable logic, see U.S. Pat. No. 5,448,493 to Topolewski et al., incorporated herein in its entirety by reference.