1. Field of the Invention
The present invention relates generally to methods and systems used to create efficient physical implementations from high level descriptions of electronic designs and, in particular, to a software system and method that optimizes Register-Transfer-Level (RTL) descriptions with respect to performance parameters including area, timing, and power, prior to logic synthesis, floorplanning, placement and routing.
2. Description of the Background Art
Present Electronic Design Automation (EDA) systems for designing electronic systems consist of software tools running on a digital computer that assist a designer in the creation and verification of complex electronic designs. Present-day state-of-the-art design techniques use a combination of logic synthesis, floorplanning, place-and-route, parasitic extraction, and timing tools in an iterative sequence to form a design process commonly known as the top-down design methodology.
The left side of FIG. 1 illustrates a typical top-down design process. The primary entry point into the top-down design flow is a high level functional description, at behavioral-level or RTL, of an integrated circuit design expressed in a Hardware Description Language (HDL). This design is coupled with various design goals, such as the overall operating frequency of the Integrated Circuit (IC), circuit area, power consumption, and the like.
Conventional top-down methodology uses two overlapping processes, a front-end flow and a back-end flow. Each of these flows involves multiple time-consuming iterations and the exchange of very complex information. In the front-end of the top-down methodology, the RTL model is manually partitioned by the designer into the functional blocks that the designer believes best represent the functional and architectural aspects of the design. Then, logic synthesis tools convert the functional description into a detailed gate-level network (netlist) and create timing constraints based on a statistical wire-load estimation model and a pre-characterized cell library for the process technology that will be used to physically implement the integrated circuit.
The gate-level netlist and timing constraints are then provided to the back-end flow to create a floorplan, and then to optimize the logic. The circuit is then placed and routed by the place-and-route tool to create a physical layout. After place-and-route, parasitic extraction and timing tools (typically run by the circuit fabricator) feed timing data back to the logic synthesis process so that a designer can iterate on the design until the design goals are met.
While the synthesis and place-and-route automation represent a significant productivity improvement over an otherwise tedious and error-prone manual design process, the top-down design methodology has failed to produce efficient physical implementations of many circuit designs that take full advantage of the capabilities of advanced IC manufacturing processes. This is evident in the growing "design gap" between what semiconductor vendors can manufacture with today's deep sub-micron processes and what IC designers can create using top-down EDA design tools. The latest 0.18 μm CMOS process can fabricate silicon die with 10 million gates running at speeds in excess of 500 MHz. In contrast, designers using conventional top-down EDA tools struggle with the creation, analysis, and verification of integrated circuits having 0.5 to 1 million gates running at 150 MHz.
The primary inefficiency of the top-down methodology arises from its reliance on statistical wire-load models, which have proved inadequate for wire-delay-dominated deep sub-micron digital systems. Timing in deep sub-micron integrated circuits is dominated by interconnect delays rather than gate delays. Conventional top-down design tools, such as behavioral and logic synthesis, were originally designed in an era when gate delays dominated chip timing. These tools use inaccurate, statistical wire-load estimates to model wiring parasitics at early stages in the design cycle, and the effects of these inaccuracies are propagated throughout the rest of the design methodology. To overcome the timing model inaccuracies, the designer engages in excessive and time-consuming iterations of logic synthesis, floorplanning, logic optimization, and place-and-route in attempting to converge on the timing constraints for the circuit. This iterative loop is referred to as the timing-convergence problem.
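To make the contrast concrete, the following Python sketch compares a fanout-indexed statistical wire-load table with a placement-based estimate computed from the half-perimeter wirelength (HPWL) of a net's pin bounding box. The table values, the per-micrometre capacitance, and the choice of HPWL as the placement-based metric are illustrative assumptions, not data from any real cell library or from the present invention.

```python
# Hypothetical statistical wire-load table: estimated wire capacitance (pF)
# indexed by net fanout, of the kind a pre-layout synthesis tool consults.
FANOUT_CAP_PF = {1: 0.010, 2: 0.018, 3: 0.026, 4: 0.034}
EXTRA_PF_PER_FANOUT = 0.008  # assumed linear extrapolation beyond the table

# Assumed routed-wire capacitance per micrometre.
CAP_PER_UM_PF = 0.0002


def statistical_wire_cap(fanout: int) -> float:
    """Estimate net capacitance from fanout alone; placement is ignored."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fanout = max(FANOUT_CAP_PF)
    if fanout <= max_fanout:
        return FANOUT_CAP_PF[fanout]
    return FANOUT_CAP_PF[max_fanout] + EXTRA_PF_PER_FANOUT * (fanout - max_fanout)


def placement_wire_cap(pins: list[tuple[float, float]]) -> float:
    """Estimate net capacitance from placed pin locations (driver plus sinks)
    using the half-perimeter wirelength (HPWL) of their bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    hpwl_um = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return hpwl_um * CAP_PER_UM_PF


# Two nets, each a driver plus two sinks (fanout 2), placed compactly vs. spread out.
near = [(0.0, 0.0), (10.0, 5.0), (20.0, 0.0)]
far = [(0.0, 0.0), (800.0, 300.0), (1200.0, 50.0)]
print(statistical_wire_cap(2), placement_wire_cap(near), placement_wire_cap(far))
```

Both example nets have a fanout of 2, so the statistical model returns the same 0.018 pF for each, while the placement-based estimate for the spread-out net is 60 times larger than for the compact one. A discrepancy of this kind is what the placement-based models described herein are intended to capture.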
The large discrepancy between statistical wire-load models and actual wire loads means that circuit designers must wait until gate-level floorplanning and place-and-route tasks are complete to begin chip-level optimization. The enormous gate-level complexity of today's system-on-a-chip designs places a heavy burden on gate-level verification and analysis tools and makes multiple design iterations very time consuming.
Additionally, the complexity of present high-performance integrated circuit designs overwhelms the capability of logic synthesis tools. Synthesis execution times of many hours on present-day high-performance engineering workstations are typical for circuits containing only tens of thousands of logic gates. Place-and-route execution times for these circuits can also consume many hours, and it is not unusual for a single synthesis and place-and-route iteration for such a circuit to take days. Synthesis and place-and-route tool run times grow non-linearly, sometimes exponentially, as the size of the circuit grows and as circuit-performance goals are increased. Thus, logic synthesis cannot process complex designs all at once. Designers are forced to develop functional descriptions and manually partition the design into smaller modules, upon which logic synthesis is individually performed. During manual partitioning, however, the designer has little or no accurate information on the back-end physical effects of the partitioning, and in particular on the effect of such partitions on timing, area, and power consumption. The relationship between the high-level functional description and the physical effects of the low-level layout is not obvious at the front-end design stage. The failure to accurately predict back-end physical effects at or above the RTL design stage results in local optimization and a sub-optimal functional description of the design. Design efficiency suffers from over-constraint (timing non-convergence), under-constraint (loss of performance and density), or some combination of both across the different partitions of the integrated circuit.
Sub-optimal RTL descriptions and partitioning serve as a poor starting point for logic synthesis, which propagates and amplifies the design deficiencies, eventually leading to silicon inefficiency (e.g., excessive area or power consumption, slower operating frequency), even after long iteration and manual intervention.
Further inefficiency in the top-down design methodology is introduced because logic synthesis tools treat all logic as random logic. Consequently, logic synthesis typically fails to recognize and take advantage of more efficient silicon structures such as datapaths, which are commonly used and expressed in the high level description of the design. Designers who recognize this limitation frequently bypass synthesis by manually instantiating gate-level elements in their RTL source. This is equivalent to writing a gate-level netlist, an onerous, low-productivity, and error-prone task.
Another deficiency of the top-down methodology is that it requires a cumbersome netlist hand-off between front-end and back-end design cycles. Complex bi-directional information transfer occurs at the overlap between front-end and back-end iteration loops. The diverse design expertise required to effectively manage the top-down design process is rare and not commonly available to a typical design team. Design inefficiency causes the costly under-utilization of advanced IC manufacturing processes. The iterative nature of the top-down design methodology requires long design time and large design teams, often not available or even feasible in a competitive design environment characterized by short product life-cycles and short time-to-market requirements. Thus, achieving rapid timing convergence while satisfying density, power, and productivity constraints for high performance complex systems is a daunting challenge facing the electronic design industry today.
Accordingly, there is a need for an EDA system that improves the present top-down methodology in performance, density, power, and design productivity. In particular, there is a need for a software method and system that optimizes the design of an integrated circuit at the RTL stage, prior to conventional logic synthesis, floorplanning, and place-and-route design stages.
The present invention overcomes the limitations of the conventional top-down methodology with an RTL optimization system and method that enhances existing top-down EDA systems by implementing an automatic performance-driven design paradigm. The RTL optimization system of the present invention implements automatic hierarchical structured custom design and delivers significant improvements in performance, density, power, and productivity over the existing top-down design methodology. The RTL design methodology of the present invention enables the user to enter, analyze, debug, optimize, and implement their designs by working exclusively with RTL models before logic synthesis. Full-chip design, analysis, and optimization run orders-of-magnitude faster than conventional gate-level tools, thereby enabling truly interactive design.
The RTL design methodology and system of the present invention uses placement based wire load models to capture the performance characteristics of the known physical implementations of individual partitions of an electronic design, and of the overall electronic design itself, prior to any logic synthesis. This performance data is used to optimize the partitioning, floorplanning, and routing of the electronic design in order to find a known solution to design goals. This solution defines the physical implementation of the electronic design at the partition and chip level and thus constrains the back-end flow so that only a single pass through conventional logic synthesis, place-and-route, and so forth is required.
In a preferred embodiment, the hand-off between the RTL optimization system and the conventional back-end flow includes the RTL model along with chip and block level netlists, floorplans, routing, aspect ratios and areas, pin assignments, output loads, input, output and internal timing constraints, placement based wire loads for wires within and between partitions, and command scripts for controlling back-end tools. In this fashion, the back-end flow can be fully constrained to a single pass, thereby accomplishing true RTL level hand-off.
More particularly, placement based wire load models are used throughout the RTL optimization process to characterize the performance of logic structures, partitions, and the overall chip or electronic design. This performance characterization of the timing, area, power, and other performance attributes is used to optimize the electronic design at the RTL level. This feature eliminates the conventional requirement of logic synthesis, floorplanning, and routing normally needed to capture the performance characteristics of the physical implementation. Another feature of the present invention is the ability to fully characterize the performance of a logic structure using performance data of a number of physical implementations of the logic structure derived from a placement based wire load model.
Yet another feature of the present invention is the generation of such performance data for a variety of physical implementations to create a fully characterized library, here called a library of logic building blocks or "LBBs". An LBB is a high level, technology-independent description of a logic structure that has performance data fully characterizing its performance envelope over a range of different physical implementations. The performance data preferably quantifies the relationship between the area, circuit delay, and output load of the logic structure for a number of different physical implementations. This performance data is created by placing and routing each physical implementation to create a placement based wire load model. The performance data may be characterized further for both random logic and datapath implementations. In addition, the performance data preferably defines these area, timing and output load relationships for each of a number of bit widths, and a number of driver sizes for various typical loading conditions. An LBB may have multiple implementations representing different area and speed tradeoffs. The performance data of an LBB for these different physical implementations thus defines its entire performance envelope. LBBs range from simple gates (inverter, NAND, latch, flip-flop) to complex logic structures such as adders, finite state machines, memories, and encoders. The use of LBBs elevates the pre-characterized library approach from the conventional gate level to a complex-structure module level, and allows the accurate performance data that characterizes each LBB to be used at the RTL design level to optimize the partitioning and floorplanning of the electronic design.
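An LBB's performance envelope can be pictured as a small table of characterized implementations. The Python sketch below assumes a linear delay model (intrinsic delay plus a load-dependent term) and hypothetical area and delay figures; the class, field, and method names are illustrative only. Given a bit width, an output load, and a delay budget, it selects the smallest implementation that meets timing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Implementation:
    """One characterized physical implementation of an LBB (values hypothetical)."""
    name: str
    style: str           # "datapath" or "random_logic"
    bit_width: int
    driver_size: int     # relative output drive strength
    area_um2: float
    intrinsic_ns: float  # delay with zero output load
    ns_per_pf: float     # added delay per pF of output load (assumed linear)

    def delay_ns(self, load_pf: float) -> float:
        return self.intrinsic_ns + self.ns_per_pf * load_pf


@dataclass
class LogicBuildingBlock:
    name: str
    implementations: list[Implementation] = field(default_factory=list)

    def smallest_meeting(self, bit_width: int, load_pf: float,
                         max_delay_ns: float) -> Optional[Implementation]:
        """Pick the minimum-area implementation of the requested width that
        meets the delay budget at the given output load, or None."""
        ok = [i for i in self.implementations
              if i.bit_width == bit_width and i.delay_ns(load_pf) <= max_delay_ns]
        return min(ok, key=lambda i: i.area_um2, default=None)


adder = LogicBuildingBlock("adder", [
    Implementation("ripple", "random_logic", 32, 1, 900.0, 2.4, 4.0),
    Implementation("cla", "datapath", 32, 1, 1400.0, 1.1, 3.0),
    Implementation("cla_x4", "datapath", 32, 4, 1600.0, 1.0, 1.0),
])
```

With a 0.2 pF load, a 2.0 ns budget selects the small carry-lookahead datapath version, a 1.5 ns budget forces the version with the larger driver, and a 1.0 ns budget has no feasible implementation.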
Another feature of the present invention is the fully automatic partitioning of the RTL model and subsequent automatic refinement of the partitions during chip optimization. Automatic partitioning creates partitions that optimize the local and global floorplanning, routing, timing and so forth, using the placement based wire load information. A high level chip optimization process can induce repartitioning to move logic between partitions, combine or split partitions as needed to meet design goals, and generate timing and other constraints. This automatic process removes the burden from the designer of having to manually partition the design and allocate timing between partitions, only to find from the subsequent back-end flow that such timing allocations and partitions are either infeasible or suboptimal.
The right side of FIG. 1 illustrates the overall design flow in accordance with the present invention. Beginning with an RTL model of an electronic design, the present invention first automatically partitions the RTL model into a number of physical partitions. This automatic partitioning transforms the logical hierarchy of functionality inherent in the RTL model into a physical hierarchy optimized for the chip-level physical implementation. The partitions are optimized to select local physical implementations given the current design goals. Chip optimization, including floorplanning, pin assignment, placement and routing, refines the partitioning, enables simulation and analysis of timing for the entire chip, and generates additional design constraints. These constraints are fed back through the partitioning and optimization phases to finally converge on an overall timing and area solution. Because this entire process takes place without relying on the gate-level logic design of the conventional top-down approach, many fast iterations through it enable a large range of different physical implementations to be explored quickly, converging automatically on the optimal physical implementation that satisfies the design goals, typically without intervention or assistance by the designer. A simplified RTL-level hand-off, along with the generated design constraints, is passed to the back-end flow, which now goes through only a single pass to fabricate the circuit design.
In a preferred embodiment, the design methodology and system of the present invention takes an RTL model source and converts it to a network of LBBs that efficiently represent a desired hardware implementation.
The LBB network, and hence the RTL model, is then automatically partitioned into a number of physical partitions, such as datapath, finite state machines, memories, hard macro blocks, and random logic partitions. This functional partitioning transforms the logical hierarchy of functionality inherent in the RTL model into a physical hierarchy optimized for the chip-level physical implementation. The physical hierarchy defines both the connectivity and hierarchical relationships of the partitions.
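A deliberately simplified Python sketch of functional partitioning follows. It assumes each LBB carries a kind label, maps kinds to partition types with a hypothetical lookup table (anything unrecognized becomes random logic), and then derives the partition-level connectivity of the physical hierarchy from the nets. The partitioning described here also weighs timing and floorplan information, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical mapping from LBB kind to the partition type that holds it.
PARTITION_OF_KIND = {
    "adder": "datapath", "multiplier": "datapath", "shifter": "datapath",
    "fsm": "finite_state_machine",
    "ram": "memory", "rom": "memory",
    "hard_macro": "hard_macro",
}


def partition_lbbs(lbbs):
    """Group (name, kind) pairs by partition type; unknown kinds become random logic."""
    parts = defaultdict(list)
    for name, kind in lbbs:
        parts[PARTITION_OF_KIND.get(kind, "random_logic")].append(name)
    return dict(parts)


def partition_connectivity(parts, nets):
    """Return the set of partition pairs joined by at least one net.

    Each net is given as the list of LBB names it connects."""
    owner = {lbb: part for part, members in parts.items() for lbb in members}
    edges = set()
    for net in nets:
        touched = sorted({owner[lbb] for lbb in net})
        for i, a in enumerate(touched):
            for b in touched[i + 1:]:
                edges.add((a, b))
    return edges


lbbs = [("alu_add", "adder"), ("ctrl", "fsm"), ("regfile", "ram"), ("glue", "nand")]
nets = [["alu_add", "regfile"], ["ctrl", "alu_add", "glue"]]
parts = partition_lbbs(lbbs)
edges = partition_connectivity(parts, nets)
```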
For each of the physical partitions, a number of feasible block-level physical implementations are modeled automatically. A physical implementation is feasible for a partition if it meets timing and other design constraints defined for the partition, including at least a minimum operating frequency for the entire chip. The implementation model data is extracted from the performance data included in the LBBs of the physical partitions and the placement-based wire-load model of the partition. The range of feasible implementations for a partition will likely vary in area, aspect ratio, timing, and power consumption. Each implementation model includes a pin-to-pin timing model, a placement based wire load model for the partition, and a block-level floorplan with pin assignment.
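The frequency constraint translates directly into a delay budget: at a minimum chip frequency f in MHz, a single-cycle path must fit within T = 1000 / f ns, so 500 MHz allows 2.0 ns. The Python sketch below filters hypothetical candidate implementations of one partition on that budget and, optionally, on an aspect-ratio bound. Treating every internal path as single-cycle and the specific aspect-ratio test are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockImpl:
    """A candidate block-level implementation of one partition (values hypothetical)."""
    name: str
    width_um: float
    height_um: float
    critical_path_ns: float  # worst register-to-register delay inside the block

    @property
    def aspect_ratio(self) -> float:
        return self.height_um / self.width_um


def clock_period_ns(min_freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz: T = 1000 / f."""
    return 1000.0 / min_freq_mhz


def feasible(impls, min_freq_mhz: float, max_aspect: Optional[float] = None):
    """Keep implementations whose critical path fits in one clock period and,
    optionally, whose aspect ratio lies within [1/max_aspect, max_aspect]."""
    period = clock_period_ns(min_freq_mhz)
    result = []
    for impl in impls:
        if impl.critical_path_ns > period:
            continue
        if max_aspect is not None and not (1.0 / max_aspect <= impl.aspect_ratio <= max_aspect):
            continue
        result.append(impl)
    return result


candidates = [
    BlockImpl("square", 100.0, 100.0, 1.8),
    BlockImpl("wide", 200.0, 50.0, 1.5),
    BlockImpl("compact", 80.0, 90.0, 2.3),
]
```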
The next automatic process is a chip-level optimization which produces a first-pass floorplan of the integrated circuit and a set of chip-level design constraints for block-level partitioning refinement. The chip-level optimization uses the feasible block-level implementation models for all partitions, design constraints on chip area, aspect ratio, operating frequency and I/O signal timing, and a chip-level netlist for partition connectivity. Chip-level optimization iterates through the implementation models and performs floorplan creation and compaction, pin assignment, global routing, and global timing analysis.
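As a toy model of the implementation-selection part of chip-level optimization, the Python sketch below picks one implementation per partition so that every chip-level path (block delays plus placement-based global wire delays) fits in the clock period, at minimum total area. It enumerates all combinations with `itertools.product`, which is only practical for a handful of partitions. The data layout and the exhaustive search are assumptions, not the optimization procedure of the present invention, which also performs floorplanning, pin assignment, and global routing.

```python
from itertools import product


def path_delay(choice, path, wire_delay_ns):
    """Block delays along the path plus global-wire delays between consecutive blocks."""
    blocks = sum(choice[p][2] for p in path)
    wires = sum(wire_delay_ns.get(hop, 0.0) for hop in zip(path, path[1:]))
    return blocks + wires


def select_implementations(options, paths, wire_delay_ns, period_ns):
    """Exhaustively choose one implementation per partition.

    options:       {partition: [(impl_name, area, delay_ns), ...]}
    paths:         chip-level timing paths, each an ordered list of partitions
    wire_delay_ns: {(from_partition, to_partition): delay_ns} for global wires
    Returns (total_area, {partition: impl_name}) for the minimum-area choice in
    which every path fits within period_ns, or None if none does.
    """
    partitions = sorted(options)
    best = None
    for combo in product(*(options[p] for p in partitions)):
        choice = dict(zip(partitions, combo))
        if all(path_delay(choice, path, wire_delay_ns) <= period_ns for path in paths):
            area = sum(impl[1] for impl in choice.values())
            if best is None or area < best[0]:
                best = (area, {p: impl[0] for p, impl in choice.items()})
    return best


options = {
    "dp": [("dp_small", 5000, 1.2), ("dp_fast", 7000, 0.8)],
    "ctl": [("ctl_small", 2000, 0.6), ("ctl_fast", 2600, 0.4)],
}
paths = [["ctl", "dp"]]
wires = {("ctl", "dp"): 0.3}
best = select_implementations(options, paths, wires, period_ns=2.0)
```

Here the two smallest implementations together miss the 2.0 ns period once the 0.3 ns global wire is included, so the cheapest feasible choice pairs the faster control block with the smaller datapath block, for a total area of 7600.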
After the first-pass floorplan is generated, the partitions of the floorplan are further optimized by structural partitioning, based on the refined design constraints derived from chip-level optimization. Structural partitioning may include moving LBBs between partitions to improve timing, merging partitions into larger units, breaking partitions up into smaller units, or changing a partition's architecture type (e.g., from a datapath to a random logic partition) to improve packing density. Structural partitioning produces new block-level constraints for datapath and non-datapath partitions which improve timing and floorplan packing density.
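Moving LBBs between partitions can be illustrated with a simple greedy heuristic: try relocating one LBB at a time and keep the move if it reduces the number of connections crossing partition boundaries, subject to a partition size cap. Using the cut count as a proxy for timing and packing, and the size cap itself, are assumptions of this Python sketch; it covers only the "move" transformation, not merging, splitting, or changing architecture type.

```python
def cut_size(assign, edges):
    """Number of LBB-to-LBB connections that cross a partition boundary."""
    return sum(1 for a, b in edges if assign[a] != assign[b])


def greedy_moves(assign, edges, max_size, max_passes=10):
    """Repeatedly move single LBBs to another partition when that lowers the cut,
    never moving an LBB into a partition that already holds max_size LBBs."""
    assign = dict(assign)
    partitions = sorted(set(assign.values()))
    for _ in range(max_passes):
        improved = False
        for lbb in sorted(assign):
            home = assign[lbb]
            best_part, best_cut = home, cut_size(assign, edges)
            for part in partitions:
                size = sum(1 for p in assign.values() if p == part)
                if part == home or size >= max_size:
                    continue
                assign[lbb] = part          # try the move
                trial = cut_size(assign, edges)
                assign[lbb] = home          # undo it
                if trial < best_cut:
                    best_part, best_cut = part, trial
            if best_part != home:
                assign[lbb] = best_part
                improved = True
        if not improved:
            break
    return assign


start = {"a": "P1", "b": "P1", "e": "P1", "c": "P2", "d": "P2"}
edges = [("a", "b"), ("c", "d"), ("e", "c"), ("e", "d")]
result = greedy_moves(start, edges, max_size=3)
```

In the example, LBB `e` connects only to `c` and `d`, so the heuristic moves it into their partition and the cut drops from 2 to 0. Without a size cap, cut minimization alone would favor collapsing everything into one partition, which is why some balance constraint is needed.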
New partition implementation models based on the refined constraints, along with the other data of the chip design, are reintroduced to the chip optimization process for a second and final optimization pass. This second pass includes a final selection of a physical implementation of all partitions, floorplanning, pin assignment, and global routing.
To interface with conventional back-end process tools, the present invention provides detailed implementation constraints, including an optimal floorplan and placement-based wire-load models at the chip and block levels. These implementation constraints preferably include partitioning constraints, including a structural RTL netlist for each physical block and top-level connectivity; physical constraints, including area, aspect ratio, pin assignment, global wire routing path, and floorplan (chip and block-level); timing constraints, including output load, input arrival time, output timing constraints, operating frequency, and placement-based wire load models; and command scripts.
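The timing portion of these constraints could be emitted in a form that back-end tools read. The Python sketch below assumes Synopsys Design Constraints (SDC) as the target format, which the text does not specify, and uses the standard SDC commands `create_clock`, `set_input_delay`, `set_output_delay`, and `set_load`. The `BlockConstraints` record and `to_sdc` function are hypothetical, and ns for times and pF for loads are assumed units (actual units depend on the cell library).

```python
from dataclasses import dataclass, field


@dataclass
class BlockConstraints:
    """Hypothetical per-block timing hand-off record (ns and pF assumed)."""
    name: str
    freq_mhz: float
    clock_port: str
    input_arrival_ns: dict[str, float] = field(default_factory=dict)
    output_external_delay_ns: dict[str, float] = field(default_factory=dict)
    output_load_pf: dict[str, float] = field(default_factory=dict)


def to_sdc(c: BlockConstraints) -> str:
    """Render the record as SDC commands, one per line, ports sorted by name."""
    clk = c.clock_port
    period_ns = 1000.0 / c.freq_mhz
    lines = [f"# block {c.name}",
             f"create_clock -name {clk} -period {period_ns:.3f} [get_ports {clk}]"]
    for port, t in sorted(c.input_arrival_ns.items()):
        # Arrival time of the input relative to the clock edge.
        lines.append(f"set_input_delay -clock {clk} {t:.3f} [get_ports {port}]")
    for port, t in sorted(c.output_external_delay_ns.items()):
        # Delay consumed outside the block after this output.
        lines.append(f"set_output_delay -clock {clk} {t:.3f} [get_ports {port}]")
    for port, load in sorted(c.output_load_pf.items()):
        lines.append(f"set_load {load:.3f} [get_ports {port}]")
    return "\n".join(lines)


print(to_sdc(BlockConstraints("dp_0", 500.0, "clk",
                              {"a": 0.4}, {"y": 0.3}, {"y": 0.05})))
```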
In conventional top-down design, the front-end flow at best predicts the timing and area results to be generated by the back-end flow. In contrast, in the present invention, the final set of design constraints from the second-pass chip optimization guarantees a known solution to timing convergence. This is because accurate placement-based wire loads have been used throughout the optimization process and the implementation of each individual partition has been proven feasible. Multiple rapid internal iterations between chip-level and block-level optimization ensure that the design constraints driving the back-end implementation are well balanced and optimal. These block-level constraints represent a recipe for meeting area and performance goals in a single pass through the back-end process, and therefore serve as an effective interface between front-end and back-end implementation in an RTL hand-off design flow.
The present invention supports the above design flow as a built-in, pre-programmed sequence designed to reach timing convergence in a single pass through the back-end automatically for a majority of IC designs. In addition, the present invention provides facilities for manual interventions to refine the automatic result. The built-in optimization sequence can also be modified by the user to adapt the system to unique chip requirements. Manual entry points include control of physical hierarchy construction, control of LBB synthesis, partitioning, pin assignment, floorplan (block and chip-level), creation and selection of block level implementations, in-place optimization, and back-annotation.
The present invention provides numerous advantages over conventional top-down EDA design systems. First, because the RTL timing and power analyses use accurate placement-based wiring parasitics instead of the unrealistic statistical wire-load estimates employed by many of today's tools, optimization of the circuit design is possible prior to logic synthesis. This eliminates the multiple design iterations following logic synthesis (or custom manual design) that are common with deep sub-micron designs.
Second, RTL analyses of the present invention run at interactive speeds, enabling micro-architecture optimization. The use of LBB and bus representations raises the design abstraction above the conventional bit-wise gate-level representation of a circuit to simplify and accelerate design representation, analysis, and visualization. Since the design flow is completely performance driven, altering the high level constraints (area, timing, power) will result in vastly different chip implementations. Thus, the designer is immediately able to alter the design at any stage of the design flow to test out various alternative designs. This encourages design exploration in a manner not possible with conventional EDA tools.
Third, hierarchical partitioning of the RTL model into efficient silicon structures, such as datapath and complex libraries, can be performed automatically, thereby reducing the time and expertise required to implement efficient design.
Fourth, links to back-end tools may be built to fully automate gate-level optimization and physical implementation. Likewise, links to front-end tools may enable improved behavioral synthesis based on more accurate parasitics and timing estimates.
Fifth, the high-level LBB representation and cross-probing capability between multiple design views provide traceability across multiple design transformations and enable the use of the user-defined RTL model as the 'golden' source throughout the design process. This feature of the present invention is found in the user interface of the RTL optimization system. While the RTL optimization system dramatically restructures and modifies the architecture of the RTL model, the system designer's original source RTL files are preserved as a functional interface for analyzing and probing the electronic design. The designer can thereby identify familiar RTL objects and trace their instantiation through any of the partitions, LBBs, or other entities created by the RTL optimization system.
To facilitate this feature, the system displays both the logical hierarchy of the RTL and the physical, extracted hierarchy of the electronic design as created by the RTL optimization system. Block-level diagrams of the LBB network are also presented. The user interface windows for the RTL source, block diagrams, physical and logical hierarchies, floorplan, and timing are linked together so that the designer can cross-probe RTL objects, LBBs, signals, components, variables, and the like at any level of the electronic design, and from any window.
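Cross-probing relies on keeping the mapping from original RTL objects to the LBBs and partitions derived from them, navigable in both directions. A minimal Python sketch of such an index follows; the class, its methods, and the dotted naming scheme for RTL objects are hypothetical.

```python
from collections import defaultdict


class CrossProbeIndex:
    """Hypothetical bidirectional index: RTL objects <-> LBBs -> partitions."""

    def __init__(self):
        self.lbbs_of_rtl = defaultdict(set)   # RTL object -> LBBs derived from it
        self.rtl_of_lbb = defaultdict(set)    # LBB -> RTL objects it implements
        self.partition_of_lbb = {}            # LBB -> current physical partition

    def link(self, rtl_obj, lbb, partition):
        self.lbbs_of_rtl[rtl_obj].add(lbb)
        self.rtl_of_lbb[lbb].add(rtl_obj)
        self.partition_of_lbb[lbb] = partition  # re-linking an LBB updates its partition

    def partitions_of_rtl(self, rtl_obj):
        """Which physical partitions hold logic derived from this RTL object?"""
        return {self.partition_of_lbb[lbb] for lbb in self.lbbs_of_rtl.get(rtl_obj, ())}

    def rtl_in_partition(self, partition):
        """Which RTL objects contribute logic to this partition?"""
        return {rtl
                for lbb, part in self.partition_of_lbb.items() if part == partition
                for rtl in self.rtl_of_lbb[lbb]}


idx = CrossProbeIndex()
idx.link("alu.sum", "add32_0", "dp_0")
idx.link("alu.sum", "mux32_1", "dp_0")
idx.link("alu.carry", "add32_0", "dp_0")
idx.link("ctrl.state", "fsm_0", "ctl_0")
```

Because one RTL object may expand into several LBBs and one LBB may implement several RTL objects, both directions are stored as sets.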
Finally, the present invention essentially provides designers an interactive 'virtual' back-end environment which models physical effects and implementations, thereby enabling front-end micro-architectural optimization at the register transfer level before synthesis. The system automatically searches the solution space and derives an optimal solution for rapid timing convergence. It then generates all necessary data to drive back-end tools to implement that solution. The ability to achieve better silicon efficiency predictably and rapidly, while de-coupling the front-end loop and streamlining the back-end loop, enables a more productive RTL hand-off design paradigm.