1. Field of the Invention
The invention relates generally to semiconductor integrated circuits as well as designs and design tools therefor and, more specifically, to techniques for placement of storage stations into an integrated circuit or design.
2. Description of the Related Art
Timing-critical paths (or nets) of a semiconductor integrated circuit can affect delivery of signals in a way that limits achievable operating frequency. Indeed, one of the significant limiting factors in the design of modern, high-performance microprocessors is interconnect delay. Typically, interconnect delay, rather than logic delay, limits the operating frequency of circuits designed to operate in the gigahertz range.
A common design technique employed to overcome the limitations imposed by certain timing-critical paths, including those that arise from interconnect delays that violate maximum-time (maxtime) constraints, is to insert storage elements (e.g., flip-flops or flops) into the timing-critical paths. In effect, the insertion allows the otherwise critical path to retain a signal and deliver the signal in a subsequent clock cycle. In this way, a circuit path whose delay limited operating frequency can be replaced with a pair of half-paths that no longer violate timing constraints, thereby allowing a part or component circuit thereof to operate at higher frequency (assuming no other similarly limiting timing-critical paths).
Flop insertion is a conventional technique commonly employed, typically manually and iteratively, in designs for integrated circuits to improve operating frequency. One technique is to place each individual flop at the point along a timing-critical path where a hypothetical infinite-drive buffer would see equal RC delay in both directions, toward the driver and toward the receiver.
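As a rough illustration of this balance-point placement, the following Python sketch finds the position along a uniform wire at which an ideal buffer (infinite drive, negligible input capacitance) would see equal Elmore-style RC delay toward the driver and toward the receiver. The delay model, the function names, and every parameter are assumptions introduced here for illustration only; they are not drawn from any particular design tool.

```python
def upstream_delay(x, r, c, r_drv):
    """Elmore-style delay from the driver to a point x along a uniform wire.

    r, c  : wire resistance and capacitance per unit length (assumed model)
    r_drv : driver output resistance (assumed parameter)
    """
    return r_drv * c * x + 0.5 * r * c * x * x


def downstream_delay(x, length, r, c, c_load):
    """Elmore-style delay from point x to the receiver at the far end."""
    d = length - x
    return 0.5 * r * c * d * d + r * d * c_load


def balance_point(length, r, c, r_drv=0.0, c_load=0.0, iterations=100):
    """Bisect for the x in [0, length] where both delays are equal.

    The difference (upstream - downstream) grows monotonically with x,
    so a fixed number of bisection steps is sufficient.
    """
    lo, hi = 0.0, length
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        diff = (upstream_delay(mid, r, c, r_drv)
                - downstream_delay(mid, length, r, c, c_load))
        if diff < 0:
            lo = mid  # upstream side is still faster: move toward receiver
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this model, zero driver resistance and zero load put the balance point at the wire midpoint, while a nonzero driver resistance shifts it toward the driver.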
Unfortunately, in complex integrated circuit designs, the number of timing-critical paths (or nets) can be quite large, numbering in the thousands or more, and placement rapidly becomes intractable, particularly when layout constraints are considered. Because the desired placement of a flop often conflicts with existing blocks of a design (e.g., devices, routing resources, etc.), either flop placements must be constrained by layout of existing blocks and resources or such layouts must be adapted to accommodate the desired placement. In either case, practical and computational requirements (including those associated with extended design and testing cycles) generally conflict with the goal of placing large numbers of flops into a design.
What is needed is a technique that may be efficiently applied to place appropriate delay stages (e.g., flops or other delay stages) into large numbers (often thousands or more) of time-critical nets or circuit paths. Integrated circuit designs prepared or adapted using such techniques, as well as integrated circuits formed in accordance therewith and systems that incorporate such integrated circuits, may exhibit improved performance or capabilities.
Accordingly, it has been discovered that by determining a suitable range of positions for introduction of a delay stage into time-critical nets and by aggregating subsets of the desired delay stages into composite storage stations based, at least in part, on compatibility of respective suitable ranges, delay placement may be efficiently applied in very large integrated circuit designs. For example, in a test case, the problem of placing approximately 2000 flops into time-critical nets of a microprocessor design was reduced to placement of just 128 flop stations. Suitable positional ranges may be determined by performing timing analysis for each net that is routed through the flop station. In general, any of a variety of conventional or commercially available timing analysis tools or techniques may be employed to support such a determination.
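The aggregation step can be pictured, in a deliberately simplified one-dimensional form, as grouping nets whose suitable ranges share a common position. The Python sketch below uses a greedy interval-stabbing heuristic; it is an assumption made for illustration, not the grouping procedure of the described technique, which must also account for two-dimensional layout, blockages, and routing. The function name and the `max_size` cap are hypothetical.

```python
def group_into_stations(ranges, max_size=None):
    """Greedy 1-D grouping of nets whose suitable ranges overlap.

    ranges   : dict mapping net name -> (lo, hi) suitable interval (assumed 1-D)
    max_size : optional cap on flops per station (hypothetical parameter)
    Returns a list of (station_position, [net names]).
    """
    if max_size is not None and max_size < 1:
        raise ValueError("max_size must be at least 1")
    # Process nets in order of where their suitable range ends.
    pending = sorted(ranges.items(), key=lambda item: item[1][1])
    stations = []
    while pending:
        # Anchor the station at the right edge of the range that ends first.
        position = pending[0][1][1]
        members, rest = [], []
        for name, (lo, hi) in pending:
            # hi >= position already holds because of the sort order.
            fits = lo <= position
            if fits and (max_size is None or len(members) < max_size):
                members.append(name)
            else:
                rest.append((name, (lo, hi)))
        stations.append((position, members))
        pending = rest
    return stations


nets = {"a": (0, 10), "b": (5, 15), "c": (8, 20), "d": (30, 40), "e": (35, 50)}
print(group_into_stations(nets))
# [(10, ['a', 'b', 'c']), (40, ['d', 'e'])]
```

Here five individual flop placements collapse into two station placements, each station sitting at a position that lies inside the suitable range of every net it serves.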
Though placement of individual storage elements may be sub-optimal with respect to a given time-critical net, in general, the efficiency of placement can have both implementation-level and design-time benefits. For example, by localizing compatible storage elements in a composite storage station, more compact "by-N" layouts can generally be used rather than "by N−1" layouts. Smaller numbers of compact layouts can be efficiently evaluated for placement conflicts with other design blocks, blockage of routing resources, etc. In addition, the overall perturbation of a design (e.g., to accommodate or alleviate placement conflicts with other design blocks, blockage of routing resources, etc.) can be generally reduced with smaller numbers of composite, albeit larger, storage stations. Preferably, at least some of the composite storage stations may be placed, in whole or in part, in pre-existing free space. In this way, placement of storage elements can be exploited as a design technique in complex, high-speed semiconductor designs (e.g., gigahertz range microprocessors) at a scale generally unachievable using conventional techniques. In some realizations, determination of suitable placement ranges, grouping of desired storage elements into composite storage stations and placement of such storage stations in a semiconductor design may be partially or completely automated.
Some or all of the aforementioned benefits may be achieved in a given exploitation of the techniques described herein. Some, none or all of these benefits may appear in, or result from, any method, apparatus, article or system corresponding to any particular one of the appended claims.
In some embodiments in accordance with the present invention, a method for use in an integrated circuit design is described. The method includes identifying a range of suitable positions for insertion of a storage stage for timing-critical circuit paths, determining storage stations that each span a respective collection of the timing-critical circuit paths based at least in part on compatibility of the ranges of suitable positions, and introducing respective storage stages for the timing-critical circuit paths in the storage stations. In some variations, a substantial number of the timing-critical circuit paths spanned by a particular storage station exhibit generally uncorrelated timing requirements.
In some variations, the range of suitable positions is identified at least in part by identifying one or more positions on each one of the timing-critical circuit paths having substantially equal RC delay toward first and second ends of the respective one of the timing-critical paths. In some variations, the one or more positions are identified at least in part by substantially using a predetermined generic timing model. In some embodiments, one or more of the storage stations are determined at least in part according to a predetermined storage station library. In some variations, the range of suitable positions comprises at least one continuous range of positions along respective timing-critical paths.
In some embodiments, the range of suitable positions includes multiple discrete positions along respective timing-critical paths. In some variations, the timing-critical paths require more than one clock cycle to transfer data. In some variations, the storage stations are determined substantially without rerouting the timing-critical paths spanned thereby. In some embodiments, the method includes merging the storage stations into one or more compact storage stations.
In some embodiments, a semiconductor integrated circuit is described in which storage elements are coupled into adjacent timing-critical circuit paths that, during operation, bear generally uncorrelated signals and in which the storage elements are grouped into composite storage stations that share at least some circuitry, wherein placement of at least a substantial number, more than half, of the individual storage elements of a composite storage station is suboptimal with respect to at least some of the respective timing-critical circuit paths.
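To make the notion of a continuous range of suitable positions concrete, the Python sketch below computes, for a single path, the interval of flop positions at which both resulting half-paths fit within one clock period. It assumes a simple generic model in which the delay of an unbuffered wire over distance d is k·d²; the model, the function name, and the parameters (`k`, `setup`, `clk_to_q`) are illustrative assumptions and stand in for whatever timing analysis is actually used.

```python
import math


def suitable_range(length, k, period, setup=0.0, clk_to_q=0.0):
    """Return (x_min, x_max) of flop positions that meet timing, or None.

    Assumes an unbuffered wire whose delay over distance d is k * d**2
    (a generic stand-in model, not a sign-off timing analysis).
    """
    # Driver-side half-path must arrive a setup time before the clock edge.
    reach_up = math.sqrt(max(period - setup, 0.0) / k)
    # Receiver-side half-path launches clk_to_q after the clock edge.
    reach_down = math.sqrt(max(period - clk_to_q, 0.0) / k)
    x_max = min(length, reach_up)
    x_min = max(0.0, length - reach_down)
    # An empty interval means one inserted flop cannot fix this path.
    return (x_min, x_max) if x_min <= x_max else None
```

For example, with an assumed 10 mm wire, k = 0.02 ns/mm², a 1 ns period, and 0.1 ns each of setup and clock-to-output overhead, the range is roughly 3.29 mm to 6.71 mm, centered on the equal-delay midpoint. Any position inside that interval is acceptable, which is what leaves room to align flops from different paths.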