The present invention relates to the design of integrated circuits having clock signal distribution networks (clock trees) and clock gating circuitry, and more particularly to methods for making energy efficient integrated circuits (IC) with gated clock trees.
Typical integrated circuit (IC) chips may contain hundreds of thousands or millions of transistor elements, plus wires and other elements such as resistors and capacitors to implement their logical functions. Additionally, an IC chip may contain a “clock tree” or “clock net,” comprising a network of wires and buffers and clock gating elements, that distributes and/or restricts the “clock signal” that controls the timing and operation of portions of the logical elements of the IC. When designing integrated circuits, a netlist description (model) of the integrated circuit is generated. The netlist includes a description of the integrated circuit's logical components and the connections or networks (“nets”) between the components. The components include all those circuit elements necessary for implementing the logic circuit, such as combinational logic (e.g., gates) and sequential logic (e.g., flip-flops and latches). The logic elements and other circuits controlled by a clock signal such that they add capacitance to the clock tree are generally referred to herein as “clock sinks” or “sinks.” The sequential logic elements (i.e., sinks), such as flip-flops, RAMs, dynamic logic gates, and latches, that are “clocked” by the clock structure of the circuit are usually described in the netlist, although their connections with the clock tree are usually omitted from the netlist during placement in the related art. The netlist descriptions of the related art generally do not include a description of the clock tree components, connections or nets for logic circuit placement decision purposes. The physical design process of integrated circuits has been traditionally performed in three separate operations: logic placement; clock tree optimization; and wiring.
Traditionally, placement is the assignment of logic circuits in the netlist to locations (called “cells” or “bins”) on the chip image. Traditionally, the connections within clock trees, such as clock signal connections to clock buffers, clock gates, and clock sinks, are only “optimized after placement” of the logic circuits has been completed. See, e.g., CIRCUIT PLACEMENT, CHIP OPTIMIZATION, AND WIRE ROUTING FOR IBM IC TECHNOLOGY, IBM Journal of Research and Development, (Volume 40, Number 4, July 1996). Wiring is the generation of routes between circuit elements, using the available interconnection layers, to complete the connections specified in the final netlist.
The clock signal is the most fundamental control signal in a digital circuit, and is usually required to be transmitted to all regions of the IC chip that it controls. Buffers are generally used in the clock signal distribution tree to amplify and retransmit the clock signal where long thin wires spanning the IC chip would otherwise tend to slow or attenuate the propagation of the clock signal. As the rapidly developing field of low power integrated circuitry advances, the number of transistor logic elements per unit of chip surface area continues to increase. As the integrated circuit density on a chip increases, the amount of power consumed and heat generated per unit of area by the integrated circuits on the substrate increases proportionally. The integrated circuit industry has changed from TTL to Complementary Metal Oxide Semiconductor (CMOS) technology in order to decrease the current consumption, thereby reducing power consumption and heat generation. CMOS logic circuits consume power when they are switched between logical states, such as by a clock signal. The power consumption of CMOS elements decreases in proportion to a decrease in switching frequency.
In typical Integrated Circuit (IC) designs, e.g., Application Specific Integrated Circuit (ASIC) or microprocessor designs, the clock signal distribution network or “clock tree” of the related art can consume from 20% to 80% of an integrated circuit's total active power. As the clock signal and its related circuitry may be a large power consuming factor within most microprocessor systems, one important technique for reducing power consumption in microprocessor designs is to reduce the power consumption of a microprocessor's clock signal distribution network (e.g., “tree”) by splitting the clock signal into several separate clock signals that can be individually disabled or “gated off” when the logical portion (e.g., “domain”) of the circuit it controls does not need to be clocked. The logical portion of the circuit that is controlled by the clock signal that is gated off by a particular clock gating signal is called a “clock gate domain” or “domain.”
The process known as “clock gating” disables the clock signals fed to logic blocks (i.e., “domains”) of the circuit when the logic blocks (i.e., domains) are not currently in use by the circuit (e.g., microprocessor). Without clock gating, power is consumed by every sink during every clock cycle. Power consumption due to the clocking of logic blocks that are not directly involved with the current operation of the microprocessor may be reduced by clock gating. Clock gating techniques of the related art require additional logic circuitry (e.g., clock gating logic) to generate the clock gating signals, and also require gates within the clock tree to gate the clock signal in each domain.
If a plurality of logically non-equivalent gated clock domains overlap the same physical region of the chip, the total clock tree capacitance can increase substantially, due to the overlapping and separate clock-gating circuitry and domain wiring. This increased capacitance can increase power consumption so much that any reduction due to clock gating is cancelled out. Conversely, if the sinks gated by a particular gated clock signal are forced into an exclusive physical region not overlapping any region occupied by the sinks controlled by other gated clocks, clock tree capacitance may be reduced, but significant skew, delay or wireability problems may be created.
In order to have a power savings, the clock gating logic circuitry must consume less power than is saved by gating the clock signals off. Therefore, net reduction of power consumption by clock gating is a balancing function of the power consumed by the added clock gating circuitry and wires and the power that would be consumed by leaving a domain or a subdomain (i.e., a portion of a domain) of the clock tree either ungated or less than maximally gated.
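The balance described above can be illustrated with a minimal sketch. It assumes the standard first-order CMOS switching power model, P = a·C·V²·f, which is not stated in this document; the function names, the default supply voltage and frequency, and the assumption that the gating circuitry is clocked on every cycle are all hypothetical and serve only as an illustration.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts (assumed model: a * C * V^2 * f)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def net_gating_savings(domain_cap_f, gate_overhead_cap_f, idle_fraction,
                       vdd_v=1.8, freq_hz=500e6):
    """Net power saved by gating one domain, in watts.

    A negative result means the added gating circuitry and wiring
    consume more power than gating the domain saves.
    """
    # Power drawn by the domain's clock net if it is never gated
    # (a clock net toggles every cycle, so activity is taken as 1.0).
    ungated = dynamic_power(1.0, domain_cap_f, vdd_v, freq_hz)
    # Power avoided during the fraction of time the domain is gated off.
    saved = idle_fraction * ungated
    # Power consumed by the added gating logic and wires (assumed always clocked).
    overhead = dynamic_power(1.0, gate_overhead_cap_f, vdd_v, freq_hz)
    return saved - overhead
```

With these hypothetical parameters, a domain that is idle half the time yields a net saving only while the capacitance added by the gating circuitry stays below half of the domain's clock capacitance.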
Strategies for defining logic blocks (i.e., “clock gate domains”) that can be clock-gated and strategies for identifying and/or generating the clock gating control signals that perform the clock gating are known to persons skilled in the art. The ideal clock signal distribution tree has the smallest number of clock gates that yield the maximum amount of clock gating power savings when running typical application code. The degree of optimization of clock distribution trees is generally limited in the related art by the constraints imposed by logic circuit placement that has been completed without regard for clock tree optimization considerations.
Traditionally, the connections and nets within the clock tree have been zero-weighted or omitted in models and/or placement netlists so that those connections and nets do not influence the placement of clock sinks or other logic circuits. Traditionally, clock optimization tools are employed only after placement, to perform optimization of clock tree nets, such as by gating the nets of domains and/or subdomains, interchanging sinks of equivalent nets, creating and moving parallel copies of clock buffers and/or gates, adding load circuits to balance clock net loads, and generating balanced clock tree routes. In general, the clock-gating strategies of the related art attempt to optimize clock tree structures by intelligently distributing clock gates, wires, and buffers only after the clock tree topology has been constrained by the placement of the clock-controlled logic circuits without regard for clock tree efficiency concerns. After placement, the strategies of the related art make modifications to the number and arrangement of clock signal buffers, clock signal splitters, clock signal gates, and other clock tree elements, modifications to the connections between these clock net elements, and modifications to the location of these connections and elements. The goals of these after-placement clock tree optimizations include reducing the total length of connections (wire) in the clock tree and reducing or controlling the skew between the clock arrival times at various clock tree sinks. Among the problems of this after-placement approach is that the power savings obtainable through clock gating are arbitrarily and non-optimally limited by the relatively random distribution of the sinks of each clock gate domain across the whole chip.
At the other extreme, U.S. Pat. No. 6,020,774 to Chiu, teaches a method “of synthesizing a gate array logic circuit,” (a very simple ASIC circuit), to minimize the power consumption of its clock tree by forcing (i.e., “grouping”) all logic elements (e.g., latches) controlled by one gated clock signal (e.g., of one clock gate domain) into exclusive physical regions on the chip. This simplistic forcing method for optimizing “gate array logic circuits,” taught by 6,020,774, is not practical for use in Large Scale Integrated (LSI) circuits and Very Large Scale Integrated (VLSI) circuits where significant distances may need to be spanned to communicate signals between two or more logic elements of two or more clock gate domains, or where the clock-gating control signal of one domain-region is generated by logic circuitry located in another domain-region. Forcing all sinks (e.g., logic elements) that are clock-gated alike (e.g., in the same domain) into exclusive physical proximity (i.e., “together as a collective unit” as in the method of 6,020,774,) without regard for the impact on wiring overhead of connections outside the clock tree itself is very likely to create or aggravate skew, delay, and/or wiring problems in Large Scale Integrated (LSI) circuits or VLSI circuits, and may result in inoperable or unreliable circuits.
In certain integrated circuits, clustering a domain's sinks while ignoring the layout of clock-gating logic connections and/or connections between clock-controlled logic in different clock domains can generate a wiring overhead that consumes more power than is gained by an optimized clock gating strategy.
Accordingly, a need exists for methods to reduce the power consumption of integrated circuits through clock power optimizing logic circuit placement without introducing or aggravating skew, delay, and/or wiring problems outside of the gated clock tree itself.
To overcome the deficiencies of the related art, a first aspect of the invention provides a method for synthesizing a logic circuit that is driven by a clock signal, and that has a plurality of clock domains each having a plurality of clock sinks, the method comprising:
providing a semiconductor substrate;
placing all of the plurality of clock sinks of one domain into at least one cluster of clock sinks on the semiconductor substrate, wherein the sink-density of each cluster of clock sinks is approximately equal to or greater than the clock sink density of the integrated circuit.
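The density condition of the first aspect can be sketched as a simple check. The text above does not define how sink density is measured, so this sketch assumes it is the number of sinks per unit area, with a cluster's area taken as that of its bounding box; the `tolerance` parameter is one possible reading of “approximately equal to.” All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sink:
    """A clock sink at chip coordinates (x, y); hypothetical data structure."""
    x: float
    y: float


def bounding_box_area(sinks):
    """Area of the smallest axis-aligned rectangle enclosing the sinks."""
    xs = [s.x for s in sinks]
    ys = [s.y for s in sinks]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def cluster_meets_density(cluster, chip_sink_count, chip_area, tolerance=0.05):
    """True if the cluster is about as dense as, or denser than, the whole chip."""
    area = bounding_box_area(cluster)
    if area == 0:
        # Degenerate cluster (single point or collinear sinks): treated as dense.
        return True
    cluster_density = len(cluster) / area
    chip_density = chip_sink_count / chip_area
    # "Approximately equal to or greater than", within the assumed tolerance.
    return cluster_density >= chip_density * (1.0 - tolerance)
```

A placer using such a check would still have to weigh it against wiring outside the clock tree, which this sketch does not model.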
A second aspect of the invention provides an integrated circuit, that is driven by a clock signal, comprising:
a semiconductor substrate; and
a plurality of clock domains each having a plurality of clock sinks;
wherein the plurality of clock sinks of one domain forms at least one cluster of clock sinks on the semiconductor substrate, wherein the sink-density of each cluster of clock sinks is approximately equal to or greater than the clock sink density of the integrated circuit.
A third aspect of the invention provides a method for creating an integrated circuit having a clock signal distributed by a gated clock tree having a plurality of clock domains each having a plurality of clock sinks, comprising the steps of:
creating a model of the integrated circuit, wherein the model includes a netlist;
establishing at least one target condition for a domain before placement is completed;
making a determination of the extent to which a domain is in compliance with the target condition; and
making a placement refinement based upon the determination.
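The three steps of the third aspect (target condition, determination of compliance, placement refinement) can be sketched as an iterative loop. This summary does not specify the target condition or the refinement move, so the sketch assumes a target on the half-perimeter of a domain's bounding box and a refinement that pulls each sink part of the way toward the domain centroid. It ignores cell overlap and all non-clock nets, and every name in it is hypothetical.

```python
def half_perimeter(points):
    """Half-perimeter of the bounding box of a domain's sink locations."""
    xs, ys = zip(*points)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def compliance(points, target_hp):
    """Determination step: a ratio above 1.0 means the domain exceeds its target."""
    return half_perimeter(points) / target_hp


def refine(points, step=0.25):
    """Refinement step: move each sink a fraction of the way to the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x + step * (cx - x), y + step * (cy - y)) for x, y in points]


def refine_until_compliant(points, target_hp, max_iters=20):
    """Repeat determination and refinement until the target condition is met."""
    for _ in range(max_iters):
        if compliance(points, target_hp) <= 1.0:
            break
        points = refine(points)
    return points
```

Because each refinement shrinks the domain's spread by a fixed fraction, the loop reaches any positive target in a bounded number of passes; a real placer would also account for the cost of moving sinks away from the logic they connect to.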
A fourth aspect of the invention provides a method for creating an integrated circuit, comprising the steps of:
providing a logic circuit design;
providing a clock tree design having clock sinks in common with the logic circuit design;
defining the clock gate domains of the clock tree optimally for power efficiency based upon physical placement information prior to completion of placement; and
optimizing the logic circuit and the gated clock tree together during placement.
A fifth aspect of the invention provides a method for creating an integrated circuit, comprising the steps of:
providing a semiconductor substrate;
providing a logic circuit design;
providing a clock tree design having clock sinks in common with the logic circuit design;
performing a placement refinement providing information about the location of the clock sinks on the substrate; and
performing an after-placement-type clock tree optimization method before placement is complete.
The first, third, fourth, and fifth method aspects of the invention may be combined or performed separately and/or individually.
A computer program product is also provided having a computer readable medium with program code for performing the first, third, fourth, and fifth method aspects, and for producing the chip of the second aspect of the invention. The inventive program product is carried by a medium readable by a computer (e.g., a carrier wave signal, a floppy disc, a hard drive, a CD-ROM, a random access memory, etc.). The computer readable medium comprises program code for performing the first, third and/or fourth aspects of the invention. An embodiment of the invention is a computer readable medium having:
program code for creating a model of the integrated circuit, wherein the model includes a netlist;
program code for establishing at least one target condition for a domain before placement is completed;
program code for making a determination of the extent to which a domain is in compliance with the target condition; and
program code for making a placement refinement based upon the determination.
Other features of the present invention will become more fully apparent from the following detailed description of the preferred embodiments, the appended claims and the accompanying drawings.