The invention relates to the design of CMOS integrated circuits. In particular, the invention relates to automatic resizing of devices and selective substitution of low-threshold devices into CMOS circuits to optimize speed, circuit capacitance and power dissipation.
Many CMOS processes involve threshold adjustment implant steps. These threshold adjustment implants typically involve masking operations, such that the N and P type device thresholds may be independently adjusted.
It is known that N-type devices having reduced thresholds may be fabricated by selectively blocking part of the N-type threshold adjustment implant while fabricating N type devices. Similarly, reduced threshold P-type devices may also be fabricated.
A particular example process provides transistors having gate lengths of about a tenth of a micron, nominal N-type threshold voltages of about 0.3 V for a low-threshold Reduced-Vt transistor, and about 0.35 V for a normal threshold transistor. This process also provides P-type Reduced-Vt devices having a threshold of about -0.31 V and normal threshold P-type devices having a threshold voltage of about -0.365 V.
On the example process, saturation currents of the Reduced-Vt devices tend to be about twenty percent higher than those of normal threshold devices.
Enhancement, Reduced-Vt, and intrinsic device types are often used together in the design of analog circuitry and special-purpose digital circuitry. For example, a Reduced-Vt device used as a source-follower offers slightly better headroom than an Enhancement device; and a Reduced-Vt device requires less bias voltage than an Enhancement device when used as a capacitor. An N-type Reduced-Vt device source-follower may also be used in parallel with the P-type pullup of a digital clock-driver.
Typical digital signal levels cannot be guaranteed to completely turn off typical Reduced-Vt devices; in effect these devices leak more than their normal-Vt counterparts.
On the example process, device leakage of the Reduced-Vt devices is about ten times higher than that of normal threshold devices, and may reach or exceed two microamps per micron of gate width at high temperatures. This can produce substantial leakage current if a large percentage of transistors on a large integrated circuit, such as a modern processor integrated circuit, are of the Reduced-Vt type.
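The scale of this leakage can be illustrated with a short calculation. The sketch below is only an upper-bound estimate: it assumes every device is off and leaking at the high-temperature figure of two microamps per micron, and the total gate widths passed in are hypothetical values chosen for illustration, not figures from any real design.

```python
def worst_case_leakage_amps(leaky_width_um, normal_width_um,
                            leaky_amps_per_um=2e-6, leak_ratio=10.0):
    """Upper-bound leakage estimate for a mix of device types.

    leaky_amps_per_um: Reduced-Vt leakage per micron of gate width
                       (about 2 uA/um at high temperature on the example process).
    leak_ratio:        Reduced-Vt leakage divided by normal-Vt leakage (about 10).
    Widths are hypothetical inputs; every device is assumed to be off.
    """
    normal_amps_per_um = leaky_amps_per_um / leak_ratio
    return (leaky_width_um * leaky_amps_per_um
            + normal_width_um * normal_amps_per_um)


# Hypothetical chip with one million microns of total gate width.
all_normal = worst_case_leakage_amps(0.0, 1e6)      # no Reduced-Vt devices
partly_leaky = worst_case_leakage_amps(3e5, 7e5)    # 30% of width is Reduced-Vt
all_leaky = worst_case_leakage_amps(1e6, 0.0)       # every device is Reduced-Vt
```

Under these assumptions, converting even a minority of the gate width to Reduced-Vt devices dominates the leakage total, which is why indiscriminate substitution is unattractive.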
Gates built of Reduced-Vt transistors can therefore be referred to as a fast-but-leaky gate type, and those of standard thresholds as slow-but-not-leaky gate type.
It is known that the effective source-drain resistance of a CMOS transistor used as a switching device in a logic gate is strongly dependent upon the difference between its gate-source voltage and its threshold voltage.
An N-type enhancement pulldown transistor having one volt gate-source will therefore conduct significantly less current than an N-Type Reduced-Vt device of the same size and having the same gate-source voltage. On an example process, this current may be twenty percent higher for Reduced-Vt devices than for normal devices. For this reason, Reduced-Vt devices have been used in speed-critical logic circuits where timing requirements can be met in no other way.
Threshold voltage can also be effectively increased, and leakage substantially reduced, by increasing device length, with the consequence of increased gate capacitance and reduced IDSat (hence reduced speed). Even a small increase in length can substantially reduce leakage. A CMOS design may use gates with normal L's for speed where necessary, and gates with slightly greater L's where lower leakage is important. The normal L devices may also be termed a fast-but-leaky type and the greater L devices a slow-but-not-leaky type. For example, transistors on a 0.1 micron process could have 0.1 micron L when high speed is needed, while they could be ten percent longer when lower leakage outweighs the speed disadvantage.
It is also known that effective threshold voltage of MOS transistors in logic circuits may be adjusted by applying substrate or well bias. Variation in threshold with substrate bias is known as the body effect. For n-channel transistors, the conventional substrate bias is 0V, and for p-channel transistors the conventional bias is the local power supply voltage VDD. If the N type bias is increased to a level above circuit ground, Vt can be reduced a little at the expense of increased junction capacitance. Similarly, if N type bias is decreased to a level below circuit ground, Vt can be effectively increased and junction capacitance decreased. P-channel transistors are similarly affected, although polarity is reversed.
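The direction of the body effect can be illustrated with the standard textbook first-order expression Vt = Vt0 + gamma * (sqrt(2*phiF + Vsb) - sqrt(2*phiF)) for an n-channel device, where Vsb is the source-to-body voltage. In the sketch below, the body-effect coefficient `gamma` and Fermi potential `phi_f` are assumed illustrative values, not parameters of the example process; only the 0.35 V zero-bias threshold comes from the text above.

```python
import math


def nmos_threshold(vt0, gamma, phi_f, v_sb):
    """First-order body-effect model for an n-channel transistor.

    vt0:   threshold voltage at zero source-to-body bias (volts)
    gamma: body-effect coefficient in V**0.5 (assumed value)
    phi_f: Fermi potential in volts (assumed value)
    v_sb:  source-to-body voltage; negative when the body is raised
           above a grounded source, positive when it is pulled below
    """
    return vt0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


VT0, GAMMA, PHI_F = 0.35, 0.2, 0.4   # GAMMA and PHI_F are illustrative only

vt_body_raised = nmos_threshold(VT0, GAMMA, PHI_F, -0.2)   # body 0.2 V above ground
vt_conventional = nmos_threshold(VT0, GAMMA, PHI_F, 0.0)   # body at ground
vt_body_lowered = nmos_threshold(VT0, GAMMA, PHI_F, 0.2)   # body 0.2 V below ground
```

With these assumed parameters the shift is a few tens of millivolts in each direction, consistent with the statement that Vt can be reduced "a little" by raising the body bias.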
Standard CMOS N-well processes lend themselves readily to application of bias to wells, and thereby to P-type transistors. Other processes may be adaptable to application of bias to N-type transistors. For purposes of this patent, devices having transistors with bias such that the absolute value of threshold voltage is reduced are also termed a fast-but-leaky type, and devices with a bias such that the absolute value of threshold voltage is increased are termed a slow-but-not-leaky type.
Power dissipated in CMOS integrated circuits is often described as having a static component and a dynamic component. Static power includes power dissipated through junction and device leakage, power dissipated through resistive and current-source loads, and other power consumption that is not a function of switching activity.
Dynamic power includes power dissipated through charging and discharging capacitances, including gate capacitances, as well as crossover current dissipated during signal transitions at gate inputs. Crossover current includes current that passes from rail to rail through both the N-type and P-type stacks of a CMOS gate because both stacks are partially conductive during a transition of an input signal to the gate. Dynamic power is generally a function of parameters including the clock rate, the capacitance switched by devices, and the supply voltage.
Historically, the component of dynamic power associated with charging and discharging capacitances has been more significant than that associated with crossover current. This was because transistors in CMOS circuits historically transition from the off-state to the on-state and vice versa rather than transitioning between a partially-conductive state and the fully on-state. The component of dynamic power associated with crossover current has generally been ignored in the design of integrated circuits.
The component of dynamic power associated with charging and discharging capacitances is proportional to the product of capacitance times the charge and discharge rate times the square of the voltage. The activity ratio of each node is the ratio of the charge and discharge rate of the node to the clock rate. Dynamic power is therefore generally proportional to the product of clock rate times the activity ratio times node capacitance times the square of the power supply voltage.
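The proportionality described above can be written as a short calculation. This is a minimal sketch: the function name, the node list, and the capacitance values are hypothetical, and any constant of proportionality (which depends on how a charge and discharge cycle is counted) is taken as one.

```python
def dynamic_power(clock_hz, vdd, nodes):
    """Capacitive switching power, up to a constant factor.

    nodes: iterable of (activity_ratio, capacitance_farads) pairs,
           one pair per circuit node (hypothetical example data).
    """
    return sum(clock_hz * activity * cap * vdd ** 2
               for activity, cap in nodes)


# Two hypothetical nodes: a busy one and a mostly idle one.
example_nodes = [(0.5, 1e-15), (0.1, 2e-15)]

p_full = dynamic_power(1e9, 1.0, example_nodes)
p_half_vdd = dynamic_power(1e9, 0.5, example_nodes)   # one quarter of p_full
```

Because the supply voltage enters as a square, halving it cuts this component to a quarter, while activity ratio and capacitance each enter linearly.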
The activity ratios of nodes of a processor or other large logic circuit vary with the design of the circuit, the position in the circuit of the nodes, and with the functional environment of the circuit. The activity ratios of different nodes in a circuit may vary substantially. The functional environment of the circuit includes, for processor circuits, code running on the processor.
The total power dissipated by a device includes both static power and dynamic power. Leakage in Reduced-Vt devices used in logic gates contributes to static power.
Much design of complex integrated circuits is accomplished through a design flow that begins with creation of a synthesizable register-transfer-level (RTL) description of the circuit. Synthesis tools, available from Cadence Design, Mentor Graphics, and Synopsys, among other vendors, map this RTL description into a gate-level netlist. Selected circuitry may also be synthesized manually through creation of gate-level schematics and extraction of the schematics to create a gate-level netlist. Static timing analysis software is then used to determine expected delays in a circuit, and to compare these delays with limits expressed in a "constraint file." Static timing analysis software is incorporated into many common synthesis tools and is also available as standalone software from vendors including Synopsys, Mentor Graphics, and Avertec. Results of this preliminary timing analysis are often fed back to the synthesis tool, which substitutes faster gates, and may rearrange logic, as necessary to meet timing requirements.
Synthesized logic meeting pre-layout timing constraints is then laid out, or physically designed, often by place and route software such as that available from Cadence Design, Avant!, and Monterrey Systems. Layout-dependent capacitive loading and interconnect resistance information is then extracted from the physical design, and additional static timing analysis performed to verify that the circuit still meets timing requirements.
The universe of possible circuits for each path in an integrated circuit can be quite large. Each possible circuit has an associated power-delay product. It is known that there may be several local minima in the universe of power-delay products for each path. Some of these local minima may have lower power-delay products than others; it is desirable to find and implement the solution having the lowest power-delay product in the universe. This solution is the global minimum.
Existing timing-driven integrated circuit design software typically considers timing and power consumption separately. This may result in designs that dissipate considerably more power than would be required if the circuit were optimized for both power consumption and timing, because a local minimum is found rather than the global minimum.
Conventional optimizers start with an initial condition and determine a search direction by examining a derivative of the power-delay product of the universe of solutions. The optimizer then "slides" down the power-delay product function in units of a predetermined step size to reach a minimum point. This minimum is likely to be a local minimum for many, but not all, initial conditions. The search performed by the optimizer is termed "greedy" if it only allows its search to proceed in a direction that appears from the derivative to lead to a more optimal member of the universe of solutions. A "greedy" search will stop when the optimizer has "slid" down to a local minimum. The optimizer may then report that local minimum as the best solution found, often without climbing a "hill" from which a better minimum can be found; a condition termed "stuck in a local minimum".
Greedy optimization is typically fast but tends to stick in a local minimum. Optimizers of this type can be termed local optimizers, since they find the nearest local minimum. Greedy optimizers therefore require good initial solutions "near" the optimal solution; such initial solutions can be difficult to provide.
Global optimizers have an ability to find global minima, as opposed to local minima. Some optimizers used in computer-aided design (CAD) for integrated circuits, including placers and routers, use a process called "simulated annealing." In simulated annealing, an initial state is randomly mutated into a successor state. The successor state is evaluated, and the evaluation result is compared against the initial state. In general, successor states replace the initial state for following iterations if they are determined to be an improvement on the initial state.
Some "hill-climbing" simulated annealing optimizers allow for occasional retention of a state evaluated as inferior to the initial (or parent) state. Typically, simulated annealing has a "temperature" parameter that controls the amount of change made between the initial state and each successor state. With a hill-climbing optimizer, this temperature also controls the likelihood that an inferior state will be retained as the parent state for further iterations. This temperature is gradually reduced as optimization proceeds.
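The dependence of a greedy search on its initial condition can be shown with a toy example. The sketch below is not a circuit optimizer: the one-dimensional `cost` function is a hypothetical stand-in for a power-delay product, chosen because it has one local and one global minimum, and the search compares the two neighbouring points one step away rather than evaluating an analytic derivative.

```python
def cost(x):
    """Hypothetical cost with a local minimum near x = 1.13
    and a lower, global minimum near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def greedy_descent(x, step=0.01, max_iters=10_000):
    """Slide downhill in fixed steps; stop when neither neighbour improves."""
    for _ in range(max_iters):
        best_neighbour = min((x - step, x + step), key=cost)
        if cost(best_neighbour) >= cost(x):
            break                      # no downhill move left: a minimum
        x = best_neighbour
    return x


from_right = greedy_descent(2.0)    # settles in the local minimum
from_left = greedy_descent(-2.0)    # settles in the global minimum
```

Started from the right, the search never crosses the hill separating the two valleys, so it reports the inferior minimum; only a start on the left reaches the better one.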
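A hill-climbing annealer of the kind described can be sketched as follows. This is an illustrative sketch under stated assumptions: the `cost` function is a hypothetical one-dimensional stand-in for a real evaluation, the exponential acceptance rule and geometric cooling schedule are common choices rather than anything specified in the text, and all parameter values are arbitrary.

```python
import math
import random


def cost(x):
    """Hypothetical cost with both a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x


def anneal(x, temp=2.0, cooling=0.995, iters=2000, seed=0):
    """Simulated annealing with occasional acceptance of inferior states."""
    rng = random.Random(seed)
    best = x
    for _ in range(iters):
        # Temperature sets how far the successor may be from the parent.
        candidate = x + rng.gauss(0.0, temp)
        delta = cost(candidate) - cost(x)
        # Improvements are always kept; inferior states are kept with a
        # probability that shrinks as the temperature falls.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if cost(x) < cost(best):
                best = x
        temp *= cooling                # gradual temperature reduction
    return best


result = anneal(2.0)
```

The function returns the best state visited rather than the final state, so the reported cost is never worse than that of the starting point.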
Genetic optimizers are global optimizers employing algorithms that are modeled on the process of evolution in nature. Typically, genetic algorithms operate by creating a population of individual variations, or mutations, from at least one parent individual. Each individual is a proposed solution to a particular problem. Each individual is typically represented as a machine representation having a particular state.
Individuals of the population may be created by mutating a parent, or by crossing portions from several parents. The population then undergoes a selection process, where individuals of the population are scored and those individuals determined to be better than most of the population are retained, while the remainder are deleted. The retained individuals may be used as parents in further iterations.
After one or more generations of the population, a particular "best" individual is selected as an optimized solution to the problem being solved.
Simulated annealing optimizers typically create a single mutated state at each iteration, the mutated state being derived by modifying a single parent state. Genetic optimizers typically create a population having more than one individual mutated state at each iteration. Genetic optimizers also often create individual mutated states of the population through crossover operations from more than one parent state.
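The population, crossover, mutation, and selection steps described above can be sketched generically. In this minimal sketch an individual is a list of bits and lower scores are better; the single-point crossover, single-bit mutation, population size, and scoring function are all assumptions made for illustration.

```python
import random


def evolve(score, parent, generations=40, pop_size=20, keep=4, seed=0):
    """Minimal genetic optimizer over bit-list individuals (lower score wins).

    Individuals must contain at least two bits.
    """
    rng = random.Random(seed)
    parents = [parent]
    for _ in range(generations):
        population = list(parents)             # retained parents compete too
        while len(population) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]          # crossover of two parents
            child[rng.randrange(len(child))] ^= 1   # mutation: flip one bit
            population.append(child)
        population.sort(key=score)             # score every individual
        parents = population[:keep]            # keep the best, delete the rest
    return parents[0]                          # the "best" individual


# Hypothetical problem: match a target bit pattern.
target = [1, 0] * 8


def mismatches(individual):
    return sum(bit != want for bit, want in zip(individual, target))


start = [0] * 16
best = evolve(mismatches, start)
```

Because the retained parents are carried into each new population, the best score can never get worse from one generation to the next.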
It is desirable to reduce the overall power dissipated in an integrated circuit. It is also necessary to ensure that circuit timing requirements are met. The invention as described below helps the design engineer to achieve these goals.
It has been found that predicted total power dissipation and circuit speed of an integrated circuit can be optimized by automatically resizing devices and selectively substituting fast-but-leaky devices for normal devices.
The activity ratio of each gate is determined by logging activity of each node during logic simulations of the design. For particular embodiments involving processor integrated circuits, logic simulations are performed while simulating execution of benchmark programs similar to those expected to be run by typical users.
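The logging step can be sketched as counting transitions in a simulation trace. The trace format here (one sampled logic value per clock cycle for each node) and the node names are hypothetical; this sketch also counts every transition, rising or falling, and ignores glitches within a cycle, both of which are simplifying assumptions.

```python
def activity_ratios(traces):
    """Transitions per clock cycle for each node.

    traces: dict mapping node name to a list of logic values,
            sampled once per clock cycle (hypothetical log format).
    """
    ratios = {}
    for node, values in traces.items():
        toggles = sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
        cycles = len(values) - 1
        ratios[node] = toggles / cycles if cycles > 0 else 0.0
    return ratios


# Hypothetical five-sample traces for three nodes.
example = {
    "toggle_every_cycle": [0, 1, 0, 1, 0],
    "data_bit":           [0, 0, 1, 1, 0],
    "idle_enable":        [0, 0, 0, 0, 0],
}
ratios = activity_ratios(example)
```

Nodes with low ratios contribute little dynamic power, so leakage matters relatively more for the gates that drive them.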
Candidate devices for optimization are identified by inspection of a netlist of the integrated circuit. These candidate devices may, but need not, be on critical paths of the circuit. In particular embodiments, it has been found beneficial to include in optimization both devices on and off the critical paths of the circuit.
The netlist may be a pre-layout netlist with expected interconnect resistance and capacitance, or a post-preliminary-layout netlist with extracted interconnect resistance and capacitances. Optimization may be performed on both.
Global optimization is performed. This involves candidate devices being substituted with fast-but-leaky devices and/or gates of altered size. Substitutions are evaluated for total power and speed. Total power includes static power dissipated by fast-but-leaky devices as well as dynamic power.
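Evaluation of a substitution for total power and speed might be sketched as below. The delay and power models are deliberately crude assumptions: each gate drives a fixed load, drive strength scales with size, and the constants `UNIT_DELAY`, `UNIT_CAP`, and `UNIT_LEAK` are hypothetical. Only the twenty percent drive advantage and tenfold leakage penalty of fast-but-leaky devices are taken from the example process described earlier.

```python
UNIT_DELAY = 100e-12   # seconds for a unit-size normal gate (hypothetical)
UNIT_CAP = 2e-15       # farads switched per unit of gate size (hypothetical)
UNIT_LEAK = 1e-7       # amps leaked per unit of normal gate size (hypothetical)


def evaluate(design, activity, clock_hz=1e9, vdd=1.0):
    """Return (total_power_watts, path_delay_seconds) for one path.

    design:   list of (leaky, size) tuples, one per gate on the path
    activity: list of activity ratios, one per gate output node
    """
    delay = dynamic = static = 0.0
    for (leaky, size), ratio in zip(design, activity):
        drive = size * (1.2 if leaky else 1.0)       # about 20% more current
        delay += UNIT_DELAY / drive
        dynamic += clock_hz * ratio * UNIT_CAP * size * vdd ** 2
        static += vdd * UNIT_LEAK * size * (10.0 if leaky else 1.0)
    return dynamic + static, delay


# One gate on a low-activity node, with and without substitution.
power_normal, delay_normal = evaluate([(False, 1.0)], [0.1])
power_leaky, delay_leaky = evaluate([(True, 1.0)], [0.1])
```

In this toy model the substitution buys speed at the cost of static power, so whether it is worthwhile depends on the timing slack and activity of the node concerned.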
In a particular embodiment, a genetic optimization method is used to optimize sections of an integrated circuit design for power consumption and for circuit speed in the same genetic optimization. In this embodiment, both size and gate type substitutions may be made at one or more points in the circuit to create each individual of the population; individuals are scored for both power dissipation and speed at each iteration. The genetic optimization adjusts both device types and sizes at multiple locations in a circuit to produce well-optimized final circuit designs.
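Creation of an individual by making size and gate type substitutions at one or more points might look like the following sketch. The representation of a design as a list of `(leaky, size)` tuples, the table of allowed sizes, and the even split between type and size changes are all assumptions made for illustration.

```python
import random

SIZES = (1.0, 1.5, 2.0, 3.0, 4.0)   # hypothetical allowed gate sizes


def mutate(design, rng, max_points=3):
    """Return a new individual differing from `design` at 1..max_points gates.

    design: list of (leaky, size) tuples; the parent is left unmodified.
    """
    child = list(design)
    count = rng.randint(1, min(max_points, len(child)))
    for i in rng.sample(range(len(child)), count):
        leaky, size = child[i]
        if rng.random() < 0.5:
            leaky = not leaky                                  # type substitution
        else:
            size = rng.choice([s for s in SIZES if s != size])  # resize
        child[i] = (leaky, size)
    return child


parent = [(False, 1.0)] * 6
child = mutate(parent, random.Random(0))
changed = sum(1 for old, new in zip(parent, child) if old != new)
```

Each selected gate is guaranteed to change, so every child differs from its parent in at least one place and can then be scored for both power and speed.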
After the global optimization, an additional stage of greedy optimization is performed on selected optimized circuit partition variants as produced by the genetic optimization. The best performing optimized circuit partition variant is selected for use in the optimized integrated circuit.
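The final greedy stage can be sketched as a local search that accepts only improving single-gate changes. The design representation, the size table, and the `toy_score` function are hypothetical; a real flow would score candidates with timing and power analysis instead.

```python
SIZES = (1.0, 1.5, 2.0, 3.0, 4.0)   # hypothetical allowed gate sizes


def neighbours(design, sizes=SIZES):
    """Yield every design that differs from `design` at exactly one gate."""
    for i, (leaky, size) in enumerate(design):
        yield design[:i] + [(not leaky, size)] + design[i + 1:]
        for s in sizes:
            if s != size:
                yield design[:i] + [(leaky, s)] + design[i + 1:]


def greedy_refine(design, score, sizes=SIZES):
    """Take the best single-gate change until none lowers the score.

    `design` must be a non-empty list of (leaky, size) tuples.
    """
    best = list(design)
    while True:
        candidate = min(neighbours(best, sizes), key=score)
        if score(candidate) >= score(best):
            return best
        best = candidate


def toy_score(design):
    """Hypothetical cost: size plus a fixed penalty for each leaky gate."""
    return sum(size + (5.0 if leaky else 0.0) for leaky, size in design)


refined = greedy_refine([(True, 3.0), (False, 2.0)], toy_score)
```

Since the global stage has already placed each variant near a good solution, a greedy pass of this kind only needs to polish it, which is the situation in which greedy search works well.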