Complementary metal oxide semiconductor (CMOS) integrated circuits (ICs) typically include one or more clock networks for providing one or more clock signals to various circuit elements of the IC. The clock networks include one or more clock sources coupled to one or more clock “sinks”—circuit elements that require a clock signal. Typical clock sinks might include flip-flops, latches, registers, gates and other circuit elements. In general, clock signals are regularly timed periodic signals, which might be utilized for timing purposes, for example, to synchronize, switch or trigger one or more circuit elements of the IC. A typical clock signal might be generated by a crystal-based clock, a phase-locked loop (PLL) clock, a ring oscillator or other similar circuits either internal to or external to the IC.
The timing of clock and data signals in ICs is typically precisely controlled, and clock signals routed within the IC are desirably synchronized such that each clock sink receives the same clock signal at approximately the same time. A common problem in IC design is “clock skew”. Clock skew occurs when clock signals arrive at the various clock sinks at different times, impairing synchronized operation of circuit elements of the IC. Thus, the delay experienced by a given data or clock signal over its signal path, from the signal source to the signal sink, must be accounted for in the design and implementation of an IC. For example, the path length, resistance, parasitic capacitance, parasitic inductance, the number and type of attached clock sinks, and other characteristics of a given signal path might affect the delay between a given signal source and a given signal sink.
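As a rough illustration of the skew definition above, the following Python sketch computes global skew (latest minus earliest arrival over all sinks) and the local skew between one launching and one capturing sink. The function names, sink names, and arrival times are hypothetical; in practice these values come from static timing analysis of the extracted clock network.

```python
from typing import Dict

def clock_skew(arrival_ps: Dict[str, float]) -> float:
    """Global skew: latest clock arrival minus earliest clock arrival over all sinks."""
    if not arrival_ps:
        raise ValueError("need at least one clock sink")
    return max(arrival_ps.values()) - min(arrival_ps.values())

def local_skew(arrival_ps: Dict[str, float], launch: str, capture: str) -> float:
    """Skew between a launching sink and a capturing sink (capture minus launch)."""
    return arrival_ps[capture] - arrival_ps[launch]

# Hypothetical clock arrival times (ps) at three flip-flops driven by one clock source.
arrivals = {"ff_a": 412.0, "ff_b": 398.5, "ff_c": 430.0}
print(clock_skew(arrivals))                  # 31.5
print(local_skew(arrivals, "ff_a", "ff_c"))  # 18.0
```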
Therefore, an IC designer attempts to ensure that the various clock signal paths of a given clock network have substantially the same signal delay. Fine tuning of circuit path timing in the IC design (termed “timing closure”) involves completing complex circuit placement and routing routines along with tuning data path and clock signal delays. Modifying the numerous data and clock signal paths of the IC is a time-consuming and expensive part of the design process. Instead, to reduce circuit design area and power, as well as to manage clock skew, delay cells might be placed within clock signal paths. Inserting delay cells allows clock networks to be tuned to correct timing issues with fewer cell changes to the IC design.
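A minimal sketch of the idea of tuning a clock network with inserted delays, assuming a single hypothetical delay cell with a fixed 40 ps delay and integer-picosecond path delays: each faster path is padded with whole delay cells so that it approaches, without exceeding, the slowest path. This is an illustration only, not how any particular timing-closure tool works.

```python
from typing import Dict

def pad_paths(path_delay_ps: Dict[str, int], cell_delay_ps: int) -> Dict[str, int]:
    """Number of identical delay cells to insert on each clock path so that it
    approaches, without exceeding, the delay of the slowest path."""
    target = max(path_delay_ps.values())
    return {name: (target - d) // cell_delay_ps for name, d in path_delay_ps.items()}

def skew_after_padding(path_delay_ps: Dict[str, int], cell_delay_ps: int,
                       counts: Dict[str, int]) -> int:
    """Remaining skew once the inserted delay cells are accounted for."""
    padded = [d + counts[name] * cell_delay_ps for name, d in path_delay_ps.items()]
    return max(padded) - min(padded)

# Hypothetical clock path delays (integer ps) and a hypothetical 40 ps delay cell.
paths = {"clk_a": 520, "clk_b": 430, "clk_c": 505}
counts = pad_paths(paths, cell_delay_ps=40)
print(counts)                                 # {'clk_a': 0, 'clk_b': 2, 'clk_c': 0}
print(skew_after_padding(paths, 40, counts))  # 15
```

Here the unpadded skew of 90 ps drops to 15 ps; the residual is limited by the fixed 40 ps step of the hypothetical cell.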
FIG. 1 shows a circuit diagram of a typical prior art delay cell 100. As shown in FIG. 1, a typical delay cell comprises a string of cascaded CMOS inverters, shown as 102(1)-102(N), where N is typically a positive even integer so that the output of delay cell 100 has the same polarity as its input. Each CMOS inverter is typically implemented in a similar manner. For example, CMOS inverter 102(1) comprises PMOS transistor 104(1) and NMOS transistor 106(1) coupled in series, where the gate nodes of both transistors 104(1) and 106(1) are coupled to an input signal, shown as Vin. The source node of PMOS transistor 104(1) is coupled to a first power supply signal, shown as Vdd, and the source node of NMOS transistor 106(1) is coupled to a second power supply signal, shown as Vss, where Vdd is at a higher voltage potential than Vss. The drain node of PMOS transistor 104(1) is coupled to the drain node of NMOS transistor 106(1), providing an output signal, Vout(1). In general, the output of each CMOS inverter 102 is provided to the next CMOS inverter in the string (e.g., Vout(1) is provided to CMOS inverter 102(2), not explicitly shown in FIG. 1), and the output of the final CMOS inverter 102(N), Vout(N), is provided as the output of delay cell 100.
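The logic behavior of delay cell 100 can be modeled with a short Python sketch that treats each inverter as a logical complement with a fixed propagation delay. The uniform per-stage delay is a simplifying assumption (real stages have distinct rise and fall delays), and the function name and delay value are hypothetical. The sketch shows why an even N keeps Vout(N) in phase with Vin.

```python
from typing import Tuple

def inverter_chain(vin: int, n_stages: int, stage_delay_ps: float) -> Tuple[int, float]:
    """Model a chain of N identical CMOS inverters 102(1)-102(N).

    Returns the logic level at Vout(N) and the total propagation delay,
    assuming every stage complements its input after the same delay."""
    if vin not in (0, 1):
        raise ValueError("vin must be a logic 0 or 1")
    out = vin
    for _ in range(n_stages):
        out = 1 - out  # each inverter stage complements its input
    return out, n_stages * stage_delay_ps

print(inverter_chain(1, 4, 25.0))  # (1, 100.0): even N, output follows Vin
print(inverter_chain(1, 3, 25.0))  # (0, 75.0): odd N, output is inverted
```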
In general, each of CMOS inverters 102(1)-102(N) might be implemented such that transistors 104 and 106 have a non-minimum channel length, where the channel length of a transistor is the distance between its source node and its drain node. Because a shorter channel length corresponds to faster switching by the transistor, these longer-channel transistors produce relatively slow inverters that serve as delay elements. To increase or decrease the delay, inverters might be added to or removed from delay cell 100 (e.g., N might be increased or decreased), or the channel length of each of transistors 104 and 106 might be increased or decreased to achieve a target delay time for delay cell 100. Inverters 102(1)-102(N) typically employ transistors having a long channel length (e.g., 5 times the minimum channel length of the CMOS technology). Delay cell 100 might typically be implemented with N (i.e., the number of inverter stages) ranging from 2 to 10 or more. Further, delay cell 100 might be modified to have different output inverter drive strengths to accommodate signal loading variations in different applications. Increasing or decreasing the number of inverters 102 in delay cell 100, changing the channel length of transistors 104 and 106, and changing the drive strength of output inverter 102(N) all affect the overall physical size of delay cell 100 on the silicon of an IC. Thus, each delay value might be implemented with a corresponding delay cell of a unique physical size. Circuit element sizes are commonly measured in grids in standard cell library terms, where a grid is typically the unit size of the overlying routing grid of the IC.
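To illustrate how the two knobs described above (N and channel length) move both delay and cell size, the Python sketch below tabulates a few hypothetical delay cell variants. It assumes a first-order long-channel model in which stage delay grows with the square of the channel-length multiplier (drive current falling roughly as 1/L while the next stage's gate capacitance rises roughly as L), and that each stage's footprint grows linearly with channel length. The base delay, grid counts, and names are hypothetical, not characterized library data.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DelayCellVariant:
    n_stages: int       # number of inverter stages N
    length_mult: float  # channel length as a multiple of the technology minimum
    delay_ps: float     # estimated total delay
    width_grids: int    # estimated cell width in routing grids

def make_variant(n_stages: int, length_mult: float, t_min_ps: float = 10.0,
                 grids_per_stage: float = 1.0, overhead_grids: int = 2) -> DelayCellVariant:
    # Assumption: stage delay ~ t_min_ps * length_mult**2 (first-order long-channel scaling).
    delay = n_stages * t_min_ps * length_mult ** 2
    # Assumption: each stage's width grows linearly with channel length; round up to whole grids.
    width = overhead_grids + math.ceil(n_stages * grids_per_stage * length_mult)
    return DelayCellVariant(n_stages, length_mult, delay, width)

for n, k in [(2, 5.0), (4, 3.0), (4, 5.0)]:
    v = make_variant(n, k)
    print(f"N={v.n_stages} L={v.length_mult}x -> {v.delay_ps:.0f} ps, {v.width_grids} grids")
```

Under these assumptions, each distinct delay value lands on a distinct cell width, which is the property that makes swapping delay cells disruptive.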
Delay cells typically found in standard cell libraries each have a unique cell size depending on the delay value, since the delay value is based on the number of delay elements, the size of the delay elements, and the drive strength of the output inverter of the delay cell. The drive strength might need to be increased or decreased, for example, based on the number of clock sinks coupled to the output of the delay cell. During timing closure, if more or less delay is needed, the IC designer must select a different delay cell from the standard cell library, regardless of the size of the required timing change. Switching to a different delay cell might significantly disrupt the current place and route results, depending on the size difference between the delay cells. Subsequently generated place and route results might then produce differences in signal parasitics, introducing further difficulties in the timing closure process.
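The selection step described above can be sketched in Python: pick the library delay cell nearest a target delay and report the resulting change in cell width. The cell names, delays, and widths below are hypothetical placeholders, not entries from any real standard cell library.

```python
from typing import List, NamedTuple, Tuple

class LibCell(NamedTuple):
    name: str
    delay_ps: int
    width_grids: int

# Hypothetical delay cell library; values are illustrative only.
LIBRARY: List[LibCell] = [
    LibCell("DLY_A", 100, 6),
    LibCell("DLY_B", 200, 10),
    LibCell("DLY_C", 400, 16),
    LibCell("DLY_D", 800, 28),
]

def pick_cell(target_delay_ps: int, library: List[LibCell] = LIBRARY) -> LibCell:
    """Return the library cell whose delay is closest to the target."""
    return min(library, key=lambda c: abs(c.delay_ps - target_delay_ps))

def swap(current: LibCell, target_delay_ps: int,
         library: List[LibCell] = LIBRARY) -> Tuple[LibCell, int]:
    """Return the replacement cell and the change in cell width (grids)."""
    new = pick_cell(target_delay_ps, library)
    return new, new.width_grids - current.width_grids

current = LIBRARY[1]        # DLY_B: 200 ps, 10 grids
print(swap(current, 230))   # still DLY_B: a +30 ps adjustment is not available
print(swap(current, 330))   # DLY_C, 6 grids wider
```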
FIGS. 2a and 2b show an exemplary prior art IC design layout. As shown in FIG. 2a, an IC design might comprise one or more cell rows, shown as cell rows 202 and 206, where the cell rows are interconnected by cell interconnect grid 204. Each cell row might allow an IC designer to place one or more circuit elements from the cell library, shown generally as cells 210. Signals between cells are routed through cell interconnect grid 204. As shown in FIG. 2a, the IC designer has first placed delay cell 1, which occupies cell area 208 and has a given delay value. If, in the course of timing closure, the IC designer determines that a different delay value is required to meet the timing requirements of the IC, the designer must select a different delay cell from the cell library that has the desired delay value. As shown in FIG. 2b, delay cell 2, having a larger delay value than delay cell 1, is placed into the IC design. Delay cell 2 occupies cell area 208 previously taken up by delay cell 1, plus additional cell area 222. Because delay cell 2 occupies a larger cell area, using it in place of delay cell 1 might also require changes to the locations of one or more surrounding cells 210 and to signal routing in cell interconnect grid 204, indicated as shaded area 224.
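The ripple effect shown in FIG. 2b can be approximated with a deliberately simple Python model in which cells are abutted left to right in a single row with no whitespace. This is a hypothetical simplification (real placers leave whitespace and can move cells between rows), and the function names, cell names, and widths are illustrative.

```python
from typing import Dict, List, Tuple

def place_row(cells: List[Tuple[str, int]], row_width: int) -> Dict[str, int]:
    """Abut cells left to right; return each cell's starting x position in grids."""
    x, positions = 0, {}
    for name, width in cells:
        positions[name] = x
        x += width
    if x > row_width:
        raise ValueError(f"row overflow: need {x} grids, row has {row_width}")
    return positions

def displaced(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Cells present in both placements whose position changed."""
    return sorted(n for n in before if n in after and before[n] != after[n])

row_1 = [("c0", 4), ("delay", 10), ("c1", 6), ("c2", 5)]  # with delay cell 1
row_2 = [("c0", 4), ("delay", 16), ("c1", 6), ("c2", 5)]  # with wider delay cell 2
before = place_row(row_1, row_width=32)
after = place_row(row_2, row_width=32)
print(displaced(before, after))  # ['c1', 'c2']
```

With `row_width=30`, the wider cell no longer fits and `place_row` raises `ValueError`, a crude stand-in for surrounding cells having to move elsewhere.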
Further, as CMOS technology geometries continue to shrink to provide smaller and faster devices, relatively large delays (e.g., hundreds of ps or 1 ns) become difficult to obtain without a very large delay cell area. Typical delay cells for large delays employ a series of CMOS inverters in which the internal inverters use very long channel length transistors, often three to ten times the minimum channel length for the given CMOS technology. These very long channel transistors make it difficult to maintain balanced rise/fall delay across the operating range of the IC, from worst-case slow (WCS) to worst-case fast (WCF) manufacturing process, voltage, and temperature (PVT) variations. Additionally, even for typical delay cells designed to have balanced rise/fall delay, managing rise and fall signal skew for both data and clock signal paths makes timing closure difficult. For example, if the design cannot tolerate unbalanced rise/fall delays, circuit redesign might be necessary.
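A minimal Python sketch of how a designer might screen a delay cell's rise/fall imbalance at each PVT corner against a skew budget. The WCS and WCF corner names follow the text; the "TYP" corner label, all rise/fall values, the budget, and the function names are hypothetical assumptions, not measured or characterized data.

```python
from typing import Dict, List, Tuple

def rise_fall_imbalance(corners: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """|t_rise - t_fall| of one delay cell at each PVT corner."""
    return {corner: abs(rise - fall) for corner, (rise, fall) in corners.items()}

def corners_over_budget(corners: Dict[str, Tuple[float, float]],
                        budget_ps: float) -> List[str]:
    """Corners at which the rise/fall imbalance exceeds the allowed budget."""
    return sorted(c for c, v in rise_fall_imbalance(corners).items() if v > budget_ps)

# Hypothetical (t_rise, t_fall) values in ps for a long-channel delay cell.
cell = {"WCS": (1180.0, 1010.0), "TYP": (800.0, 770.0), "WCF": (520.0, 505.0)}
print(rise_fall_imbalance(cell))        # {'WCS': 170.0, 'TYP': 30.0, 'WCF': 15.0}
print(corners_over_budget(cell, 50.0))  # ['WCS']
```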
Another problem with long-channel transistors occurs during manufacturing testing of initial integrated circuit silicon. To emulate the WCS-to-WCF variation in silicon using a single manufacturing lot, polysilicon gate (poly-gate) critical dimension (CD) variation is often applied. Poly-gates that are slightly widened or narrowed alter a transistor's switching performance and, therefore, circuit path delays. For example, a +/−5% poly-gate CD variation used for 40 nm CMOS technology corresponds to a poly-gate variation of +/−2 nm. However, a delay cell employing long-channel transistors (e.g., 120 nm), when varied by the same absolute amount as other standard cell gates in the design (e.g., 2 nm), exhibits relatively negligible delay variation. Thus, even when circuit timing closure is achieved using accurately modeled WCS and WCF timing simulation libraries, silicon produced with poly-gate CD variation might exhibit inaccurate skew between delay cell paths and normal standard cell paths, resulting in timing problems and, in the worst case, circuit failure.
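The arithmetic behind this example is sketched below in Python: the same absolute +/−2 nm shift is a +/−5% perturbation of a 40 nm gate but only about +/−1.7% of a 120 nm long-channel gate, one third of the relative change. The function name is hypothetical, and the sketch computes only the relative gate-length change, not the resulting delay, which depends on the device model.

```python
def relative_cd_change(drawn_length_nm: float, delta_nm: float) -> float:
    """Fractional change in gate length for an absolute poly-gate CD shift."""
    return delta_nm / drawn_length_nm

delta_nm = 40.0 * 5 / 100                        # +/-5% of 40 nm -> +/-2 nm
std_gate = relative_cd_change(40.0, delta_nm)    # minimum-length standard cell gate
long_gate = relative_cd_change(120.0, delta_nm)  # long-channel delay cell gate
print(f"{delta_nm:.1f} nm: {std_gate:.1%} vs {long_gate:.1%}")  # 2.0 nm: 5.0% vs 1.7%
```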