The present invention generally relates to Integrated Circuit (IC) chips and more particularly to monitoring and reducing IC chip electro-migration failures.
To minimize semiconductor circuit power consumption, most Integrated Circuit (IC) chips are made in the well-known complementary insulated gate field effect transistor (FET) technology known as CMOS. A typical CMOS circuit includes paired complementary devices, i.e., an n-type FET (NFET) paired with a corresponding p-type FET (PFET), frequently gated by the same signal. The current each FET or device can provide is directly proportional to the channel width (W) to length (L) ratio (W/L) of the particular device. Thus, a high power device tends to have a very high width to length ratio. Even with a minimum channel length, a single high powered device, e.g., with W/L at or above 100, tends to have a very wide, short footprint that may have to meander to fit the layout. However, ten (10) parallel connected devices with W/L=10 have the same current capacity as the single wide device, with a potentially much more compact footprint that approaches a square.
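The W/L arithmetic above can be illustrated with a minimal sketch. It assumes identical fingers whose widths simply add when connected in parallel, and it idealizes the layout by giving each finger a pitch of one channel length (ignoring source/drain and contact area). The function names and the `pitch_in_l` parameter are hypothetical, introduced only for this illustration.

```python
def total_w_over_l(fingers: int, finger_w_over_l: float) -> float:
    """Effective W/L of identical devices connected in parallel (widths add)."""
    return fingers * finger_w_over_l


def footprint_aspect(fingers: int, finger_w_over_l: float,
                     pitch_in_l: float = 1.0) -> float:
    """Rough long-side to short-side ratio of the device footprint.

    All dimensions are in units of the channel length L. `pitch_in_l` is an
    assumed per-finger pitch; the default of 1.0 is an idealization that
    ignores source/drain diffusion and contacts.
    """
    width = finger_w_over_l        # extent along the channel width
    height = fingers * pitch_in_l  # extent along the stacked fingers
    return max(width, height) / min(width, height)


# One wide device versus ten ganged fingers with the same total W/L.
single_drive = total_w_over_l(1, 100.0)
ganged_drive = total_w_over_l(10, 10.0)
single_aspect = footprint_aspect(1, 100.0)
ganged_aspect = footprint_aspect(10, 10.0)
```

Under these assumptions both arrangements have the same current capacity, while the ganged arrangement has a far lower aspect ratio. A realistic finger pitch is larger than one channel length, so an actual layout only approaches a square.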
Thus, designers normally design these high powered devices as parallel or ganged devices with fingered gates, sources and drains all tied or connected in parallel. Similarly, in a typical standard cell logic chip, where the logic is implemented from a library of lower order logic circuits, books or cells, the drive of an underpowered book output may be increased by adding one or more parallel drivers to power up and drive that book output. Also, some modern IC chips have FET channels formed on surface ridges or fins for improved density. FETs formed on a single fin, called FinFETs, are width limited by the fin size, with correspondingly limited current. Thus, designers also connect multiple FinFETs in parallel to increase FinFET current capacity. However, increasing current capacity may increase local circuit sensitivities, for example, to heat damage.
Typical semiconductor fabrication materials, especially dielectrics, have poor thermal conductivity. High transient current from these large multi-fingered drivers switching rapidly, e.g., at several gigahertz (GHz), can very rapidly cause very localized heating. Further, these drivers may alternate between periods of relative dormancy and periods of near maximum switching. Thus, localized heating from a small, isolated circuit can build rapidly into randomly occurring hot spots separated by periods of abatement. These hot spots may be the source of unsuspected failures, including redundancy faults, e.g., the loss of one of the ganged devices/drivers.
For example, current density and temperature in one of these hot spots can cause electromigration (EM) that open-circuits connecting wires or lines, e.g., supply (Vdd or ground) lines or signal outputs. Designers identify electromigration concerns during chip design with EM modeling. Once identified, designers can address those concerns with more robust metal wiring and by limiting maximum chip operating frequency (Fmax). However, Fmax is a measure of chip performance, so limiting Fmax to limit chip power consumption and localized heating also limits chip performance. Consequently, these electromigration workarounds have limited the benefits of device scaling.
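The dependence of electromigration on current density and temperature is commonly modeled with Black's equation, MTTF = A · J^(-n) · exp(Ea / (k·T)). The text above does not name a specific EM model, so the following is only a minimal sketch under that assumption. The current density exponent `n = 2.0` and activation energy `ea_ev = 0.9` are illustrative placeholder values, not values from this document, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(j_ratio: float, t_ref_k: float, t_hot_k: float,
                      n: float = 2.0, ea_ev: float = 0.9) -> float:
    """Return MTTF(hot spot) / MTTF(reference) using Black's equation.

    j_ratio is J_hot / J_ref; temperatures are in kelvin. The prefactor A
    cancels in the ratio. n and ea_ev are assumed, illustrative values.
    """
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_hot_k - 1.0 / t_ref_k)
    )
    return current_term * thermal_term


# Same current density, but a hot spot 20 K above a 105 C reference.
hot_spot_ratio = em_lifetime_ratio(1.0, 378.15, 398.15)
# Same temperature, but twice the current density.
double_current_ratio = em_lifetime_ratio(2.0, 378.15, 378.15)
```

In this model a ratio below one indicates a shorter expected wire lifetime than at the reference condition, which is why both a local temperature rise and a local increase in current density are of concern.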
Moreover, semiconductor manufacturing carries substantial process variations and physical limitations. These manufacturing variations cause physical and electrical parameters to vary, perhaps by as much as two times (2×), or more, best case to worst case. Across the entire chip production population, these variations appear in device parameters such as threshold voltages (VT), drain saturation currents (IdSats), contact resistance, material sheet resistance (ρs) and so forth. Statistically, in some production chips (e.g., an “outlier” chip) these variations are large enough in some parameters to affect reliability, risking field failures. These large multi-fingered drivers are especially sensitive to these variations. Predicting these failures has required extended stressing and screening, e.g., using kerf test sites with testable models of the most susceptible structures. Even so, it has been very difficult to get empirical feedback from field fails to verify the accuracy of the EM modeling and chip use conditions.
Chip designers have included on-chip low frequency sensors to measure average temperature in situ periodically, e.g., using diode forward bias voltages or metal resistance. However, for randomly occurring hot spot flare ups, the temperature sensors are checked too infrequently, e.g., on the order of one hertz (~1 Hz), and the sensors are very likely located peripherally rather than at hot spot locations. So, recorded sensor data seldom reflects effects on the most sensitive on-chip structures. Thus, collected data is seldom relatable to reliability issues that, especially in state of the art high performance ICs, are most likely to manifest in circuits with large multi-fingered or multi-fin devices. So, it is also very unlikely that any of the sensors capture local hot spot effects that cause failures, much less predict impending failures. Consequently, because chip users have no forewarning of impending failures, other than scheduled maintenance, users have little recourse other than to wait for failures, react as they occur, and accept the consequences, e.g., expensive down time and emergency service calls.
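The mismatch between a slow sampling rate and a short hot spot event can be illustrated with a minimal sketch. It assumes ideal, instantaneous samples taken at t = 0, T, 2T, and so on, with all times expressed in whole microseconds. The 1 ms event duration and the function names are hypothetical values chosen for illustration; only the roughly 1 Hz sampling rate comes from the text.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def samples_inside_event(start_us: int, duration_us: int, period_us: int) -> int:
    """Count sample instants k*period_us that fall in [start, start + duration)."""
    end_us = start_us + duration_us
    return ceil_div(end_us, period_us) - ceil_div(start_us, period_us)


def catch_probability(duration_us: int, period_us: int) -> float:
    """Chance that at least one sample lands in an event with random start time."""
    return min(1.0, duration_us / period_us)


ONE_HZ_PERIOD_US = 1_000_000   # ~1 Hz sensor polling, as described in the text
EVENT_START_US = 300_000       # assumed event start, 0.3 s after a sample
EVENT_DURATION_US = 1_000      # assumed 1 ms hot spot flare up

slow_hits = samples_inside_event(EVENT_START_US, EVENT_DURATION_US, ONE_HZ_PERIOD_US)
fast_hits = samples_inside_event(EVENT_START_US, EVENT_DURATION_US, 100)  # 10 kHz
slow_chance = catch_probability(EVENT_DURATION_US, ONE_HZ_PERIOD_US)
```

Under these assumptions, the chance that a periodic sample coincides with an event is the ratio of event duration to sampling period, so a sensor polled about once per second will almost always miss a millisecond-scale flare up even when it is located at the hot spot.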
Thus, there exists a need for accurately characterizing local effects on Integrated Circuit (IC) chip circuits that impact chip lifetime and internal element reliability; and more particularly, a need for monitoring sensitive multi-fingered IC chip devices for impending failures arising from short duration events, for which it has previously been impossible to identify or predict the occurrence of chip over-stressing prior to failure.