1. Field of the Invention
The present invention relates to pre-silicon and/or post-silicon projection of reliability metrics for microprocessor chips and systems. More particularly, the present invention provides a reliability measure for rating a chip design or the performance of a chip.
2. Description of the Related Art
Advances in semiconductor (specifically, CMOS) technology have steadily improved microprocessor performance over the past twenty years. Lately, however, these same advances have been accelerating the onset of reliability problems. One consequence of the progressive scaling of device and interconnect geometries is an increase in average and peak power densities (and hence temperatures) across the chip. The inherent increase in static (leakage) power as scaling moves into the deep submicron region adds to this problem, and because the major components of leakage power themselves increase with temperature, the problem becomes even harder to control.
Despite advances in packaging and cooling technologies, it is an established concern that the average and peak operating temperatures within key units of a microprocessor chip will rise with the progressive scaling of technology. To protect against thermal runaway, microprocessors such as Intel's Pentium 4™ and IBM's POWER5™ already include on-chip temperature monitoring devices, with mechanisms to throttle processor execution speed as needed. The objective is to reduce on-chip power when maximum allowable temperatures are approached or exceeded.
The failure rates of the individual components that make up an integrated circuit (or a larger system) are fundamentally related to operating temperature: these rates increase with temperature. As a result, chips or systems designed to operate within a given average temperature range are expected to fail sooner than specified if that range is routinely exceeded under normal operating conditions. Conversely, consider a chip or system designed to meet a certain mean time to failure (MTTF) at an assumed maximum operating temperature: if its actual operating temperatures are lower, it can be expected to have a longer lifetime.
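To make the temperature dependence concrete, the sketch below assumes a simple Arrhenius model, in which MTTF is proportional to exp(Ea / (k·T)) with T in kelvin. This model, the function name `arrhenius_mttf_scale`, and any activation energy `ea_ev` passed to it are illustrative assumptions, not parameters taken from this disclosure; real activation energies differ by failure mechanism and process. The function returns the ratio of the MTTF at the actual operating temperature to the MTTF at the design temperature, so a value above 1 corresponds to the longer-than-specified lifetime described above.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_mttf_scale(t_design_c: float, t_actual_c: float, ea_ev: float) -> float:
    """Return MTTF(actual) / MTTF(design) under an ASSUMED Arrhenius model.

    MTTF is taken to be proportional to exp(Ea / (k * T)), with T in kelvin.
    A result > 1 means the part is expected to outlive its design MTTF
    (it runs cooler than designed); a result < 1 means it fails sooner.
    """
    t_design_k = t_design_c + 273.15
    t_actual_k = t_actual_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_actual_k - 1.0 / t_design_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical values: 85 C design point, 0.9 eV activation energy.
    for actual in (75.0, 85.0, 95.0):
        print(actual, arrhenius_mttf_scale(85.0, actual, 0.9))
```

With these hypothetical numbers, running 10 °C cooler than the design point roughly doubles the projected MTTF (a ratio of about 2.3), while running 10 °C hotter shortens it by a comparable factor.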
Electromigration and stress migration in the chip interconnects are major sources of chip failures, and both depend directly on operating temperature. However, not all reliability degradation with CMOS scaling is due to power and temperature. For example, time-dependent dielectric breakdown (TDDB) is an extremely important failure mechanism in semiconductor devices: over time, the gate dielectric wears down and fails when a conductive path forms through it.
With CMOS scaling, the dielectric thickness is decreasing to only tens of angstroms. This, coupled with a general slowdown in supply-voltage scaling, is expected to increase the intrinsic failure rate due to dielectric breakdown. TDDB failure rates also depend strongly on temperature. Thermal cycling, caused by periodic changes in chip temperature, is another factor that degrades reliability. This factor is not directly related to the average operating temperature; rather, its impact is governed by the number of thermal cycles the chip can endure before failing.
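The cycle-count view of thermal cycling can be illustrated with the Coffin-Manson relation, a commonly used model in which the number of cycles to failure N_f scales as ΔT^(-q), where ΔT is the temperature swing per cycle. The model, the exponent `q`, and the helper names `coffin_manson_cycle_ratio` and `lifetime_years` are assumptions for illustration only; this disclosure does not specify them. The sketch shows why a chip can wear out sooner in calendar time from more frequent or larger swings even when its average temperature is unchanged.

```python
def coffin_manson_cycle_ratio(delta_t_ref: float, delta_t_new: float, q: float) -> float:
    """Return N_f(new) / N_f(ref) under an ASSUMED Coffin-Manson model.

    N_f (thermal cycles to failure) is taken to scale as delta_T ** (-q),
    where delta_T is the temperature swing per cycle (kelvin or Celsius
    degrees; only the ratio matters) and q is a material-dependent exponent.
    """
    if delta_t_ref <= 0 or delta_t_new <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_ref / delta_t_new) ** q


def lifetime_years(cycles_to_failure: float, cycles_per_day: float) -> float:
    """Convert a cycles-to-failure budget into a calendar lifetime in years."""
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    return cycles_to_failure / (cycles_per_day * 365.0)


if __name__ == "__main__":
    # Hypothetical: doubling the swing with q = 2 leaves 1/4 of the cycle budget.
    print(coffin_manson_cycle_ratio(10.0, 20.0, 2.0))
    # Hypothetical budget of 10,000 cycles at 5 cycles/day versus 10 cycles/day.
    print(lifetime_years(10_000, 5), lifetime_years(10_000, 10))
```

Note that `lifetime_years` depends only on how often the chip cycles, not on its mean temperature, which mirrors the distinction drawn above.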