The present invention relates generally to improvements in lifetime reliability of semiconductor devices and, more particularly, to a system and method for implementing dynamic lifetime reliability extension for microprocessor architectures.
Lifetime reliability has become one of the major concerns in microprocessor architectures implemented with deep submicron technologies. In particular, extreme scaling resulting in atomic-range dimensions, inter- and intra-device variability, and escalating power densities have all contributed to this concern. At the device and circuit levels, many reliability models have been proposed and empirically validated by academia and industry. As such, the basic mechanisms of failure at a low level are fairly well understood, and the models at that level have gained widespread acceptance. In particular, lifetime reliability models for use with single-core, architecture-level, cycle-accurate simulators have been introduced. Such models have focused on certain major failure mechanisms including, for example, electromigration (EM), negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), and time-dependent dielectric breakdown (TDDB).
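By way of illustration only, one widely used device-level model of this kind is Black's equation for electromigration, in which the mean time to failure (MTTF) is proportional to J^(-n) · exp(Ea / (kT)), where J is the current density, T is the absolute temperature, and k is the Boltzmann constant. The following minimal sketch computes only the ratio of two MTTF values, so that the technology-dependent proportionality constant cancels. The function name and the default values of n and Ea are assumptions chosen for illustration and are not parameters taken from the models referred to above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_mttf_ratio(j_ref, t_ref_k, j_new, t_new_k, n=2.0, ea_ev=0.9):
    """Return MTTF(new) / MTTF(ref) under Black's equation.

    n and ea_ev are illustrative assumptions; real values are technology
    dependent and must be fitted to measured data.
    """
    # Current-density term: MTTF scales as J^(-n).
    current_term = (j_ref / j_new) ** n
    # Thermal term: MTTF scales as exp(Ea / (k * T)), with T in kelvin.
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_new_k - 1.0 / t_ref_k)
    )
    return current_term * thermal_term
```

A ratio below 1 indicates that the new operating condition shortens the predicted electromigration lifetime relative to the reference condition, as is the case for a higher temperature or a higher current density.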
With respect to improving lifetime reliability of semiconductor devices, existing efforts may be grouped into three general categories: sparing techniques, graceful degradation techniques, and voltage/frequency scaling techniques. In sparing techniques, spare resources are designed for one or more primary resources and are deactivated at system deployment. When primary resources later fail during the system lifetime, the spare resources are activated to replace the failed resources, thereby extending system lifetime. Sparing techniques therefore cause less performance degradation when resources fail. However, the high area overhead of the spare resources is a primary drawback of this approach.
In graceful degradation techniques, spare resources are not essential for extending system lifetime. Instead, when a resource fails, the system is reconfigured so as to isolate the failed resource and remain functional. As a result, graceful degradation techniques save the overhead cost of spare resources; however, system performance degrades over the lifetime of the system. Accordingly, graceful degradation techniques are limited to applications and businesses in which degradation of performance over time is acceptable, which unfortunately excludes most high-end computing.
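The contrast between sparing and graceful degradation may be sketched as follows. This is a simplified, hypothetical model rather than a description of any particular system: all resources are assumed to be identical, spares are assumed to remain deactivated until needed, and failed resources are never repaired. The function name and its parameters are introduced here for illustration only.

```python
def surviving_capacity(primaries, spares, failures, allow_degradation):
    """Return the number of active resources after `failures` resources fail.

    Hypothetical model: identical resources, spares powered off until
    needed, and no repair of failed resources.
    """
    replaced = min(failures, spares)   # sparing: activate a deactivated spare
    unreplaced = failures - replaced   # failures that no spare is left to cover
    if unreplaced == 0:
        return primaries               # full performance is preserved
    if allow_degradation:
        # Graceful degradation: isolate failed units and keep running.
        return max(primaries - unreplaced, 0)
    return 0                           # neither spares nor reconfiguration remain
```

For example, a system with four primary resources and one spare keeps all four resources active after a single failure, at the cost of the area occupied by the spare, whereas a system with four primary resources, no spares, and reconfiguration enabled continues with three.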
Thirdly, voltage/frequency scaling techniques are often used for power and temperature reduction and have thus been proposed for lifetime extension. The system lifetime is predicted based on applied workloads, and the voltage/frequency of the system is scaled with respect to the lifetime prediction. While voltage/frequency scaling techniques enable aging of systems to be slowed down as needed, these techniques also result in performance degradation of significant parts of the system or of the entire system. In addition, although reduced voltage/frequency diminishes the degree of stress, these techniques are unable to actually remove the stress conditions of aging mechanisms from semiconductor devices.
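A lifetime-driven scaling policy of the kind described above may be sketched as follows. The table of operating points, the function name, and the one-step adjustment rule are hypothetical and are given for illustration only; the lifetime prediction itself is assumed to be supplied by a separate workload-based model that is not shown.

```python
# Hypothetical operating points, ordered from fastest to slowest: (volts, GHz).
OPERATING_POINTS = [(1.10, 3.0), (1.00, 2.6), (0.90, 2.2), (0.80, 1.8)]


def next_operating_point(index, predicted_lifetime_years, target_lifetime_years):
    """Return the index of the operating point to use for the next interval."""
    if predicted_lifetime_years < target_lifetime_years:
        # Aging too fast: step to a lower voltage/frequency, at a performance cost.
        return min(index + 1, len(OPERATING_POINTS) - 1)
    if predicted_lifetime_years > target_lifetime_years:
        # Lifetime margin available: step back toward full performance.
        return max(index - 1, 0)
    return index  # prediction matches the target: hold the current point
```

Even at the slowest operating point in such a scheme, the devices remain powered and under bias, so the stress conditions are reduced but not removed.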
Still another existing technique, directed to reducing leakage power during inactive intervals, is to use so-called “sleep” or “power down” modes for logic devices, which are complemented with transistors that serve as a footer or a header to cut leakage during quiescent intervals. During a normal operation mode, the circuits achieve high performance resulting from the use of faster transistors, which typically have higher leakage. In this mode, the headers and/or footers are activated so as to couple the circuits to Vdd and/or ground (more generally, the logic high and low voltage supply rails). In contrast, during the sleep mode, the high-threshold footer or header transistors are deactivated to cut off leakage paths, thereby reducing the leakage currents by orders of magnitude. This technique, also known as “power gating,” has been successfully used in embedded devices, such as systems on a chip (SOC). However, although power gating diminishes the current flow through, and the electric field across, semiconductor devices (which results in a certain degree of stress reduction and an increase in the lifetime of the devices), it is unable to completely eliminate such stress conditions and/or stimulate the recovery effects of aging mechanisms.