1. Field of the Invention
Embodiments of the present invention relate to techniques for performing reliability tests on computer systems. More specifically, embodiments of the present invention relate to a method and an apparatus for dynamically controlling a temperature profile within a computer system to facilitate temperature-dependent reliability studies on the computer system.
2. Related Art
Computer system manufacturers routinely evaluate the reliability of computer systems to ensure that the computer systems meet or exceed reliability requirements of their customers. Typically, computer system reliability is determined through “reliability-evaluation studies.” These reliability-evaluation studies can include “accelerated-life studies,” which accelerate the failure mechanisms of the computer system, or “repair-center reliability evaluations,” in which the computer system manufacturer tests computer systems returned from the field. These types of tests typically involve using environmental stress-test chambers to hold and/or cycle one or more stress variables (e.g., temperature, humidity, radiation, etc.) at levels that are believed to accelerate failure mechanisms within a computer system.
In some cases, the failure mechanisms coincide with small variations in the internal temperature of the computer system. There are several theoretical explanations for such behavior, including changes in mechanical stresses, delamination of bonded components, thermal expansion effects on interconnects and soldered joints, exacerbation of microscopic electrostatic discharge effects, and other component reliability phenomena that are affected by temperature gradients and temperature cycling.
One possible way to determine if a computer system is subject to failure from temperature variations is to place the computer system into a thermal chamber where temperature is cycled in an effort to accelerate mechanisms that can lead to failure. This type of testing requires the computer system to be shipped to a facility with a thermal chamber. At the facility, the computer system is placed in the thermal chamber and its temperature is cycled for a fixed time interval (e.g., 100 hours or 500 hours). The computer system is then removed from the thermal chamber for functionality testing.
Unfortunately, using a thermal chamber has several drawbacks. First, it requires the computer system to be shipped to the test facility, which involves time and expense. Second, it is usually not possible to run cables into the thermal chamber to perform live monitoring of the computer system while it is in the thermal chamber. Consequently, at the end of the predetermined time interval, the computer system is removed from the thermal chamber and is evaluated “ex-situ.” Hence, thermal chamber studies yield only pass/fail information for the given interval, without identifying the exact times for the onset of degradation in the computer system. Note that it is desirable to obtain the exact times and/or temperature profiles of computer system failures to facilitate accurate long-term reliability projections (and to provide accurate information about failure mechanisms during repair-center reliability evaluations).
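The contrast between ex-situ pass/fail results and an exact onset time can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions, not as the method of any embodiment: the function names `ex_situ_result` and `in_situ_onset`, the sample values, and the degradation limit are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ex_situ_result(
    failure_time_h: Optional[float], interval_h: float
) -> Tuple[bool, Optional[Tuple[float, float]]]:
    """What a chamber study can report: pass/fail for the interval.

    If the system failed, the onset is only known to lie somewhere
    between 0 and interval_h hours.
    """
    failed = failure_time_h is not None and failure_time_h <= interval_h
    return failed, ((0.0, interval_h) if failed else None)


def in_situ_onset(
    samples: Sequence[Tuple[float, float]], limit: float
) -> Optional[float]:
    """Return the timestamp of the first monitored sample above `limit`.

    `samples` holds (elapsed hours, monitored signal) pairs in time order.
    """
    for t_h, value in samples:
        if value > limit:
            return t_h
    return None


# Hypothetical monitored signal; the values are illustrative, not measured.
samples = [(0.0, 1.00), (25.0, 1.01), (50.0, 1.02), (62.5, 1.20), (75.0, 1.45)]

print(ex_situ_result(failure_time_h=62.5, interval_h=100.0))
print(in_situ_onset(samples, limit=1.10))
```

In this sketch, the ex-situ view can only bound the onset to the 100-hour interval, whereas the monitored view locates it at the first sample that exceeds the assumed limit.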
Hence, what is needed is a method and apparatus for performing in-situ temperature testing for enhanced reliability without the above-described problems.