1. Field of the Invention
The present invention relates to thermal diagnostic techniques applied to computer systems and other electronic systems. In particular, the present invention relates to improved detection of airflow anomalies.
2. Description of the Related Art
Large computer systems are often consolidated into centralized data centers. Rack systems, in particular, conserve space and put the servers and infrastructure within easy reach of an administrator. “Blade” servers are among the more compact server arrangements. A blade server, such as the IBM eServer BLADECENTER (IBM and BLADECENTER are registered trademarks of International Business Machines Corporation, Armonk, N.Y.), is a type of rack-optimized server that eliminates many of the complications of previous-generation rack servers. Due to the compact nature of rack systems, individual servers share a thermal environment with other hardware, such as enclosures, power supplies, fans, and management hardware. Managing power consumption and maintaining proper cooling are therefore critical. Because of the large number of elements typically housed within rack systems, the airflow and heating patterns are fairly complicated. Thermal problems have many potential causes, and such problems can lead to component failure and increase the complexity and expense of system maintenance.
Due to the complexity and sophistication of today's computer systems, computerized thermal diagnostic techniques have been developed to analyze the airflow and heating patterns in computer systems in order to detect thermal faults and avert component failures. Flow Network Modeling is among the preferred thermal diagnostic techniques. U.S. Pat. No. 6,889,908, for example, describes a technique for diagnosing airflow anomalies in electronic equipment by introducing fault scenarios into a Flow Network Model of the equipment, and determining which simulated fault predicts a set of expected temperatures that matches the observed temperatures.
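By way of illustration only, the fault-scenario matching described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the referenced patent: a simple energy-balance formula stands in for a full Flow Network Model, and the function names, scenario names, airflow values, and power values are all hypothetical.

```python
import math

RHO_AIR = 1.2    # kg/m^3, approximate density of air near room temperature (assumed)
CP_AIR = 1005.0  # J/(kg*K), approximate specific heat of air (assumed)

def predict_temps(inlet_c, powers_w, flows_m3_s):
    """Toy stand-in for a flow network model: exhaust air temperature
    downstream of each component, from a steady-state energy balance."""
    return [inlet_c + p / (RHO_AIR * CP_AIR * f)
            for p, f in zip(powers_w, flows_m3_s)]

def rms_error(predicted, observed):
    """Root-mean-square difference between predicted and observed temperatures."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )

def diagnose(observed_c, scenarios, inlet_c, powers_w):
    """Return the name of the scenario whose predicted temperatures
    best match the observed temperatures."""
    return min(
        scenarios,
        key=lambda name: rms_error(
            predict_temps(inlet_c, powers_w, scenarios[name]), observed_c
        ),
    )

# Hypothetical scenarios: volumetric airflow (m^3/s) through each of two slots.
SCENARIOS = {
    "nominal": [0.02, 0.02],
    "slot1_blocked": [0.005, 0.02],
    "slot2_blocked": [0.02, 0.005],
}

best = diagnose([41.0, 29.5], SCENARIOS, inlet_c=25.0, powers_w=[100.0, 100.0])
print(best)  # slot1_blocked
```

In this sketch, the hot reading on the first sensor is best explained by the simulated blockage of the first slot, so that scenario is reported as the likely fault.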
Airflow anomalies are one type of fault scenario that may be detectable by thermal diagnostics. An airflow anomaly is any airflow condition that may adversely affect cooling in a computer system. Airflow anomalies are usually unexpected or unintended airflow changes resulting from improper operation or maintenance, such as through accident, abuse, or neglect. Airflow anomalies may prevent proper cooling of a component, causing the component to heat up and possibly exceed safe operating temperatures, particularly when the computer system is subsequently operated at higher temperatures. At lower temperatures, however, little or no airflow may be required to cool components, so the temperature effects of an airflow anomaly may be minimal. Due to limitations such as the resolution of temperature sensors used in the diagnostic system and the computational uncertainty involved in thermally modeling a computer system, the minimal temperature effects of an airflow anomaly at low temperature may be undetectable. As a result, airflow anomalies may lie dormant, undetected by conventional thermal diagnostic techniques. Any undetected airflow anomaly may cause heating problems when the equipment subsequently operates at higher temperatures, at which point it may be too late to take corrective action.
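The dependence of detectability on operating load can be illustrated with a simple steady-state energy balance, in which the air temperature rise across a component is its dissipated power divided by the product of air density, specific heat, and volumetric airflow. The following is a rough sketch only; the function names, the airflow figures, the power levels, and the 1 degree C sensor resolution are assumptions chosen for illustration and do not come from any particular system.

```python
RHO_AIR = 1.2    # kg/m^3, approximate density of air near room temperature (assumed)
CP_AIR = 1005.0  # J/(kg*K), approximate specific heat of air (assumed)

def air_temp_rise(power_w, flow_m3_s):
    """Steady-state air temperature rise (deg C) across a component."""
    return power_w / (RHO_AIR * CP_AIR * flow_m3_s)

def anomaly_detectable(power_w, nominal_flow, degraded_flow, sensor_resolution_c):
    """True if the extra temperature rise caused by reduced airflow is at
    least as large as the sensor resolution."""
    extra_rise = (air_temp_rise(power_w, degraded_flow)
                  - air_temp_rise(power_w, nominal_flow))
    return extra_rise >= sensor_resolution_c

NOMINAL_FLOW = 0.02    # m^3/s, hypothetical airflow with no anomaly
DEGRADED_FLOW = 0.015  # m^3/s, hypothetical airflow with a partial blockage
RESOLUTION_C = 1.0     # deg C, hypothetical sensor resolution

# Idle (10 W): the blockage adds only about 0.14 deg C, below the resolution.
idle_detectable = anomaly_detectable(10.0, NOMINAL_FLOW, DEGRADED_FLOW, RESOLUTION_C)
# Loaded (200 W): the same blockage adds about 2.8 deg C, above the resolution.
load_detectable = anomaly_detectable(200.0, NOMINAL_FLOW, DEGRADED_FLOW, RESOLUTION_C)

print(idle_detectable, load_detectable)  # False True
```

Under these assumed values, the same 25 percent airflow reduction is invisible to the sensor at idle yet clearly measurable under load, which is the dormant-anomaly behavior described above.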
Improved thermal diagnostic techniques are needed in view of the limitations of existing techniques. More reliable detection of airflow anomalies is desired. An improved thermal diagnostic technique would preferably allow for the detection of airflow anomalies even when a computer system has been idling or otherwise operating at lower temperatures and loads.