1. Field of the Invention
The present invention generally relates to techniques for proactive fault-monitoring in computer systems. More specifically, the present invention relates to a method and an apparatus that dynamically adjust the resolution of telemetry signals collected during proactive fault-monitoring in a computer system.
2. Related Art
Modern servers are typically equipped with a significant number of sensors which monitor various signals. These monitored signals can include: physical variables, such as temperatures, voltages, and currents; and software performance metrics, such as CPU usage, I/O traffic, and memory utilization. Outputs from this monitoring process can be used to generate time-series data for these signals; these time series are referred to as “telemetry signals.” Note that telemetry signals for physical variables are typically sampled from a continuous analog signal and are digitized using analog-to-digital (A/D) converters.
These telemetry signals can subsequently be analyzed using “electronic prognostics” techniques. Applications of electronic prognostics techniques include: “proactive-fault-monitoring,” which identifies leading indicators of component or system failures before the failures actually occur; and “reliability stress studies,” which monitor components as they are subjected to stressful conditions that accelerate failure mechanisms in the components.
Ideally, high-resolution telemetry signals would be collected for critical system variables to facilitate high-precision detection and evaluation of anomalous activity. Such high-resolution telemetry signals also enable a system to quickly determine whether a “remedial action” is necessary, and to select the proper remedial action. However, high-resolution telemetry signals are rarely collected during an electronic prognostics process because of resource limitations, such as: limited data acquisition bandwidth, limited storage space for recording the collected data, limited instrumentation for gathering signals, and limited processing power for analyzing the telemetry signals. Furthermore, processing high-resolution telemetry signals is generally inefficient, because most telemetry signals are collected while the system is operating with no degradation present, and during such periods high resolution is not needed because there are no subtle anomalies to detect. Consequently, conventional techniques typically collect low-resolution telemetry signals, which have a low sampling rate and a high quantization error.
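As a rough illustration of the resolution trade-off described above, the following Python sketch models an ideal uniform A/D converter and compares a low-resolution capture with a high-resolution capture of the same signal. The temperature waveform, the 0 to 100 degree input range, the bit depths, and the sampling periods are all hypothetical values chosen for illustration; they are not taken from any particular server or sensor.

```python
import math

def quantize(value, v_min, v_max, bits):
    """Round a value to the nearest level of an ideal uniform A/D converter."""
    step = (v_max - v_min) / (2 ** bits - 1)   # spacing between adjacent levels
    clamped = min(max(value, v_min), v_max)    # saturate at the converter's range
    return v_min + round((clamped - v_min) / step) * step

def collect(signal, duration_s, period_s, bits, v_min=0.0, v_max=100.0):
    """Sample signal(t) every period_s seconds and quantize each sample."""
    return [quantize(signal(t), v_min, v_max, bits)
            for t in range(0, duration_s, period_s)]

# Hypothetical temperature: 40 degrees C with a 0.15 degree ripple (60 s period).
def temperature(t):
    return 40.0 + 0.15 * math.sin(2 * math.pi * t / 60)

# Low resolution: one 8-bit sample every 10 s (assumed values).
low = collect(temperature, duration_s=60, period_s=10, bits=8)
# High resolution: one 12-bit sample every second (assumed values).
high = collect(temperature, duration_s=60, period_s=1, bits=12)

# Number of samples and number of distinct readings in each capture.
print(len(low), len(set(low)))
print(len(high), len(set(high)))
```

Under these assumed values, the 8-bit step is about 0.39 degrees, so every low-resolution sample lands on the same level and the ripple is invisible, whereas the 12-bit step is about 0.024 degrees and the high-resolution capture resolves it. The high-resolution capture also produces ten times as many samples, each wider, which is the bandwidth and storage cost noted above.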
Unfortunately, using low-resolution telemetry signals for electronic prognostics purposes can significantly reduce the probability of detecting and predicting the onset of subtle anomalies that precede component or system failures. This can lead to a significant delay in taking remedial action after a problem has occurred.
Hence, what is needed is a method and an apparatus that facilitate collecting high-resolution telemetry signals without the above-described problems.