1. Field of the Invention
The present invention relates generally to techniques for detecting faults in computer systems. More specifically, the present invention relates to a method and apparatus for detecting a change-point in a time-series of telemetry signals from a computer system.
2. Related Art
Continuous system telemetry is increasingly being used to monitor the health of computer servers. Some telemetry signals have statistical properties that change over time (i.e., they are non-stationary processes). These telemetry signals can vary with changing system parameters such as CPU loads, memory demand, and I/O bandwidth. Other telemetry signals have statistical properties that do not change over time (i.e., they are stationary noisy processes). One example of a stationary noisy process is the voltage signal from a DC-DC power supply within a server.
When monitoring stationary noisy processes, one objective is to detect “change-points” in the noisy process. For example, in a noisy process with “flat” statistical properties, a change-point occurs when the level of the flat process changes. In a noisy process with a constant slope (either positive or negative), a change-point occurs when the magnitude of the slope changes.
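The two kinds of change-point described above can be illustrated with a minimal Python sketch. The function names, parameters, and sample values are hypothetical and chosen only for illustration; noise is omitted so that the change-point is easy to see.

```python
def level_shift_series(n, change_at, before, after):
    """Flat process whose level changes at index change_at (noise omitted)."""
    return [before if i < change_at else after for i in range(n)]


def slope_change_series(n, change_at, slope_before, slope_after, start=0.0):
    """Sloped process whose per-sample increment changes at index change_at."""
    values = []
    current = start
    for i in range(n):
        values.append(current)
        # The increment applied after sample i depends on which side of the
        # change-point that sample lies.
        current += slope_before if i < change_at else slope_after
    return values


# Illustrative values only: a level change and a slope change at index 3.
flat = level_shift_series(6, 3, 1.5, 1.8)
sloped = slope_change_series(6, 3, 1.0, 3.0)
```

In both series the change-point is at index 3: the flat series steps from one level to another, and the sloped series keeps rising but at a different rate.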
A common technique for detecting a change-point is to set high and low threshold limits for the signal. For example, when monitoring voltage signals that are stationary noisy processes, the threshold limits are typically set to +/−5% of the nominal mean of the voltage signal. The problem with using threshold limits is that if the thresholds are set too close together, spurious data values can trigger false alarms. To avoid these false alarms, the thresholds are set farther apart. Unfortunately, setting the thresholds farther apart makes it more difficult to detect the onset of degradation in system components (such as power supplies, voltage regulators, and sensors), because the degradation must become more severe before an alarm is triggered.
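The trade-off in the threshold-limit technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `threshold_alarms`, the 12 V nominal value, and all sample values are hypothetical and do not come from any measured system.

```python
def threshold_alarms(samples, nominal, tolerance=0.05):
    """Return indices of samples outside nominal * (1 +/- tolerance)."""
    low = nominal * (1.0 - tolerance)
    high = nominal * (1.0 + tolerance)
    return [i for i, v in enumerate(samples) if v < low or v > high]


# Hypothetical 12 V supply: a healthy stretch containing one spurious
# spike (index 3), followed by a slow upward drift (indices 6 to 15).
healthy = [12.00, 12.02, 11.98, 12.20, 12.01, 11.99]
degrading = [12.00 + 0.05 * k for k in range(1, 11)]  # 12.05 up to 12.50
signal = healthy + degrading

tight = threshold_alarms(signal, nominal=12.0, tolerance=0.01)
wide = threshold_alarms(signal, nominal=12.0, tolerance=0.05)
```

With these assumed values, the +/−1% limits raise a false alarm on the spurious spike at index 3 before the drift begins, whereas the +/−5% limits raise no alarm at all, even though the signal has drifted by 0.5 V by the last sample.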
Hence, what is needed is a method and an apparatus for detecting a change-point in a time-series without the problems described above.