1. Field of the Invention
The present invention relates to techniques for proactively detecting impending problems in computer systems. More specifically, the present invention relates to a method and an apparatus for optimizing a regression model using a genetic technique, wherein the regression model is used to detect anomalies in a signal in a computer system.
2. Related Art
Modern server computer systems are typically equipped with a significant number of sensors that monitor signals during the operation of the computer systems. Results from this monitoring process can be used to generate time series data for these signals, which can subsequently be analyzed to determine how a computer system is operating. One particularly desirable application of this time series data is “proactive fault monitoring,” which identifies leading indicators of component or system failures before the failures actually occur.
In particular, advanced pattern recognition approaches based on nonlinear kernel regression are frequently used in proactive fault monitoring to model the complex interactions among multivariate signal behaviors. Using these approaches, a kernel regression model is first constructed during a training phase, wherein correlations among the multiple input signals are learned. In a subsequent monitoring phase, the kernel regression model is used to estimate the value of each input signal as a function of the other input signals. Significant deviations between the estimated values and the measured values of the same signal are used to detect potential anomalies in the system under surveillance.
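By way of illustration only, the training and monitoring phases described above can be sketched as follows. The sketch assumes a Nadaraya-Watson estimator with a Gaussian kernel, which is one common form of nonlinear kernel regression; the text does not specify a particular estimator. The function names, the bandwidth value, and the synthetic three-signal training set are hypothetical.

```python
import math

def estimate_signal(train, obs, target, bandwidth):
    """Estimate obs[target] from the OTHER signals in obs.

    Assumed estimator: Nadaraya-Watson with a Gaussian kernel. `train` is the
    list of fault-free observations stored during the training phase.
    """
    others = [j for j in range(len(obs)) if j != target]
    num = 0.0
    den = 0.0
    for row in train:
        # Squared distance uses only the non-target signals.
        d2 = sum((row[j] - obs[j]) ** 2 for j in others)
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * row[target]
        den += w
    # If every weight underflows to zero, no estimate is available.
    return num / den if den > 0.0 else float("nan")

def residuals(train, obs, bandwidth):
    """Monitoring phase: measured value minus estimated value, per signal."""
    return [obs[i] - estimate_signal(train, obs, i, bandwidth)
            for i in range(len(obs))]

# Training phase (synthetic, hypothetical data): three correlated signals
# with s1 = 2 * s0 and s2 = s0 ** 2, sampled for s0 in [0, 5].
train = [[x / 10.0, 2 * x / 10.0, (x / 10.0) ** 2] for x in range(51)]
BANDWIDTH = 0.5  # hypothetical kernel width

healthy = residuals(train, [2.0, 4.0, 4.0], BANDWIDTH)  # consistent reading
faulty = residuals(train, [2.0, 4.0, 9.0], BANDWIDTH)   # third signal is off

print("healthy residuals:", healthy)
print("faulty residuals: ", faulty)
```

In this toy case the residuals of the consistent observation stay small, while the residual of the third signal in the inconsistent observation is large and would exceed any reasonable alarm threshold. The residuals of the other two signals may also shift, because the faulty value is among their inputs.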
Note that it is desirable to select an appropriate subset of the available input signals for inclusion in the kernel regression model. The performance of a model can be evaluated by a number of criteria, including: (1) accuracy: the ability of the model to correctly estimate the value of a signal in the absence of faults in the system; (2) robustness: the ability of the model to maintain accuracy in the presence of signal disturbance (i.e., estimates should not track errors in a faulty signal); and (3) spillover: the ability of the model to isolate a faulty signal (i.e., estimates of signal A should not be affected by a fault in signal B). Moreover, it is particularly desirable from a computational standpoint to minimize the number of input signals included in the model without compromising the performance of the model, because the computational cost of the kernel regression computations generally scales with the square of the number of input signals in the model.
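The three criteria above can be made concrete with a small sketch. The metric definitions below are one possible formalization and are not taken from the text: accuracy is an RMS residual on fault-free data, and robustness and spillover are the shifts in the estimates per unit of injected disturbance (0 is ideal). The sketch also assumes an auto-associative Gaussian-kernel estimator in which every signal, including the disturbed one, is an input, so that robustness is not trivially perfect. All names and data are hypothetical.

```python
import math

def estimate_all(train, obs, bandwidth):
    """Assumed auto-associative estimator: kernel-weighted mean of training rows."""
    weights = [
        math.exp(-sum((r - o) ** 2 for r, o in zip(row, obs))
                 / (2.0 * bandwidth ** 2))
        for row in train
    ]
    total = sum(weights)  # assumed nonzero: obs lies near the training data
    return [sum(w * row[j] for w, row in zip(weights, train)) / total
            for j in range(len(obs))]

def accuracy_rms(train, clean_obs, bandwidth):
    """Accuracy: RMS of (estimate - measurement) over fault-free observations."""
    sq = []
    for obs in clean_obs:
        est = estimate_all(train, obs, bandwidth)
        sq.extend((e - o) ** 2 for e, o in zip(est, obs))
    return math.sqrt(sum(sq) / len(sq))

def fault_response(train, obs, k, delta, bandwidth):
    """Inject `delta` into signal k and measure how far the estimates move.

    Returns (robustness, spillover), both per unit of disturbance:
    robustness is the shift in the estimate of the faulty signal k itself,
    spillover is the largest shift in the estimate of any other signal.
    """
    clean = estimate_all(train, obs, bandwidth)
    disturbed_obs = list(obs)
    disturbed_obs[k] += delta
    disturbed = estimate_all(train, disturbed_obs, bandwidth)
    shift = [abs(d - c) / abs(delta) for d, c in zip(disturbed, clean)]
    robustness = shift[k]
    spillover = max(s for j, s in enumerate(shift) if j != k)
    return robustness, spillover

# Hypothetical two-signal training set with s1 = 2 * s0.
train = [[x / 10.0, 2 * x / 10.0] for x in range(51)]

acc = accuracy_rms(train, [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]], 0.5)
rob, spill = fault_response(train, [2.0, 4.0], k=0, delta=1.0, bandwidth=0.5)
print("accuracy (RMS):", acc)
print("robustness:", rob, "spillover:", spill)
```

For this toy data the robustness and spillover values come out at approximately 0.2 and 0.4, meaning that a unit fault in the first signal drags its own estimate by about a fifth of the fault and the estimate of the second signal by about two fifths.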
Unfortunately, conventional approaches for choosing an appropriate subset of signals for a kernel regression model have been primarily based on trial-and-error techniques in combination with rudimentary linear correlation analysis, which are not sufficient to predict the nonlinear correlation behaviors among the input signals. More significantly, there are often a large number of available signals in a computer system (e.g., >1000 signals in a high-end server system). With the conventional approaches, the computational cost makes it intractable to examine all possible combinations of these signals to determine the optimal subset to be included in a model.
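The scale of the exhaustive search can be checked with a short calculation. The figure of 1000 signals comes from the example above; the subset size of 20 is a hypothetical choice made only to show that even a fixed, small model size leaves an enormous number of candidates.

```python
import math

n_signals = 1000  # from the ">1000 signals" example in the text

# Every non-empty subset of the available signals is a candidate model.
all_subsets = 2 ** n_signals - 1

# Hypothetical restriction: only models with exactly 20 input signals.
subsets_of_20 = math.comb(n_signals, 20)

digits_all = len(str(all_subsets))
digits_20 = len(str(subsets_of_20))

print("non-empty subsets: a number with", digits_all, "decimal digits")
print("subsets of size 20: a number with", digits_20, "decimal digits")
```

Each candidate subset would also require training and evaluating a kernel regression model, so the total cost of an exhaustive search is far larger than the subset count alone suggests.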
What is needed is a computationally efficient technique for optimizing a kernel regression model without the above-described problems.