1. Technical Field
The present invention relates to the field of data processing systems, and more particularly to a method, computer program product, and system for predictive system monitoring.
2. Background of Invention
Applications for monitoring data processing systems play a key role in their management. For example, those applications are used to detect any critical condition in the system (so that appropriate corrective actions can be taken in an attempt to remedy the situation). For this purpose, selected performance parameters of the system (such as processing power consumption, memory space usage, bandwidth occupation, and the like) are measured periodically. The information so obtained is then interpreted (for example, according to a decision tree) so as to identify any critical condition of the system. For example, the occurrence of a slow response time of the system can be inferred when both the processing power consumption and the memory space usage exceed corresponding threshold values.
Traditional monitoring applications are normally configured with predefined corrective actions, which are launched in response to the detection of corresponding critical conditions. These applications are event based, i.e. they react to events such as a metric threshold being exceeded within an interval chosen by the user.
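By way of illustration only, the threshold rule in the last example above might be sketched as follows. The `Sample` type, the function name, and the two threshold values are hypothetical and are not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One periodic measurement of the monitored system (hypothetical type)."""
    cpu_percent: float     # processing power consumption, 0-100
    memory_percent: float  # memory space usage, 0-100


# Assumed threshold values, chosen only for this illustration.
CPU_THRESHOLD = 85.0
MEMORY_THRESHOLD = 90.0


def is_critical(sample: Sample,
                cpu_threshold: float = CPU_THRESHOLD,
                memory_threshold: float = MEMORY_THRESHOLD) -> bool:
    """Infer a critical condition only when BOTH metrics exceed their thresholds."""
    return (sample.cpu_percent > cpu_threshold
            and sample.memory_percent > memory_threshold)
```

A rule of this kind evaluates each sample in isolation: it fires only once the thresholds have already been crossed and says nothing about how the metrics evolved beforehand.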
A drawback of the solutions described above is that they can only be used to recover the correct operation of the system. Indeed, the corrective actions are executed when any problem has become severe and the system cannot continue working properly. Therefore, those solutions are completely ineffective in preventing the occurrence of the problems in the system.
With this sort of traditional approach the notification is issued only when a problem occurs, while it would be desirable to anticipate the problems by predicting what is going to happen.
For this reason predictive monitoring applications have been developed which are designed to anticipate the occurrence of problems under certain conditions. The usual way to realize a predictive approach is to define and tune multiple thresholds in order to generate multiple conditions for the same area of interest. This produces notifications with increasing severities, resulting in alerts which occur before a critical event takes place. Examples of prior art predictive monitoring systems can be found e.g. in IBM® Tivoli® Performance Analyzer of International Business Machines Corp., a software product that is able to generate predictive alerts based on linear analytic computations.
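The multiple-threshold approach described above can be illustrated with a minimal sketch. The severity names and limits are assumptions made for this example; the sketch does not represent the implementation of any named product.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical severity levels for one metric, ordered from least to most severe.
SEVERITY_THRESHOLDS: Sequence[Tuple[str, float]] = (
    ("warning", 70.0),
    ("minor", 80.0),
    ("critical", 90.0),
)


def classify(value: float,
             thresholds: Sequence[Tuple[str, float]] = SEVERITY_THRESHOLDS
             ) -> Optional[str]:
    """Return the highest severity whose limit the value has reached, or None."""
    severity = None
    for name, limit in thresholds:
        if value >= limit:
            severity = name  # keep the most severe level reached so far
    return severity
```

Because the lower levels are reached first as the metric rises, alerts are raised before the critical level is hit. The result, however, depends only on the current value and not on how quickly that value is changing.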
A drawback of existing predictive monitoring systems is that they do not normally take into account how fast a possible critical situation is approaching when assessing the severity of the predicted problem. However, this information (the speed) can be crucial when ranking situations in order to dispatch resolution resources. In fact, a situation that is approaching its critical status very fast is more serious than another situation that is approaching its critical status relatively slowly, and should be addressed first, even if the latter is currently in a worse state. It would be desirable to have a monitoring and event management system which determines the severity of a possible problem also considering the speed at which the problem is approaching. To achieve this, it is necessary to isolate trends which may be hidden by transient effects.
Given a system where a typical monitoring solution is implemented (metrics sampling), it is possible to use the last n samples for predictive analysis by representing them as a discrete signal. The usual techniques for signal analysis rely on Fourier analysis, which breaks down a signal into constituent sinusoids of different frequencies. Fourier analysis can also be described as a mathematical technique for transforming the view of a signal from a time-based to a frequency-based representation. In a real system, several metrics are not flat; they can be affected by noise in the form of large and quick variations even if the system is globally stable. Indeed, such variations might not indicate any problem, but could simply reflect normal system activity. In such a scenario Fourier analysis has a serious drawback: the most interesting signals contain several non-stationary or transitory characteristics (drift, trends, abrupt changes, beginnings and ends of events) that are not highlighted by Fourier analysis. Furthermore, in transforming from the time domain to the frequency domain, time information is lost.
When looking at a Fourier transform of a signal, it is impossible to tell when a particular event took place. Where signal properties do not change very much over time (i.e. for a so-called stationary signal) this drawback is not too serious; but when, as in the present case, the focus is mainly on time information in order to discover hidden, potentially dangerous trends, this approach is not the best option.
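The loss of time information can be illustrated with a small sketch, given here under stated assumptions: the window length of 16 samples, the spike positions, and the helper names are hypothetical, and a naive discrete Fourier transform is used for self-containment. Two windows of samples containing the same abrupt event at different instants yield the same magnitude spectrum, because a shift in time affects only the phase of the transform and not its magnitude.

```python
import cmath
from typing import List


def dft_magnitudes(samples: List[float]) -> List[float]:
    """Magnitude of each bin of a naive discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


def spike_signal(length: int, position: int, height: float = 1.0) -> List[float]:
    """A flat window of samples with one abrupt spike at the given index."""
    return [height if t == position else 0.0 for t in range(length)]


# Same event, occurring early in one window and late in the other.
early = spike_signal(16, position=3)
late = spike_signal(16, position=12)

# The two magnitude spectra agree bin by bin (up to rounding error).
same_spectrum = all(
    abs(a - b) < 1e-9
    for a, b in zip(dft_magnitudes(early), dft_magnitudes(late))
)
```

Here `same_spectrum` evaluates to `True` although the two windows differ in the time domain, so the magnitude spectrum alone does not reveal when the spike occurred.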
It is an object of the present invention to provide a technique which alleviates the above drawback of the prior art.