IT service providers need to carefully monitor the performance and trends of the IT environment in order to keep its utilization under control. These organizations face technical challenges in properly performing two key IT management processes: problem management and service measurement.
In some situations, problems on a specific application have clear root causes, and the analysis can be limited to a small number of IT infrastructure components (for example, servers or subsystems). In other situations, recurring service outages occur with no evident root cause, resulting in service unavailability or performance degradation. Nowadays, increasingly large pools of IT resources are built and exploited, as in cloud computing infrastructures. Analysing a massive number of IT infrastructure components, or monitoring their health status preventively, calls for a different approach. IT service providers need to perform complex analyses and try to answer the following questions:
are all the servers/middleware performing well?
is there any server/middleware that is not performing as it should?
is there any hidden problem that might become serious if the IT volumes increase?
A few patents describe approaches for monitoring the performance of IT resources, for example:
U.S. Pat. No. 6,522,768 entitled “PREDICTING SYSTEM BEHAVIOR OF A MANAGED SYSTEM USING PATTERN RECOGNITION TECHNIQUES” discloses a system for predicting system behaviour of a managed system which includes a measurement module coupled to the managed system to generate measurement data of the managed system. The measurement data include current measurement data and past measurement data. The past measurement data indicate a problem of the managed system. A pattern classification module is coupled to the measurement module to process the past measurement data into a plurality of representative pattern images, and to select a predictor pattern image that best identifies the problem from the pattern images. A pattern matching module is coupled to the pattern classification module and the measurement module to process the current measurement data into a plurality of pattern images using the same image processing technique that generates the predictor pattern image. The pattern matching module also identifies any pattern image that matches the predictor pattern image to predict the problem. A system for generating a predictor pattern image for predicting system behaviour of a managed system is also described.
This approach is a predictive system, whereas the need remains to discover existing performance problems. Its algorithm is based on pixel-level image analysis (“Gradient analysis, texture representation, sharpening edge detection”), which presents many drawbacks. Moreover, a sliding time window is required to analyse evolution over time. The approach also relies on image compression techniques, which are inconvenient (they require computational power, introduce artificial bias, etc.).
United States Patent Application Publication Number 20060020924 entitled “SYSTEM AND METHOD FOR MONITORING PERFORMANCE OF GROUPINGS OF NETWORK INFRASTRUCTURE AND APPLICATIONS USING STATISTICAL ANALYSIS” discloses a system, method and computer program product for monitoring performance of groupings of network infrastructure and applications using statistical analysis. Said method, system and computer program monitor managed unit groupings of executing software applications and execution infrastructure to detect deviations in performance. Logic acquires time-series data from at least one managed unit grouping of executing software applications and execution infrastructure. Other logic derives a statistical description of expected behaviour from an initial set of acquired data. Logic derives a statistical description of operating behaviour from acquired data corresponding to a defined moving window of time slots. A logic compares the statistical description of expected behaviour with the statistical description of operating behaviour; and a logic reports predictive triggers, said logic to report being responsive to said logic to compare and said logic to report identifying instances where the statistical description of operating behaviour deviates from the statistical description of expected behaviour to indicate a statistically significant probability that an operating anomaly exists within the at least one managed unit grouping corresponding to the acquired time-series data.
This approach is a real-time monitoring system that detects deviations from a historical trend, whereas the need remains to discover existing performance problems using collected data.
The considered approach uses “an expected normative behavioural pattern”, called a footprint, and calculates the deviation between the monitored data and the footprint of the monitored system. This comparison with well-performing systems is not satisfactory. Like the first approach, the proposed method uses a sliding time window (evolution over time) with sample data collected at regular time intervals. The considered algorithm measures the “angle of the subspaces” to calculate the differences between the data stream and the footprint; this approach also presents drawbacks.
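For illustration only, the general idea behind the second approach (a statistical description of expected behaviour derived from an initial set of data, compared against a moving window of later samples) can be sketched as follows. This is not the algorithm of the cited publication, which relies on subspace angles; the sketch substitutes a simple z-score test on the window mean, and the function name, parameters and threshold are hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean, stdev


def detect_deviations(samples, baseline_size=20, window_size=5, threshold=3.0):
    """Return the indices at which the moving-window mean deviates
    from the baseline (illustrative z-score test, assumed for this sketch)."""
    if len(samples) < baseline_size + window_size:
        raise ValueError("not enough samples for a baseline and one window")

    # Statistical description of expected behaviour: mean and standard
    # deviation of the initial set of acquired data.
    baseline = samples[:baseline_size]
    expected_mean = mean(baseline)
    expected_std = stdev(baseline)
    # Standard error of the mean of a window of `window_size` samples.
    std_err = expected_std / window_size ** 0.5

    window = deque(maxlen=window_size)  # moving window of recent samples
    flagged = []
    for i in range(baseline_size, len(samples)):
        window.append(samples[i])
        if len(window) < window_size:
            continue  # wait until the first window is full
        diff = abs(mean(window) - expected_mean)
        if std_err == 0:
            deviates = diff > 0  # constant baseline: any change is a deviation
        else:
            deviates = diff / std_err > threshold
        if deviates:
            flagged.append(i)
    return flagged


# Example: CPU utilization (%) sampled at regular intervals.
baseline = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50] * 2
samples = baseline + [50, 51, 49, 50, 50] + [90] * 5
print(detect_deviations(samples))
```

Such a sketch makes the dependency on a trusted baseline and on regularly sampled, time-ordered data explicit, which is the limitation noted above.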
There is a need for a method that enables the discovery of existing performance problems of IT resources.