In recent years, many customer-facing services, such as mail-order selling over the Internet, have come to rely on computer systems and information and communication technology. To provide such services smoothly, the computer system is required to operate stably at all times. Operations management of the computer system is therefore indispensable.
However, operations management of such a system has been performed manually by a system administrator. As a result, there is a problem in that, as the scale and complexity of the system increase, the system administrator is required to have sophisticated knowledge and experience, and a system administrator or other person who lacks sufficient knowledge and experience may perform erroneous operations.
To avoid this problem, system operations management apparatuses have been provided that monitor, in a unified manner, the status of the hardware composing a system and control that hardware. Such a system operations management apparatus acquires data representing the operating status of the hardware of the managed system (hereinafter referred to as performance information) online, determines from the result of analyzing the performance information whether a failure is present in the managed system, and shows the result on a display unit (for example, a monitor) included in the system operations management apparatus. Examples of methods for determining the presence of a failure include a technique that makes the determination based on a threshold value set in advance for the performance information, and a technique that makes the determination based on a reference range for the difference between an actual measured value of the performance information and a calculated (theoretical) value of the performance information computed in advance.
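The two determination techniques mentioned above can be illustrated with a minimal sketch. The function names, the direction of the threshold comparison (a failure when the value exceeds the threshold), and the numeric values are assumptions made for illustration only; the related apparatus is not described at this level of detail.

```python
def exceeds_threshold(measured, threshold):
    """Threshold technique (assumed form): report a failure when the
    measured performance value is above a threshold fixed in advance."""
    return measured > threshold


def outside_reference_range(measured, calculated, lower, upper):
    """Reference-range technique (assumed form): report a failure when the
    difference between the measured value and the value calculated in
    advance falls outside the range [lower, upper]."""
    difference = measured - calculated
    return not (lower <= difference <= upper)


# Hypothetical CPU utilization figures, in percent.
cpu_failure = exceeds_threshold(measured=95, threshold=90)
drift_failure = outside_reference_range(measured=70, calculated=65, lower=-10, upper=10)
```

With these hypothetical figures, the first check reports a failure and the second does not, because the difference of 5 lies inside the reference range.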
In this system operations management apparatus, information about the presence or absence of a failure in the system is shown on the display unit, such as a monitor, as mentioned above. Therefore, when the presence of a failure is shown, the cause must be narrowed down from the displayed content, for example to whether it is a lack of memory capacity or an overload of a CPU (Central Processing Unit), before the failure can be remedied. However, because narrowing down the cause requires investigating the system histories and parameters of the portions that appear to be related to the failure, the work depends on the experience and intuition of the system administrator who performs it. A high level of skill is therefore inevitably required of the system administrator who operates the system operations management apparatus. At the same time, resolving a system failure through operation of the system operations management apparatus imposes a heavy burden on the system administrator in terms of both time and physical effort.
Accordingly, it is important for this system operations management apparatus to automatically analyze combinations of abnormal statuses and the like based on the processing capacity information collected from the managed system, inform the system administrator of a summary of the problem and a roughly estimated cause of the failure, and then receive an instruction on how to handle it.
Thus, there are various related technologies concerning system operations management apparatuses equipped with functions that reduce the burden on the system administrator who manages the system and repairs failures. Those related technologies are described below.
The technology disclosed in Japanese Patent Application Laid-Open No. 2004-062741 relates to a failure information display apparatus that indicates failure information of a system. When a failure is found while managing the operating status of the managed data processing system, this technology presents failure messages externally according to the order in which the failures occurred and the actual arrangement of the faulty units. This makes the location of a failure easy to recognize visually, simplifies estimation of the origin of the failure, and thus reduces the burden on the system administrator.
The technology disclosed in Japanese Patent Application Laid-Open No. 2005-257416 relates to an apparatus that diagnoses a measured device based on time series information on parameters acquired from the measured device. The technology appropriately detects a failure caused by performance deterioration of the measured device by calculating the strength of the correlation between pieces of parameter information based on variations in the time series information of the parameters. According to this technology, it can be appropriately judged whether the time series variations of information on different parameters are similar.
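As a rough illustration of judging whether the variations of two parameter time series are similar, the following sketch computes a Pearson correlation coefficient over the step-to-step changes of each series. The choice of Pearson correlation on first differences, and all names and sample values, are assumptions for illustration; the cited publication may define correlation strength differently.

```python
import math


def variations(series):
    """Step-to-step changes of a time series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def variation_correlation(series_a, series_b):
    """Correlation strength between the variations of two parameters."""
    return pearson(variations(series_a), variations(series_b))


# Hypothetical samples of two parameters taken at the same instants.
param_a = [1, 3, 2, 5, 4]
param_b = [2, 6, 4, 10, 8]
strength = variation_correlation(param_a, param_b)
```

A value of `strength` close to 1 would indicate that the two parameters rise and fall together, and a value close to -1 that they move in opposite directions.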
The technology disclosed in Japanese Patent Application Laid-Open No. 2006-024017 relates to a system for estimating the capacity of a computer resource. The technology identifies the amount of load caused by specific processing and analyzes the load associated with a future amount of processing by comparing the processing history of system elements with the history of changes in performance information. According to this technology, when the relation between the processing and the load has been grasped in advance, the behavior of the system can be identified.
The technology disclosed in Japanese Patent Application Laid-Open No. 2006-146668 relates to an operations management support apparatus. This technology acquires, at regular time intervals, information on the operating status of hardware such as a CPU and information on the access volume to a web control server from a managed system, finds a correlation between a plurality of elements that compose the information, and determines from the correlation whether the current status of the system is normal. According to this technology, performance degradation of the system can be detected more flexibly, and the cause of the degradation and countermeasures for it can be shown in detail.
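One simple way to picture a normality check based on a correlation between two elements is sketched below: a straight line relating access volume to CPU utilization is fitted from samples taken while the system is known to be normal, and a new sample is treated as normal when it stays within a tolerance of that line. The linear model, the least-squares fit, the tolerance, and all names and values are assumptions for illustration and are not taken from the cited publication.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_normal(x, y, slope, intercept, tolerance):
    """Normal when the observed y is within `tolerance` of the fitted line."""
    return abs(y - (slope * x + intercept)) <= tolerance


# Hypothetical samples from a period known to be normal:
# access volume (requests per interval) and CPU utilization (percent).
access_volume = [100, 200, 300, 400]
cpu_percent = [10, 20, 30, 40]
slope, intercept = fit_line(access_volume, cpu_percent)

normal_now = is_normal(500, 52, slope, intercept, tolerance=5)
degraded_now = not is_normal(500, 90, slope, intercept, tolerance=5)
```

In this sketch a CPU utilization of 90 percent at an access volume of 500 breaks the learned relation and would be reported as abnormal, while 52 percent would not.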
The technology disclosed in Japanese Patent Application Laid-Open No. 2007-293393 relates to a fault monitoring system that searches for similar past failures. By periodically acquiring information related to various kinds of processing capacity and indicating that information on a time axis together with information related to failures that occurred in the past, the technology can predict the occurrence of a future failure based on whether the current information is similar to the analysis information from the time when a past failure occurred.
The technology disclosed in Japanese Patent Application Laid-Open No. H10-074188 relates to a data learning device. The technology compares information on a learning object acquired from a data managed apparatus with information related to an estimated value generated in advance, and determines that the acquired information is exceptional information when the degree of similarity between the two is smaller than or equal to a predetermined criterion. Further, the technology corrects the content of the information related to the estimated value based on the difference between the two. According to this technology, the processing accuracy of the data managed apparatus can be improved by repeating this operation.
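The compare, judge, and correct cycle described above can be sketched as follows. The similarity measure, the correction rule (moving the estimate a fixed fraction of the way toward the acquired value on every step), and the parameter names `criterion` and `rate` are assumptions for illustration; the cited publication does not specify them here.

```python
def similarity(acquired, estimate):
    """Assumed similarity degree in (0, 1]: equals 1 when the values match
    and decreases as the absolute difference grows."""
    return 1.0 / (1.0 + abs(acquired - estimate))


def learn_step(acquired, estimate, criterion, rate):
    """Judge whether `acquired` is exceptional, then correct the estimate
    based on the difference between the acquired and estimated values."""
    exceptional = similarity(acquired, estimate) <= criterion
    corrected = estimate + rate * (acquired - estimate)
    return exceptional, corrected


# Repeating the operation on hypothetical acquired values.
estimate = 10.0
flags = []
for value in [10.0, 14.0, 14.0, 14.0]:
    exceptional, estimate = learn_step(value, estimate, criterion=0.5, rate=0.5)
    flags.append(exceptional)
```

Here the first acquired value matches the estimate and is not exceptional, the jump to 14.0 is flagged, and each repetition moves the estimate closer to the new level.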