For example, in an information processing system that provides an information service requiring high performance quality, in particular, a service for an unspecified number of users (hereinafter referred to as a service system), providing the service with high performance quality is an essential condition for business success. For this reason, the performance of such service systems is generally monitored so that degradation in performance quality can be detected early and a suitable countermeasure can be taken against it. As a result, a serious failure can be prevented in advance.
An information processing system for monitoring performance (hereinafter referred to as a performance monitoring system) periodically checks the performance information of the components constituting the service system (hereinafter referred to as objects to be monitored) and confirms whether the performance of the system is sufficient and the expected performance is being delivered.
A general configuration of a performance monitoring system will be described. The performance monitoring system comprises a monitoring manager program and a plurality of monitoring agent programs. The monitoring agent programs periodically monitor and analyze the states of one or more objects to be monitored and, if any trouble occurs, report the trouble to the monitoring manager program. The monitoring manager program controls and manages the monitoring agent programs. The monitoring manager program operates on an information processing apparatus for management that is provided separately from the service system. The monitoring agent programs operate on the computing machines (objects to be monitored) that constitute the service system.
A flow of a monitoring process in the performance monitoring system will be described below. (1) The monitoring agent programs periodically acquire performance information from the objects to be monitored at a monitoring interval 1 (1-1), analyze the acquired performance information at a monitoring interval 2 that is longer than the monitoring interval 1 (1-2), and, when the analysis determines that the state of an object to be monitored is abnormal, report the trouble to the monitoring manager program (1-3).
(2) The monitoring manager program receives the notifications from the monitoring agent programs and analyzes the state of the entire service system (2-1), and, when the analysis shows that a countermeasure is necessary, it instructs the countermeasure by communicating with a manager or the like (2-2).
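The flow of steps (1-1) to (2-1) above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of any particular performance monitoring system: the class names, the tick-based clock, the use of a mean value for the analysis and the threshold of 0.9 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MonitoringManager:
    """Collects trouble notifications from the agents (input to step 2-1)."""
    notifications: List[str] = field(default_factory=list)

    def notify(self, agent_name: str, message: str) -> None:
        self.notifications.append(f"{agent_name}: {message}")


@dataclass
class MonitoringAgent:
    name: str
    read_metric: Callable[[int], float]  # hypothetical sensor: tick -> load value
    manager: MonitoringManager
    interval_1: int = 1      # acquisition interval in ticks (1-1)
    interval_2: int = 5      # analysis interval in ticks (1-2), longer than interval_1
    threshold: float = 0.9   # assumed criterion for "abnormal"
    samples: List[float] = field(default_factory=list)

    def tick(self, t: int) -> None:
        # (1-1) acquire performance information at monitoring interval 1
        if t % self.interval_1 == 0:
            self.samples.append(self.read_metric(t))
        # (1-2) analyze the accumulated samples at monitoring interval 2
        if t % self.interval_2 == 0 and self.samples:
            mean = sum(self.samples) / len(self.samples)
            self.samples.clear()
            # (1-3) notify the manager only when the state looks abnormal
            if mean > self.threshold:
                self.manager.notify(self.name, f"mean load {mean:.2f} at t={t}")


# Usage: the load is normal for ticks 1-5 and high for ticks 6-10.
manager = MonitoringManager()
agent = MonitoringAgent("agent-1", lambda t: 0.5 if t < 6 else 0.95, manager)
for t in range(1, 11):
    agent.tick(t)
print(manager.notifications)
```

In this sketch the first analysis window (ticks 1 to 5) produces no notification and the second window (ticks 6 to 10) produces exactly one, so the manager only handles data when an agent has judged its object to be abnormal.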
The first objective of the performance monitoring system is to reduce the monitoring cost when monitoring a large-scale service system. The monitoring cost refers to the computing resources, such as CPU, memory, network bandwidth and disk space, that are consumed in executing the programs of the monitoring system, namely, in the monitoring process of the above-mentioned performance monitoring system.
In the monitoring of a large-scale service system, the costs of (1-1), (1-2) and (2-1) in the above-mentioned monitoring process are particularly high. The costs of the processes at (1-1) and (1-2) increase in proportion to the number of information processing apparatuses constituting the service system, namely, the number of monitoring agent programs. Also, since the number of monitoring agent programs that can be sources of trouble notifications increases, the monitoring cost of the process at (2-1) also increases. Further, since trouble notifications tend to be transmitted simultaneously from a plurality of monitoring agent programs, the load of the process at (2-1) increases abruptly.
The monitoring cost can be reduced simply by lengthening the monitoring interval so as to reduce the number of monitoring operations per unit time. When the number of monitoring operations is reduced, the consumption of the computing resources used in the monitoring process is reduced, and thus the monitoring cost is reduced.
This method, however, has the disadvantage that the monitoring capability is degraded. When the monitoring interval is lengthened, trouble that occurs between monitoring operations cannot be found, and thus the detection of the trouble is delayed or the trouble cannot be detected at all. That is to say, the length of the monitoring interval and the detecting capability are in a trade-off relationship.
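The trade-off can be made concrete with a small calculation. The following sketch rests on a simplified model that is an assumption of this illustration and is not taken from the cited publications: each monitoring operation is an instantaneous check, and a transient trouble of a given duration starts at a uniformly random time relative to the checks. The function names are hypothetical.

```python
def checks_per_hour(interval_s: float) -> float:
    """Monitoring cost proxy: number of monitoring operations per hour."""
    return 3600.0 / interval_s


def worst_case_delay_s(interval_s: float) -> float:
    """A persistent trouble that starts just after a check waits one full interval."""
    return interval_s


def miss_probability(interval_s: float, trouble_duration_s: float) -> float:
    """Probability that a transient trouble falls entirely between two checks
    (assumes a uniformly random start time; zero if it outlasts the interval)."""
    return max(0.0, 1.0 - trouble_duration_s / interval_s)


# Compare three monitoring intervals for a transient trouble lasting 30 seconds.
for interval in (10.0, 60.0, 300.0):
    print(
        f"interval={interval:>5.0f}s",
        f"checks/hour={checks_per_hour(interval):.0f}",
        f"worst delay={worst_case_delay_s(interval):.0f}s",
        f"miss probability={miss_probability(interval, 30.0):.2f}",
    )
```

Under this model, lengthening the interval from 10 seconds to 300 seconds cuts the number of checks from 360 to 12 per hour, but the worst-case detection delay grows to 300 seconds and a 30-second trouble is missed nine times out of ten.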
A method that improves on the method described above, in which the monitoring cost is reduced and detection delay is prevented by dynamically adjusting the monitoring interval, is proposed in Japanese Patent Laid-Open Publication Nos. 2004-178118, 5-205074, 7-152706 and 8-275260.
In the technology in Japanese Patent Laid-Open Publication No. 2004-178118, there are a plurality of monitoring items monitored by the monitoring agent programs and when a monitoring interval of one of the items is shortened, the monitoring intervals of the other monitoring items are lengthened. As a result, the total increase in the monitoring cost is prevented.
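One way to picture this kind of rebalancing is shown below. The publication is described here only as lengthening the other intervals; the proportional scaling rule, the assumption that every check costs the same, and the function name `rebalance` are assumptions of this sketch and are not taken from the publication.

```python
from typing import Dict


def rebalance(intervals: Dict[str, float], item: str, new_interval: float) -> Dict[str, float]:
    """Shorten one item's interval and lengthen the others so that the total
    number of checks per second (the assumed cost measure) stays the same."""
    if len(intervals) < 2:
        raise ValueError("at least two monitoring items are needed to rebalance")
    rates = {name: 1.0 / value for name, value in intervals.items()}
    budget = sum(rates.values())          # total checks per second before the change
    remaining = budget - 1.0 / new_interval
    if remaining <= 0:
        raise ValueError("new interval alone would exceed the cost budget")
    # Scale the other items' rates down proportionally to fit the remaining budget.
    scale = remaining / (budget - rates[item])
    result = {name: 1.0 / (rate * scale) for name, rate in rates.items() if name != item}
    result[item] = new_interval
    return result


# Usage with hypothetical monitoring items, all checked every 10 seconds at first.
before = {"cpu": 10.0, "memory": 10.0, "disk": 10.0}
after = rebalance(before, "cpu", 5.0)
print(after)
```

In this example, halving the CPU interval to 5 seconds pushes the memory and disk intervals out to about 20 seconds, so the total number of checks per second is unchanged.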
In the technology in Japanese Patent Laid-Open Publication No. 5-205074, the monitoring manager program dynamically changes the interval at which measured data are collected from the monitoring agent programs so as to reduce the monitoring cost and the communication traffic by the monitoring agent programs. The monitoring interval is changed at the time when the measured data satisfy a predetermined condition.
In the technology in Japanese Patent Laid-Open Publication No. 7-152706, the monitoring agent programs measure CPU utilization, and when the measured value differs greatly from the previously measured value, the frequency of notification to the monitoring manager program is increased.
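The idea of raising the notification frequency after a large change can be sketched as a simple rule. The two interval values, the threshold of 20 percentage points and the function name are hypothetical and serve only to illustrate the behavior described above.

```python
def next_notify_interval(
    previous_cpu: float,
    current_cpu: float,
    normal_interval_s: float = 60.0,   # assumed interval while the load is steady
    fast_interval_s: float = 10.0,     # assumed interval after a large change
    change_threshold: float = 20.0,    # assumed "great change", in percentage points
) -> float:
    """Return the interval until the next notification to the manager."""
    if abs(current_cpu - previous_cpu) >= change_threshold:
        return fast_interval_s   # notify more frequently
    return normal_interval_s


print(next_notify_interval(30.0, 35.0))  # small change: normal interval
print(next_notify_interval(30.0, 80.0))  # large change: fast interval
```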
In the technology in Japanese Patent Laid-Open Publication No. 8-275260, when the data measured by the monitoring agent programs are not changed from the previously measured data, the measured data are not transmitted to the monitoring manager program.
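Suppressing the transmission of unchanged data can be illustrated as follows. This is a sketch under the assumption that "not changed" means exact equality with the previously measured value; the class name and the `send` callback are hypothetical.

```python
from typing import Callable, List

_UNSET = object()  # marks "no value has been measured yet"


class ChangeOnlyReporter:
    """Forwards a measured value to the manager only when it differs from the last one."""

    def __init__(self, send: Callable[[float], None]) -> None:
        self._send = send
        self._last = _UNSET

    def report(self, value: float) -> bool:
        """Return True if the value was transmitted, False if it was suppressed."""
        if value == self._last:
            return False          # unchanged: nothing is sent to the manager
        self._send(value)
        self._last = value
        return True


# Usage: six measurements, of which only the three changes are transmitted.
sent: List[float] = []
reporter = ChangeOnlyReporter(sent.append)
for measured in [40.0, 40.0, 42.0, 42.0, 42.0, 40.0]:
    reporter.report(measured)
print(sent)
```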
The second objective of the performance monitoring system is the response to the change in configuration of the service system. The configuration of the service system is possibly changed during operation. For example, in the case where a defective information processing apparatus is disconnected from the service system, the configuration of the service system is changed.
Further, in recent years, many systems called "work load management systems", which autonomously change the configuration of the service system, have been proposed. Such a system monitors the load on the service system and adds information processing apparatuses to, or removes them from, the service system in accordance with the amount of load.
When the configuration of the service system is changed in such a manner, the setting of the performance monitoring system should also be changed accordingly. Methods for automatically changing the setting of the monitoring system according to the change in the configuration of the service system are proposed in Japanese Patent Laid-Open Publication Nos. 2000-92091 and 2003-271471.
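The scaling decision of such a system can be illustrated with a deliberately simple rule. The rule below (enough apparatuses to carry the current load, with a minimum of one) and the function names are assumptions of this sketch; actual work load management systems may decide differently.

```python
import math


def desired_apparatus_count(load: float, capacity_per_apparatus: float, minimum: int = 1) -> int:
    """Number of information processing apparatuses needed to carry the load."""
    if capacity_per_apparatus <= 0:
        raise ValueError("capacity_per_apparatus must be positive")
    return max(minimum, math.ceil(load / capacity_per_apparatus))


def planned_change(current_count: int, load: float, capacity_per_apparatus: float) -> int:
    """Positive: apparatuses to add. Negative: apparatuses to remove."""
    return desired_apparatus_count(load, capacity_per_apparatus) - current_count


# Usage: a load of 250 requests/s with 100 requests/s per apparatus (hypothetical figures).
print(desired_apparatus_count(250.0, 100.0))
print(planned_change(5, 250.0, 100.0))
```

Every nonzero result of `planned_change` is a configuration change of the service system, which is why the setting of the performance monitoring system has to follow such changes.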
In the technology in Japanese Patent Laid-Open Publication No. 2000-92091, the information processing apparatuses constituting the service system are divided into several groups. Further, one or more monitoring manager programs are provided in one service system. Each of the information processing apparatuses has a table that describes the correspondence between the information processing apparatuses and the monitoring manager programs. The table is updated when the configuration is changed, for example, when the number of information processing apparatuses increases or decreases, or when the number of monitoring manager programs, namely, the number of information processing apparatuses on which the monitoring manager programs operate, increases or decreases.
In the technology in Japanese Patent Laid-Open Publication No. 2003-271471, the information processing apparatuses constituting the service system are divided into several groups. An address list of the information processing apparatuses included in a group is created for each group, and all the information processing apparatuses hold the address lists of all groups. Further, all the information processing apparatuses hold a tree structure in which related groups are connected by links. When the configuration of the service system is changed, the tree structure is referenced sequentially, and the contents of the change in the configuration are transmitted to the information processing apparatuses in the respective groups, so that the change is reflected in the address lists and the tree structures.
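The propagation described above can be pictured with the following sketch. It is an illustration only and is not taken from the publication: the breadth-first traversal order, the in-memory "transmission" by direct method call, the class and function names, and the group names and addresses are all hypothetical. Bootstrapping a newly added apparatus with its own copy of the lists is outside the scope of the sketch.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


class Apparatus:
    """Holds its own copy of every group's address list and of the group tree."""

    def __init__(self, address: str, lists: Dict[str, Set[str]], tree: Dict[str, List[str]]) -> None:
        self.address = address
        self.address_lists = {group: set(members) for group, members in lists.items()}
        self.tree = {group: list(links) for group, links in tree.items()}

    def apply_change(self, group: str, added: Iterable[str], removed: Iterable[str]) -> None:
        members = self.address_lists.setdefault(group, set())
        members.update(added)
        members.difference_update(removed)


def propagate_change(nodes: Dict[str, Apparatus], holder: Apparatus, group: str,
                     added: Iterable[str] = (), removed: Iterable[str] = ()) -> List[str]:
    """Follow the group links from the changed group and deliver the change to
    every apparatus in every group reached. Returns the groups in visit order."""
    added, removed = list(added), list(removed)
    # Snapshot the recipients first, because the holder's own lists change during delivery.
    recipients = {g: sorted(members) for g, members in holder.address_lists.items()}
    visited: List[str] = []
    seen = {group}
    queue = deque([group])
    while queue:
        current = queue.popleft()
        visited.append(current)
        for address in recipients.get(current, []):
            if address in nodes:
                nodes[address].apply_change(group, added, removed)
        for linked in holder.tree.get(current, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited


# Usage with hypothetical groups and addresses.
tree = {"web": ["app"], "app": ["web", "db"], "db": ["app"]}
lists = {"web": {"10.0.0.1", "10.0.0.2"}, "app": {"10.0.1.1"}, "db": {"10.0.2.1"}}
nodes = {addr: Apparatus(addr, lists, tree) for members in lists.values() for addr in members}
order = propagate_change(nodes, nodes["10.0.0.1"], "web", added=["10.0.0.3"])
print(order)
```

After the call, every apparatus in the sketch holds the same updated address list for the changed group, which corresponds to the change being reflected in all copies.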