Recent improvements in the performance of IT systems depend largely on mounting multiple processors in one system (a multiprocessor configuration) rather than on improving the performance of a single processor. Such a multiprocessor system executes a plurality of processes simultaneously on a plurality of processors, and on this basis aims to improve performance as a system. However, in a multiprocessor system, the processes executed by the processors may require use of a shared resource at the same time. That is, contention for the shared resource (hereinafter also referred to as shared resource contention) may occur. When shared resource contention occurs, a performance improvement corresponding to the number of processors may not be achieved. For this reason, the behavior of the IT system is measured and analyzed while the IT system executes a program. Checking whether shared resource contention occurs and, when it does, in which part it occurs is an important key to improving the performance of the IT system.
Methods for measuring the behavior of a computer are roughly classified into event-driven methods and sampling methods. In an event-driven method, an event (such as process switching, the start or end of an I/O process, or communication) that occurs in the measuring object computer while it executes a program is used as a trigger to execute a measurement operation. One type of event-driven method is the event trace method. In the event trace method, information about each event that occurs is recorded as time-series data (trace data), and a final analysis result is obtained by analyzing that information afterwards.
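The two phases of the event trace method, recording events as time-series trace data and analyzing that data afterwards, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (`TraceRecord`), the recorder (`EventTracer`), the event names `io_start`/`io_end`, and the analysis (`io_time_per_process`, which totals I/O time per process) are all hypothetical and are not taken from any particular tracing tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TraceRecord:
    timestamp: float  # when the event occurred
    cpu: int          # processor on which the event was observed
    process: str      # process that caused the event
    event: str        # hypothetical event names: "io_start", "io_end"


class EventTracer:
    """Measurement phase: each event is only recorded, not analyzed."""

    def __init__(self) -> None:
        self.trace: List[TraceRecord] = []

    def on_event(self, timestamp: float, cpu: int, process: str, event: str) -> None:
        self.trace.append(TraceRecord(timestamp, cpu, process, event))


def io_time_per_process(trace: List[TraceRecord]) -> Dict[str, float]:
    """Analysis phase, run later over the recorded trace data: pair each
    io_start with the next io_end of the same process and sum the durations."""
    open_io: Dict[str, float] = {}
    totals: Dict[str, float] = {}
    for rec in sorted(trace, key=lambda r: r.timestamp):
        if rec.event == "io_start":
            open_io[rec.process] = rec.timestamp
        elif rec.event == "io_end" and rec.process in open_io:
            start = open_io.pop(rec.process)
            totals[rec.process] = totals.get(rec.process, 0.0) + rec.timestamp - start
    return totals


tracer = EventTracer()
tracer.on_event(0.0, 0, "A", "io_start")
tracer.on_event(1.5, 1, "B", "io_start")
tracer.on_event(2.0, 0, "A", "io_end")
tracer.on_event(4.0, 1, "B", "io_end")
print(io_time_per_process(tracer.trace))  # {'A': 2.0, 'B': 2.5}
```

Keeping measurement and analysis separate is what allows a different analysis function to be written later and run over the same trace.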
Between the event trace method and other event-driven methods there are large differences in the amount of measurement data, the types of algorithms that can be applied, and whether various algorithms can be applied repeatedly; however, the mechanism for obtaining an analysis result can be regarded as the same in both. That is, the two methods have in common that they apply, to the events occurring in the measuring object system, algorithms that define the analysis processes corresponding to those events. The difference lies in when the algorithm is applied: to the trace data obtained as a result of measurement, or while the measurement is being executed. For this reason, the event trace method is taken up in the following description.
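The point that the event trace method and other event-driven methods share one analysis mechanism, differing only in when the algorithm is applied, can be illustrated by running a single algorithm both ways. In this sketch the names `SwitchCounter` and `run_measured_system`, and the event tuple layout, are hypothetical; the algorithm simply counts process-switch events per processor.

```python
from collections import Counter
from typing import Callable, List, Tuple

Event = Tuple[float, int, str]  # hypothetical layout: (timestamp, cpu, event name)


class SwitchCounter:
    """One analysis algorithm, written as a per-event handler:
    counts process-switch events per processor (illustrative only)."""

    def __init__(self) -> None:
        self.per_cpu: Counter = Counter()

    def feed(self, event: Event) -> None:
        _, cpu, name = event
        if name == "switch":
            self.per_cpu[cpu] += 1


def run_measured_system(hook: Callable[[Event], None]) -> None:
    """Stand-in for the measuring object system: reports each event to a hook."""
    for ev in [(0.1, 0, "switch"), (0.2, 1, "io_start"),
               (0.3, 1, "switch"), (0.4, 0, "switch")]:
        hook(ev)


# Event-driven: the algorithm runs during measurement; no trace is kept.
online = SwitchCounter()
run_measured_system(online.feed)

# Event trace: events are recorded first; the algorithm is applied afterwards.
trace: List[Event] = []
run_measured_system(trace.append)
offline = SwitchCounter()
for ev in trace:
    offline.feed(ev)

print(online.per_cpu == offline.per_cpu)  # True
```

Because the trace is kept, the offline path can re-apply the same or a different algorithm to it any number of times, whereas the online path has to be re-run against the live system.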
In the event trace method, obtaining a meaningful result from the trace data produced by measurement requires an algorithm that processes the trace data. Various such algorithms are prepared depending on the purpose of the analysis; however, no algorithm that analyzes shared resource contention has existed so far.
In the sampling method, on the other hand, the state of the measuring object system is checked at constant time intervals, and the sampled states are summarized over time to understand the behavior of the system from a global viewpoint. This method is suitable for grasping an outline of system operation over a long period of time, but not for measuring and analyzing microscopic behavior such as shared resource contention.
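The sampling method, and the reason it tends to miss microscopic events, can be sketched as below. The timeline function `cpu_state`, its state names, and the time window are hypothetical; they model a processor that waits on a shared resource for a period much shorter than the sampling interval.

```python
from collections import Counter
from typing import Callable, Dict


def sample_states(state_at: Callable[[float], str],
                  duration: float, interval: float) -> Dict[str, float]:
    """Sampling method: observe the system state once every `interval`
    time units and summarize the fraction of samples seen in each state."""
    n = int(duration / interval)
    counts = Counter(state_at(i * interval) for i in range(n))
    return {state: c / n for state, c in counts.items()}


def cpu_state(t: float) -> str:
    # Hypothetical timeline: the processor waits on a shared resource
    # only during the short window [2.3, 2.4); otherwise it is running.
    return "waiting_for_resource" if 2.3 <= t < 2.4 else "running"


print(sample_states(cpu_state, duration=10.0, interval=1.0))
# {'running': 1.0}: the 0.1-unit wait falls between samples and is never observed
```

The summary is accurate as a long-term outline, yet it contains no trace of the contention episode, which is the limitation described above.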
As described above, conventional techniques have provided no means for measuring and analyzing behavior related to shared resource contention in an IT system having a multiprocessor configuration.
Techniques related to monitoring, inspection, and analysis are introduced below.
Japanese patent publication (JP-A-Showa 60-11948) describes a task state transition monitoring device. The task state transition monitoring device includes a real time parallel processing unit and a data display processing unit. The real time parallel processing unit includes a management unit that outputs, for each task defined in a real time parallel processing program, a task identification code, a cause of task state transition, and a time of the task state transition. The data display processing unit receives the information output from the management unit, converts the task identification code into a user-registered task name, and causes a data display unit to display the user-registered task name, the execution time of the parallel processing program, the task state correlated with the user-registered task name and that execution time, and the cause of the task state transition.
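The conversion and display step performed by the data display processing unit might look like the following minimal sketch. It assumes a hypothetical record layout for the management unit's output and a hypothetical table of user-registered task names; the fallback label for unregistered IDs is also an assumption, not part of the cited publication.

```python
from typing import Dict, List, Tuple

# Record emitted by the management unit (hypothetical layout):
# (task identification code, new task state, cause of transition, time)
Transition = Tuple[int, str, str, float]


def format_transitions(records: List[Transition],
                       registered_names: Dict[int, str]) -> List[str]:
    """Data display processing: replace each task identification code with the
    user-registered task name and lay out time, task, state, and cause."""
    lines = []
    for task_id, state, cause, t in sorted(records, key=lambda r: r[3]):
        name = registered_names.get(task_id, f"task#{task_id}")  # fallback if unregistered
        lines.append(f"{t:8.3f}  {name:<10} {state:<8} ({cause})")
    return lines


records = [(2, "ready", "event_set", 0.500), (1, "running", "dispatch", 0.250)]
for line in format_transitions(records, {1: "SENSOR_IN", 2: "CTRL_LOOP"}):
    print(line)
```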
Japanese patent publication (JP-A-Heisei 5-346861) describes a multitasking software inspection device. The multitasking software inspection device inspects the switching control performed by a scheduler that executes a multi-task program having a plurality of tasks in parallel. The inspection device includes a detection unit, a storage unit, an inspection unit, and an output unit. The detection unit detects task switches caused by the switching control of the scheduler. The storage unit stores the switching data detected by the detection unit. The inspection unit inspects whether the executed operation of the multi-task program satisfies a specification by comparing and collating the switching data stored in the storage unit with input specification data about the switching control. The output unit outputs the detection result of the detection unit and the inspection result of the inspection unit.
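The detect, store, inspect, and output flow of the inspection device can be sketched as follows. The specification format used here (a set of permitted task-to-task switches) and the class and task names are hypothetical illustrations; the cited publication is not described as using this format.

```python
from typing import List, Set, Tuple

Switch = Tuple[str, str]  # (task switched out, task switched in)


class SwitchInspector:
    def __init__(self, allowed: Set[Switch]) -> None:
        self.allowed = allowed           # specification data (hypothetical format)
        self.stored: List[Switch] = []   # storage unit

    def detect(self, prev_task: str, next_task: str) -> None:
        """Detection unit: called on every switch made by the scheduler."""
        self.stored.append((prev_task, next_task))

    def inspect(self) -> List[Switch]:
        """Inspection unit: collate stored switches with the specification
        and return those the specification does not permit."""
        return [s for s in self.stored if s not in self.allowed]


spec = {("idle", "T1"), ("T1", "T2"), ("T2", "idle")}
inspector = SwitchInspector(spec)
for prev, nxt in [("idle", "T1"), ("T1", "T2"), ("T2", "T1")]:
    inspector.detect(prev, nxt)

# Output unit: report both the detection result and the inspection result.
print("detected:", inspector.stored)
print("violations:", inspector.inspect())  # violations: [('T2', 'T1')]
```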
Japanese patent publication (JP-A-Heisei 4-76640) describes a system for analyzing task break time during on-line operation. In this task break time analysis system, processing of a task in the running state is temporarily broken off by an interruption such as a time slice or memory hold; after the interruption process completes, the task transitions to the ready state; and the task dispatcher then returns the task to the running state. The task break time analysis system includes: a state flag indicating whether or not on-line processing performance is being measured; a calculation unit holding the task ID of the measuring object task; and first, second, and third information collection process routines. When an interruption occurs while an on-line task is running and the interruption is for the measuring object task, an interruption analysis routine sets the task ID, starts a timer indicating the time elapsed since the interruption, and activates the first information collection process routine to collect information such as the task ID, the time at which the interruption occurred, and the interruption cause type (time slice, memory hold, or the like); it then activates an interruption execution routine. In other cases, the interruption analysis routine activates the interruption execution routine directly. At the completion of a predetermined interruption process, if the task ID of the interruption occurrence source is set and the timer value exceeds a predetermined value, the interruption execution routine activates the second information collection process routine to collect the task ID, the current time, and the like, and then passes control to the task dispatcher; in other cases, the interruption execution routine passes control to the task dispatcher directly.
When dispatching a task, the task dispatcher checks whether the ID of the dispatch destination task coincides with the task ID of the interruption occurrence source. If they coincide and the timer value exceeds the predetermined value, the task dispatcher calls the third information collection process routine to collect the execution priority of the dispatch destination task, the task ID, the current time, and the like, initializes the set task ID of the interruption occurrence source, stops the timer, and then performs dispatch as usual. In other cases, the task dispatcher simply performs dispatch as usual.
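The interplay of the three information collection routines can be sketched as a simplified event handler. All names (`BreakTimeMonitor`, `on_interrupt`, `on_interrupt_done`, `on_dispatch`) are hypothetical, and times are supplied by the caller rather than read from a real timer. Following the description literally, the source task ID is cleared and the timer stopped only when the dispatched task matches and the threshold has been exceeded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BreakTimeMonitor:
    measured_task: str                 # task ID of the measuring object
    threshold: float                   # the "predetermined value" for the timer
    measuring: bool = True             # state flag: is performance being measured?
    source_task: Optional[str] = None  # task ID of the interruption occurrence source
    timer_start: Optional[float] = None
    log: List[tuple] = field(default_factory=list)

    def _exceeded(self, now: float) -> bool:
        return self.timer_start is not None and now - self.timer_start > self.threshold

    def on_interrupt(self, task: str, now: float, cause: str) -> None:
        """Interruption analysis routine."""
        if self.measuring and task == self.measured_task:
            self.source_task, self.timer_start = task, now
            self.log.append(("interrupt", task, now, cause))  # first collection routine
        # in every case the interruption execution routine runs next

    def on_interrupt_done(self, now: float) -> None:
        """End of the interruption execution routine."""
        if self.source_task is not None and self._exceeded(now):
            self.log.append(("interrupt_done", self.source_task, now))  # second routine
        # control then passes to the task dispatcher

    def on_dispatch(self, task: str, priority: int, now: float) -> None:
        """Task dispatcher hook, called before the normal dispatch."""
        if task == self.source_task and self._exceeded(now):
            self.log.append(("dispatch", task, priority, now))  # third routine
            self.source_task, self.timer_start = None, None     # reset ID, stop timer
        # dispatch proceeds as usual


mon = BreakTimeMonitor(measured_task="T1", threshold=1.0)
mon.on_interrupt("T1", now=0.0, cause="time_slice")
mon.on_interrupt_done(now=2.0)
mon.on_dispatch("T2", priority=5, now=2.5)  # different task: nothing collected
mon.on_dispatch("T1", priority=3, now=3.0)
print(mon.log)
```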