1. Field of the Invention
The present invention relates to a method, a system, and a program for correcting the time of event trace data, and more specifically to performance analysis that takes as input trace data collected by a plurality of machines to be measured over approximately the same time period, and that uses event information, stored in the trace data, about communication operations performed between those machines.
The present application claims priority of Japanese Patent Application No. 2006-003626 filed on Jan. 11, 2006, which is hereby incorporated by reference.
2. Description of the Related Art
One example of a conventional system for collecting event trace data is disclosed in Japanese Patent Application Laid-open No. 2003-157185. As shown in FIG. 16, the conventional system to collect event trace data is made up of a trace data storing section 213, a trace data storing medium 202, and a probe included in an operating system 210.
In the conventional system for collecting event trace data, the time at which an event occurs (represented as a time stamp) is obtained from a clock 201 provided as hardware of the system to be measured. Because clock rates vary slightly from machine to machine, when event trace data is collected from a plurality of machines to be measured, a slight deviation (on the order of 10^-5 to 10^-6) arises between the machines' time stamps in the rate at which they advance, that is, in the actual time corresponding to one unit ("1") of the time stamp value.
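A rate deviation of this size is small, but it acts on every time unit, so the disagreement between two clocks grows in proportion to how long collection runs. The sketch below illustrates that arithmetic under stated assumptions: both clocks start at the same value, the deviation stays constant, and the elapsed times (in seconds) are illustrative values not taken from the source. The function name `accumulated_skew` is hypothetical.

```python
def accumulated_skew(rate_deviation: float, elapsed: float) -> float:
    """Offset between a reference clock and a clock that runs
    (1 + rate_deviation) times as fast, after `elapsed` reference time units.
    Assumes both clocks start at 0 and the rate deviation stays constant."""
    return rate_deviation * elapsed


# Illustrative collection periods in seconds (assumed unit).
for deviation in (1e-5, 1e-6):
    for elapsed in (1.0, 60.0, 3600.0):
        print(f"deviation {deviation:g}, after {elapsed:g} s: "
              f"offset {accumulated_skew(deviation, elapsed):.6f} s")
```

With a deviation of 10^-5, for instance, an hour of collection leaves the two clocks about 36 ms apart, and the gap keeps widening as collection continues.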
Japanese Patent Application Laid-open No. 2005-235054 discloses a technology that, when performance is analyzed using a plurality of trace data blocks as input, corrects the deviation between the times at which collection of the trace data was started. Because this technology corrects the start-time deviation based on communication operations performed between the machines to be measured, a transmitting event occurring in one machine must correspond exactly to a receiving event occurring in another machine. The conventional technologies, however, have the following problem. When the period of collecting event trace data is made longer, as shown in FIGS. 17 and 18, transmitting events still correspond exactly to receiving events in the first part (the first half) of the collection period, but in the latter half this correspondence breaks down: according to the time stamps, a receiving event occurs earlier than its transmitting event, so the time at which trace data collection started can no longer be corrected.
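To see why a start-time correction stops working, the sketch below assumes, for illustration only, that such a correction can be modeled as one constant offset `d` subtracted from every time stamp of machine 2; the actual method of the cited publication may differ. It uses the two-machine example of FIG. 17 described later in this section (machine 2's clock reads 1.0001 times the absolute time, each communication takes 0.001) and computes the range of offsets that keeps every receive at or after its transmit. The names `read_clock` and `offset_bounds` are hypothetical.

```python
from fractions import Fraction

# Hypothetical two-machine model taken from the FIG. 17 example.
CLOCK2_RATE = Fraction("1.0001")  # machine 2's clock vs absolute time
LATENCY = Fraction("0.001")       # absolute duration of each communication
INF = float("inf")


def read_clock(machine, absolute):
    """Time stamp recorded by `machine` at the given absolute time."""
    return absolute if machine == 1 else CLOCK2_RATE * absolute


def offset_bounds(communications):
    """Range (lower, upper) of constant offsets d that, when subtracted from
    every machine-2 time stamp, keep each receive at or after its transmit.
    Returns None when no single offset satisfies all communications."""
    lower, upper = -INF, INF
    for sender, receiver, sent_at in communications:
        send_stamp = read_clock(sender, sent_at)
        recv_stamp = read_clock(receiver, sent_at + LATENCY)
        if sender == 1:
            # Receive stamped by machine 2: recv_stamp - d >= send_stamp.
            upper = min(upper, recv_stamp - send_stamp)
        else:
            # Transmit stamped by machine 2: recv_stamp >= send_stamp - d.
            lower = max(lower, send_stamp - recv_stamp)
    return None if lower > upper else (lower, upper)


# (sender, receiver, absolute transmission time) for the four communications.
COMMUNICATIONS = [(1, 2, Fraction(1)), (2, 1, Fraction(2)),
                  (1, 2, Fraction(101)), (2, 1, Fraction(102))]
print("first half:", offset_bounds(COMMUNICATIONS[:2]))
print("latter half:", offset_bounds(COMMUNICATIONS[2:]))
print("whole period:", offset_bounds(COMMUNICATIONS))
```

Under this model each half of the collection period admits some offset on its own, but the two ranges do not overlap, so no single offset makes every transmit precede its receive over the whole period.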
The conventional technologies have another problem when the time required for communications is analyzed: the required communication time is obtained correctly in the first part (the first half) of the trace data collection period, but it becomes increasingly inaccurate as time elapses.
In other words, when event trace data is collected on a plurality of machines at the same time and analyzed with regard to the correspondence between communications performed between the machines, the clock rates differ slightly from machine to machine. Consequently, if inter-machine communications are extracted from the time stamps contained in the trace data, the relation between transmission and reception times that holds in the first half of the collection period no longer holds in the latter half, and inter-machine communications cannot be extracted exactly.
This problem is explained with reference to FIG. 17, a time chart of communications between two machines in which each slanting arrow represents one communication between the machines. The root of an arrow corresponds to a transmitting operation and its tip to a receiving operation. Assume that the clock embedded in machine 2 runs 0.0001 times faster than the clock in machine 1 and, to simplify the explanation, that both clocks read the same value at the starting point. The value shown by the clock in machine 1 is called the "absolute time"; that is, the value read from the clock in machine 2 is 1.0001 times the absolute time. As shown in FIG. 17, assume also that machine 1 transmits a signal to machine 2 at absolute times "1" and "101", that machine 2 transmits a signal to machine 1 at absolute times "2" and "102", and that each communication takes "0.001" in absolute time.
A signal transmitted by machine 1 at absolute time "1" is received by machine 2 at absolute time "1.001", at which point the clock in machine 2 reads "1.0011001", that is, 1.0001 times "1.001". Conversely, a signal transmitted by machine 2 at absolute time "2" (time "2.0002" on machine 2's clock) is received by machine 1 at absolute time "2.001". Similarly, a signal transmitted by machine 1 at absolute time "101" is received by machine 2 at absolute time "101.001" (machine 2's clock reads "101.0111001"), and a signal transmitted by machine 2 at absolute time "102" (machine 2's clock reads "102.0102") is received by machine 1 at absolute time "102.001".
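The clock readings above follow from a simple linear model, which the sketch below reproduces with exact rational arithmetic (`fractions.Fraction`) so that no floating-point rounding creeps into the decimal values. The model is an assumption drawn from the example: machine 1's clock defines absolute time, machine 2's clock reads 1.0001 times absolute time, and every communication takes 0.001. The helper names `clock1`, `clock2`, and `stamps` are hypothetical.

```python
from fractions import Fraction

# Hypothetical model of the FIG. 17 example: machine 2's clock reads
# 1.0001 times the absolute time (machine 1's clock), both starting at 0.
CLOCK2_RATE = Fraction("1.0001")
LATENCY = Fraction("0.001")  # absolute duration of every communication


def clock1(absolute: Fraction) -> Fraction:
    return absolute


def clock2(absolute: Fraction) -> Fraction:
    return CLOCK2_RATE * absolute


CLOCKS = {1: clock1, 2: clock2}

# (sender, receiver, absolute send time) for the four communications.
COMMUNICATIONS = [(1, 2, Fraction(1)), (2, 1, Fraction(2)),
                  (1, 2, Fraction(101)), (2, 1, Fraction(102))]


def stamps(sender: int, receiver: int, sent_at: Fraction):
    """Transmit time stamp (sender's clock) and receive time stamp
    (receiver's clock) for one communication."""
    send_stamp = CLOCKS[sender](sent_at)
    recv_stamp = CLOCKS[receiver](sent_at + LATENCY)
    return send_stamp, recv_stamp


for sender, receiver, sent_at in COMMUNICATIONS:
    send_stamp, recv_stamp = stamps(sender, receiver, sent_at)
    print(f"machine {sender} -> machine {receiver}: "
          f"sent {float(send_stamp)}, received {float(recv_stamp)}")
```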
Here, assume that event trace data including the transmitting and receiving operations is collected on machines 1 and 2. The configuration of machine 1, from which the event trace data is collected, is shown in FIG. 16. As shown in FIG. 16, the trace data storing section 213 stores trace data in the trace data storing medium 202 (for example, an area allocated in the memory of the machine). The trace data storing section 213 is invoked each time a probe embedded in the operating system 210 is hit, and stores the event data corresponding to that probe in the trace data storing medium 202. The probes installed in the machine include probes in a data receiving section 211 and a data transmitting section 212; when a data receiving or data transmitting operation is performed, the trace data storing section 213 is invoked and stores event data indicating that the operation has been performed in the trace data storing medium 202. The event data contains the type of the event and a time stamp indicating when the event occurred, and the time stamp is obtained from the clock 201 of each machine. Absolute time is not used as the time stamp because it is difficult to use absolute time as a time stamp, or because the overhead of acquiring absolute time would make the event trace impractical.
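The collection path described above can be sketched as a small data structure: a probe calls a storing function, which reads the local clock and appends an event record to a buffer. This is a minimal illustration under assumptions, not the structure of FIG. 16 itself: the names `TraceEvent` and `TraceStore` are hypothetical, a Python list stands in for the trace data storing medium 202, and `time.monotonic` stands in for the clock 201. The key point it captures is that each record carries only the machine's local clock value, never absolute time.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class TraceEvent:
    event_type: str   # e.g. "send" or "receive"
    timestamp: float  # read from this machine's local clock, not absolute time


@dataclass
class TraceStore:
    """Hypothetical stand-in for the trace data storing section and medium."""
    clock: Callable[[], float] = time.monotonic
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, event_type: str) -> None:
        # Called from a probe: stamp the event with the local clock, then store it.
        self.events.append(TraceEvent(event_type, self.clock()))


# Usage with a fake clock so the recorded time stamps are deterministic.
ticks = iter([10.0, 10.5])
store = TraceStore(clock=lambda: next(ticks))
store.record("send")
store.record("receive")
print(store.events)
```

Injecting the clock as a parameter keeps the sketch testable and makes explicit that the time stamps are only as accurate as the machine's own clock.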
FIG. 18 shows the results of extracting the times of the transmitting and receiving operations of each communication from the event trace data collected on the two machines and calculating the required communication time (= received time - transmitted time) from the time stamps stored in the trace data. The calculated required time for the communication performed from absolute time 102 to 102.001 is negative; that is, an apparently contradictory phenomenon occurs in which the receiving operation is performed before the transmitting operation. Moreover, for the other three communications, the required communication time apparently changes as time elapses. As a result, an analysis of communications between machines that uses such trace data as input cannot produce an exact result.
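The calculation behind FIG. 18 can be sketched as follows, again assuming the linear clock model of the FIG. 17 example (machine 2's clock reads 1.0001 times absolute time, true communication time 0.001). Each apparent communication time is the receive time stamp minus the transmit time stamp, with each stamp read from the clock of the machine that recorded it. The names `read_clock` and `apparent_latency` are hypothetical, and exact `Fraction` arithmetic is used to avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical model of the FIG. 17 example: machine 1's clock defines
# absolute time; machine 2's clock reads 1.0001 times the absolute time.
CLOCK2_RATE = Fraction("1.0001")
LATENCY = Fraction("0.001")  # true (absolute) duration of each communication


def read_clock(machine: int, absolute: Fraction) -> Fraction:
    """Time stamp that `machine` records at the given absolute time."""
    return absolute if machine == 1 else CLOCK2_RATE * absolute


def apparent_latency(sender: int, receiver: int, sent_at: Fraction) -> Fraction:
    """Received time stamp minus transmitted time stamp, each taken from
    the local clock of the machine that recorded the event."""
    return read_clock(receiver, sent_at + LATENCY) - read_clock(sender, sent_at)


# (sender, receiver, absolute transmission time) for the four communications.
COMMUNICATIONS = [(1, 2, Fraction(1)), (2, 1, Fraction(2)),
                  (1, 2, Fraction(101)), (2, 1, Fraction(102))]

latencies = [apparent_latency(s, r, t) for s, r, t in COMMUNICATIONS]
for (s, r, t), lat in zip(COMMUNICATIONS, latencies):
    print(f"{s} -> {r} at absolute {t}: apparent latency {float(lat)}")
```

Although every communication truly takes 0.001, the model yields 0.0011001, 0.0008, 0.0111001, and -0.0092: the apparent time drifts with elapsed time, and the last value is negative, which is the contradiction shown in FIG. 18.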