This invention relates to a log analysis method or apparatus which monitors a system status or investigates a problem by using a plurality of logs. More particularly, it relates to a method or an apparatus for correcting a time stamp recorded in a log by using a time correction log of the apparatus which has output the log, together with a rule regarding time stamps of logs, in order to correctly understand events recorded in the logs and the order relation or correlation of monitor values and the like, and to correctly determine the system status or the cause of the problem.
Providing services by using a cluster system which includes a plurality of computers, such as a web 3-tier system (application system), has gained in popularity. This has increased the need to provide services continuously and stably by correctly understanding the progress status of a process carried out in the cluster system or the operation situation of the system. However, monitoring only the individual log output from each computer does not enable a correct understanding of the situation of all the services or of the entire system, so the plurality of logs output from the computers have to be analyzed together.
However, there is a time lag among the computers, and the timing at which time stamps are recorded also shifts from log to log. Thus, if the time stamp of each log is used directly for analysis, the analysis may not be carried out correctly. For example, a processing order may appear reversed because of a discrepancy among the logs, or a statistical analysis of the operation situation of the system may show a trend different from the true operation situation because of a trend in the time stamp shift among the logs.
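The order reversal described above can be illustrated with a minimal sketch. The two logs, the 50 ms clock offset, the 10 ms transfer delay, and the helper name merge_by_timestamp are hypothetical values chosen for illustration; they do not come from any of the systems discussed here.

```python
from datetime import datetime, timedelta


def merge_by_timestamp(*logs):
    """Merge records from several logs, ordered only by recorded time stamp."""
    return sorted((rec for log in logs for rec in log), key=lambda rec: rec[0])


# Hypothetical scenario: the web server clock runs 50 ms fast.
t0 = datetime(2024, 1, 1, 12, 0, 0)
web_skew = timedelta(milliseconds=50)

# True order: the web server sends a request at t0 and the
# application server receives it 10 ms later.
web_log = [(t0 + web_skew, "web", "send request")]
app_log = [(t0 + timedelta(milliseconds=10), "app", "receive request")]

merged = merge_by_timestamp(web_log, app_log)
order = [event for _, _, event in merged]
print(order)  # ['receive request', 'send request']
```

Because the offset of the sending computer exceeds the true transfer delay, sorting by the raw time stamps places the receive event before the send event.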
A fundamental solution to the problem would be to eliminate the time lag among the computers and to unify the timing at which time stamps are recorded in the logs. However, complete elimination of the time lag among the computers is difficult. Even if the time lag were completely eliminated, unifying the log recording timing of all hardware and software components of the system is practically difficult, and the analysis may still not be carried out correctly if the time stamp recorded in the log is used directly.
To address this, the Network Time Protocol (NTP) has been widely used to correct the time lag among the computers. Depending on the environment, NTP may limit the time lag among the computers to between several milliseconds and 100 milliseconds. However, when, for example, the progress status of each process needs to be understood in detail, a method for correcting time with millisecond accuracy has to be provided. When NTP-based time correction is carried out very frequently in a large system, the load on the network increases. Depending on the environment, a time lag of several tens of milliseconds still remains even when NTP-based time correction is carried out very frequently, and the frequent correction may itself cause discontinuities in time.
Some conventional technologies provide partial solutions to these problems.
JP 2005-235054 A, entitled “Apparatus, Method, and Program for Correcting Time of Event Trace Data”, discloses a method for correcting a relative time lag of event trace data among a plurality of computers, and thus among a plurality of logs, based on an event send/receive relation. However, when NTP time correction is carried out during the period in which the correction amount of the time lag is measured, the amount of time lag becomes discontinuous and the time lag cannot be corrected correctly. In addition, because the correction amount of the time lag is measured for all sent and received events, the cost of time correction is high.
JP 2006-285875 A, entitled “Computer System, Log Collection Method, and Computer Program”, discloses a computer system which prepares a time difference table for storing a time difference between a virtual computer and a host computer, and which corrects the time stamp of a log obtained from each virtual computer when taking out the log. This conventional technology updates the time difference between the computers when the time of each computer is changed, and uses it to correct the time stamp. However, the correction accounts only for the time difference between the computers, and a time stamp shift caused by the output timing of the log is not taken into consideration. Thus, consistency may not be obtained among the logs. Moreover, because no correction is carried out except when the time is changed, the error in the time stamp correction amount gradually increases if the shift gradually grows.
JP 2006-236251 A, entitled “Time Stamp Apparatus, and Method and Program for Time Calibration”, discloses a method for improving the time reliability of a time stamp apparatus by using both a radio-controlled clock and a time server such as an NTP server. This conventional technology has been developed to prevent time alteration by an ill-intentioned user, and improves the reliability of the time itself of a time stamp. However, this method cannot completely unify the times of time stamps. Additionally, as in the case of JP 2006-285875 A, correction of a time stamp shift caused by log output timing has to be considered separately.
JP 11-27269 A, entitled “Clock Synchronization Method, Device and Recording Medium”, discloses a method in which, for the clocks of a plurality of devices, each measured time difference is compared with the statistical distribution of previous time differences, and time correction is carried out when the measured value differs from that distribution by a certain level or more. This conventional technology discloses a method for setting the timing at which the time of each device is corrected, but does not disclose time correction that takes consistency among logs into consideration.
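The general idea of comparing a measured time difference with the distribution of previous time differences can be sketched as follows. This is only an illustration of a generic threshold test, not the method of the cited publication: the function name needs_correction, the use of the mean and sample standard deviation, and the threshold factor k are all assumptions.

```python
from statistics import mean, stdev


def needs_correction(previous_diffs_ms, measured_diff_ms, k=3.0):
    """Return True when the measured time difference deviates from the
    mean of previous differences by more than k standard deviations.

    Assumption: a k-sigma rule stands in for "a difference of a certain
    level or more"; the actual criterion is not specified here.
    """
    if len(previous_diffs_ms) < 2:
        return False  # not enough history to estimate a distribution
    mu = mean(previous_diffs_ms)
    sigma = stdev(previous_diffs_ms)
    if sigma == 0:
        return measured_diff_ms != mu
    return abs(measured_diff_ms - mu) > k * sigma


# Hypothetical history of measured clock differences, in milliseconds.
history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]

print(needs_correction(history, 5.5))   # False: within the usual spread
print(needs_correction(history, 12.0))  # True: far outside the usual spread
```

A test of this kind decides only when a clock is corrected; it does not by itself make the time stamps of different logs consistent with one another.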