The present invention relates to a trace information collecting system, a trace information collecting method, and a trace information collecting program, more particularly to a trace information collecting system, a trace information collecting method, and a trace information collecting program which collect trace information indicating the course of processing of a program.
If a failure occurs in a computer system being operated in a company or other entity, the failure is typically removed through a flow such as the following.
(1) A failure has occurred in the computer system being operated.
(2) The program module which has caused the failure is identified through analysis by a person in charge of the system.
(3) Trace information indicating the course of processing of the program module is acquired during the next run.
(4) The acquired trace information is analyzed.
(5) A cause of the failure is investigated.
The trace information is successively produced and output during the operation of the computer system. Consequently, if the trace information is acquired during the operation of the computer system, processing to store the trace information in a storage device occurs periodically, which lowers the speed of the primary processing. For this reason, heretofore, no trace information has been acquired during normal operation, and in many cases the person in charge of the system changes the settings so as to acquire the trace information only after a failure has occurred. Additionally, changing the settings of a computer system being operated requires the approval of a user or an owner, and obtaining that approval takes considerable time in many cases.
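The trade-off described above can be illustrated with a minimal sketch. The names `Tracer` and `process_items` are hypothetical and do not appear in the specification, and an in-memory list stands in for the storage device. With tracing disabled, which corresponds to the normal-operation setting, nothing is stored; once the setting is changed, every processing step also produces a stored record.

```python
class Tracer:
    """Hypothetical tracer; `records` stands in for the storage device."""

    def __init__(self, enabled=False):
        # Tracing is off by default, as in normal operation.
        self.enabled = enabled
        self.records = []

    def trace(self, message):
        # Storing a record is the extra work that slows the primary processing.
        if self.enabled:
            self.records.append(message)


def process_items(items, tracer):
    """Primary processing: sum the items, emitting one trace call per step."""
    total = 0
    for item in items:
        tracer.trace(f"processing item {item}")
        total += item
    return total


tracer = Tracer()                    # normal operation: nothing is stored
process_items([1, 2, 3], tracer)
tracer.enabled = True                # setting changed after a failure
process_items([4, 5], tracer)
```

In this sketch the setting change is a single assignment; in an operated system it is the step that requires approval.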
On the other hand, in recent years, the computer system has come to play an indispensable role in key business operations, in addition to improving the efficiency of office work in the company. The computer system therefore cannot be stopped for a long time in order to investigate the cause of a failure. As a result, even if a failure has occurred, the computer system is restarted immediately, and in many cases it is not possible to secure the time needed to obtain approval to acquire the trace information. Thus, in many cases, it has been difficult to remove the failure from the computer system being operated.
As a prior art technology, Japanese Published Patent Application 5-257758 discloses a system which, when a transaction for a database fails, automatically changes the settings so as to acquire the course of processing of the transaction and then retries the transaction. According to this system, the cause of a failure which has occurred during the processing of the transaction can be easily investigated.
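The general idea of enabling tracing only after a transaction fails and then retrying it can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the cited application: the function names, the callback-style `trace` argument, and the use of an exception to signal failure are all hypothetical.

```python
def run_with_trace_on_failure(transaction, trace_log):
    """Run `transaction` without tracing; if it fails, retry with tracing on.

    `transaction` is a callable taking a `trace(message)` function.
    `trace_log` is a list standing in for the trace storage (an assumption).
    """
    def no_trace(message):
        # Tracing disabled: discard the message.
        return None

    try:
        return transaction(no_trace)
    except Exception:
        # Failure detected: switch tracing on and retry once.
        trace_log.append("retry with tracing enabled")
        return transaction(trace_log.append)


def make_flaky_transaction():
    """Build a hypothetical transaction that fails on its first attempt only."""
    calls = {"count": 0}

    def transaction(trace):
        calls["count"] += 1
        trace(f"attempt {calls['count']}: begin")
        if calls["count"] == 1:
            raise RuntimeError("simulated failure")
        trace(f"attempt {calls['count']}: commit")
        return "committed"

    return transaction


trace_log = []
result = run_with_trace_on_failure(make_flaky_transaction(), trace_log)
```

The sketch works only because the failure is unambiguous: the transaction raises an exception. It has no counterpart for a system in which the occurrence of a failure cannot be determined so simply.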
The technology described in the above-described document might be considered applicable to a failure of a computer system, because it allows the trace information to be acquired automatically. However, in a computer system, unlike in a transaction, it is often difficult to determine whether or not a failure has occurred. For example, a failure in a transaction can be easily detected from an error code or the like recorded in a predetermined storage area. In a computer system, on the other hand, a failure often arises from a combination of factors across various modules, and an abnormal operation may occur even if each of the modules is normal. Furthermore, processing may be interrupted for some operational reason even though no actual failure has occurred. Thus, it is difficult to determine whether a failure has actually occurred in the computer system and, hence, to determine the appropriate timing for producing trace information.