1. Field of the Invention
The present invention relates to a technique for saving error analysis information upon the occurrence of an error, and more particularly, to a method of saving software error analysis information suited to a distributed data processing system comprising a plurality of servers, and to a saving system for saving error analysis information.
2. Description of the Related Art
A group of software products (programs) running on a distributed data processing system comprising a plurality of servers realizes large-scale system operation by having the servers perform data processing in cooperation with each other, in addition to local data processing within each server. In such a distributed data processing system, if a software error occurs in one server during data processing, the error analysis information of that server alone may be insufficient to find the cause of the error. Further, almost all of the error analysis information needed to investigate the cause of an error, such as the trace information of a software product, is managed by a wrap-around function, in which the oldest entries are overwritten once the storage area is full. Consequently, if saving of the error analysis information into a saving file is delayed, important information may be lost.
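Why delayed saving loses information under a wrap-around scheme can be illustrated with a minimal Python sketch. It assumes a fixed-capacity trace buffer modeled with the standard `collections.deque(maxlen=...)`; the class name `TraceBuffer`, the capacity of four entries, and the entry strings are hypothetical and are not taken from any particular software product.

```python
from collections import deque


class TraceBuffer:
    """Hypothetical wrap-around trace buffer: once full, each new entry
    evicts the oldest one."""

    def __init__(self, capacity: int) -> None:
        self._entries = deque(maxlen=capacity)

    def record(self, entry: str) -> None:
        # A deque with maxlen silently drops its oldest item when full.
        self._entries.append(entry)

    def save(self) -> list:
        # Snapshot of what would be written to a saving file at this moment.
        return list(self._entries)


buf = TraceBuffer(capacity=4)
buf.record("t1: request received")
buf.record("t2: ERROR in module A")  # the entry an analyst needs
buf.record("t3: retry")

# Saving promptly preserves the error entry.
early = buf.save()

# If saving is delayed, later trace activity wraps around the buffer.
for i in range(4, 8):
    buf.record(f"t{i}: routine processing")
late = buf.save()

print("ERROR" in " ".join(early))  # True
print("ERROR" in " ".join(late))   # False
```

The buffer never reports an overflow; the error entry simply disappears once enough routine entries follow it, which is why the timing of the save matters.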
Generally, when a system error, such as one caused by a software bug, occurs, an error alarm message reporting the error is displayed on a monitoring terminal apparatus. Conventionally, when such an error alarm message is displayed on the monitoring terminal, the operator of the monitoring terminal notifies a system administrator (or the manufacturer or the like) of the error. The system administrator then checks the content of the error and the server where the error occurred (hereinafter referred to as the “troubled server”), and starts collecting the error analysis information needed to investigate the cause of the error.
If an error occurs while a plurality of servers are performing data processing in cooperation with each other, error analysis information must be collected not only from the troubled server but also from the other servers cooperating with it. In this case, the servers from which error analysis information is to be collected must be identified, and those servers must be promptly instructed to save their error analysis information or to transfer it to the monitoring terminal apparatus.
As a conventional technique for collecting log data from a plurality of computers, Japanese Published Unexamined Patent Application No. Hei 5-250229 addresses the increase in load caused by automatically requesting log-data transmission from all the computers. It proposes analyzing the log data transmitted from the respective computers, requesting the next log-data transmission only from a computer whose log data shows an error, and not requesting it from a computer whose log data shows no error, thereby reducing the log-data transmission load on normally operating computers. However, in this conventional art, the other computers cooperating with the troubled computer are excused from the next log-data transmission as long as they perform data processing normally. The conventional art therefore cannot be applied to a distributed data processing system that also requires error analysis information from normally operating computers.
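The selection rule described above, and the reason it misses cooperating servers, can be sketched in Python. This is a simplified illustration under assumed data structures, not the method as claimed in Hei 5-250229: the functions `log_has_error` and `select_next_requests`, the server names, and the rule that a log "shows an error" if any line contains the string `ERROR` are all hypothetical.

```python
def log_has_error(log_lines):
    # Hypothetical error test: any line containing "ERROR".
    return any("ERROR" in line for line in log_lines)


def select_next_requests(logs):
    """Return the computers asked to transmit log data next time:
    only those whose latest log showed an error (simplified rule)."""
    return {name for name, lines in logs.items() if log_has_error(lines)}


logs = {
    "server_a": ["job 42 started", "ERROR: job 42 aborted"],
    "server_b": ["job 42 step 1 ok"],  # cooperated with server_a on job 42
    "server_c": ["idle"],
}

requested = select_next_requests(logs)
print(sorted(requested))  # ['server_a']

# server_b took part in the failed job, but its log shows no error,
# so under this rule its information is not requested.
print("server_b" in requested)  # False
```

In a distributed data processing system, part of the information needed to trace the failed job may reside on a server such as `server_b`, which is exactly what this rule leaves out.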
In the conventional method of collecting error analysis information, when a system error occurs, a specific server is instructed to save its error analysis information based on the judgment of the operator of the monitoring terminal apparatus or of a system administrator. Because the saving depends on this human judgment and on instructions issued from the monitoring terminal, it takes considerable time to save and collect the error analysis information, and important information may be lost in the meantime.