1. Field of the Invention
The present invention relates to a technique of saving error analysis information upon occurrence of an error, and more particularly, to a method of saving software error analysis information appropriate for a distributed data processing system comprising a plurality of servers, and a saving system for saving error analysis information.
2. Description of the Related Art
A group of software products (programs), which run on a distributed data processing system comprising a plurality of servers, realize large-scale system operation by execution of data processing by the plurality of servers in cooperation with each other, as well as local data processing within each server. In this distributed data processing system, if a software error occurs in one server when data processing is performed, the error analysis information of that server alone is occasionally insufficient to find the cause of the error. Further, since almost all the error analysis information necessary for investigation into the cause of an error, such as trace information of a software product, is managed by a wrap-around function in which new entries overwrite the oldest ones, important information may be lost if saving of the error analysis information into a saving file is delayed.
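The effect of delayed saving on wrap-around trace data can be illustrated with a minimal Python sketch. The class name `WrapAroundTrace`, the buffer capacity and the event names are hypothetical and chosen only for illustration; the specification does not describe how the wrap-around function is implemented.

```python
from collections import deque


class WrapAroundTrace:
    """Hypothetical fixed-capacity trace area: the newest entry overwrites the oldest."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest item when full
        self._entries = deque(maxlen=capacity)

    def record(self, entry):
        self._entries.append(entry)

    def save(self):
        """Return a snapshot, standing in for copying the trace to a saving file."""
        return list(self._entries)


trace = WrapAroundTrace(capacity=4)
for i in range(1, 4):
    trace.record(f"event-{i}")
trace.record("ERROR")

# Saving immediately after the error keeps the entries leading up to it.
saved_promptly = trace.save()

# If processing continues before the save, later entries overwrite the error.
for i in range(4, 8):
    trace.record(f"event-{i}")
saved_late = trace.save()

print("ERROR" in saved_promptly)  # True
print("ERROR" in saved_late)      # False
```

In this sketch the late snapshot no longer contains the entry recorded at the time of the error, which is the loss that prompt saving is intended to avoid.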
Generally, when a system error including a software bug occurs, an error alarm message notifying the occurrence of error is displayed on a monitoring terminal apparatus. Conventionally, when an error alarm message is displayed on the display of the monitoring terminal, an operator of the monitoring terminal notifies a system administrator (otherwise, a manufacturer or the like) of the occurrence of error. Then, the system administrator checks the content of the error and the server where the error has occurred (hereinafter, referred to as a "troubled server"), and starts to collect error analysis information necessary for investigation into the cause of the error.
If an error occurs when data processing is performed by a plurality of servers in cooperation with each other, it is necessary to collect error analysis information not only from the troubled server but also from servers other than the troubled server. In this case, it is necessary to specify the servers from which error analysis information is to be collected, and to instruct these servers to quickly save the error analysis information or transfer the information to the monitoring terminal apparatus.
As a conventional technique for collecting log data from a plurality of computers, Japanese Published Unexamined Patent Application No. Hei 5-250229 addresses the problem of an increase in load caused by automatically requesting log-data transmission from all the computers. It proposes to analyze the log data transmitted from the respective computers, to require a computer in whose log data an error has been found to transmit log data the next time, and not to require transmission from a computer in whose log data no error has been found, so as to reduce the log-data transmission load on normally-operating computers. However, in the above conventional art, the other computers cooperating with the troubled computer are excused from the next log-data transmission as long as they perform data processing normally. The conventional art therefore cannot be applied to a distributed data processing system that requires error analysis information also from normal computers.
In the conventional method of collecting error analysis information, upon occurrence of a system error, a specific server is instructed to save its error analysis information by judgment of an operator of a monitoring terminal apparatus or a system administrator. As the saving of analysis information depends on the judgment of, and instructive operation from the monitoring terminal by, an operator or a system administrator, it takes much time to save and collect the error analysis information, and important information might be lost in the meantime.
Accordingly, an object of the present invention is to provide an error analysis information saving method which automatically saves error analysis information without the judgment and terminal operation by an operator or a system administrator.
Further, another object of the present invention is to provide an error analysis information saving method which automatically saves error analysis information of a server when an error is detected in the server, and further saves error analysis information of other servers related to the error.
Further, another object of the present invention is to provide a distributed data processing system and a computer network which automatically and quickly save error analysis information stored in a plurality of servers when an error is detected in one of the servers.
Further, another object of the present invention is to provide a monitoring terminal apparatus which, when an error is detected in one server, automatically specifies error analysis information of other servers related to the error, and instructs a plurality of servers to save error analysis information.
The foregoing objects are attained by providing a computer network having a monitoring apparatus, comprising: a plurality of servers connected to the monitoring apparatus via a communication network; wherein the plurality of servers respectively have a plurality of software programs, and have a data file for storing error analysis information to be utilized for investigation into a cause of an error upon occurrence of the error, for each software program, and means for transmitting an error notifying message, including an identifier of a software program executed when the error has occurred, to the monitoring apparatus; and wherein the monitoring apparatus has means for instructing at least one server, specified based on the software identifier included in the error notifying message, to save the error analysis information, in response to one error notifying message received from any one of the servers.
Further, the foregoing objects are attained by providing a distributed data processing system comprising: a plurality of servers respectively having a function to execute data processing in cooperation with each other via a communication network; and a monitoring terminal apparatus connected to the communication network, wherein the monitoring terminal apparatus has a management table containing a plurality of data records, each having an index code including a software identifier and at least one set of resource definition data defining resources related to saving operation of error analysis information, and wherein when the monitoring terminal apparatus receives an error notifying message including the software identifier from any one of the servers, the monitoring terminal apparatus instructs a server, defined by a data record corresponding to the software identifier as one of the resources, to save the error analysis information.
More specifically, each data record in the management table includes at least one set of resource definition data defining a server to perform saving operation on error analysis information, a data file including the error analysis information and an output file where the error analysis information is saved, and the monitoring terminal apparatus designates the data file and the output file defined by the resource definition data, and instructs the server to save the error analysis information.
According to a preferred embodiment of the present invention, at least one of the data records stored in the management table includes plural sets of resource definition data corresponding to one index code, and the monitoring terminal apparatus instructs a plurality of servers, defined by the plural sets of resource definition data, to save the error analysis information, in response to reception of one error notifying message.
Further, according to the preferred embodiment of the present invention, the index code of each data record stored in the management table includes an additional code accompanying the software identifier, indicative of error type, and the error notifying message transmitted from the server has a message identifier including the software identifier and the additional code indicative of the type of an error detected in the server. When the monitoring terminal apparatus receives an error notifying message from any one of the servers, the monitoring terminal apparatus searches the management table based on the message identifier of the received message, and instructs saving of the error analysis information if it is determined that a specific type of error has occurred in a specific software program designated in advance in the management table.
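The management table described above can be pictured with a minimal Python sketch. The field names, software identifiers, error-type codes, server names and file paths below are hypothetical and serve only to illustrate one possible layout; the specification does not fix a concrete format for the index code or the resource definition data.

```python
from typing import NamedTuple


class IndexCode(NamedTuple):
    """Index code: software identifier plus an additional code for the error type."""
    software_id: str
    error_type: str


class ResourceDef(NamedTuple):
    """One set of resource definition data for a saving operation."""
    server: str       # server that performs the saving operation
    data_file: str    # data file holding the error analysis information
    output_file: str  # output file where the information is saved


# Hypothetical table contents. One index code may map to plural sets of
# resource definition data, so a single message can involve several servers.
MANAGEMENT_TABLE = {
    IndexCode("DBMS", "E01"): [
        ResourceDef("server-a", "/trace/dbms.trc", "/save/dbms.sav"),
        ResourceDef("server-b", "/trace/comm.trc", "/save/comm.sav"),
    ],
    IndexCode("COMM", "E02"): [
        ResourceDef("server-b", "/trace/comm.trc", "/save/comm.sav"),
    ],
}


def resources_for(message_id: IndexCode) -> list:
    """Return the resource sets registered for this message identifier.

    An empty list means the combination of software program and error type
    was not designated in advance, so no saving is instructed.
    """
    return MANAGEMENT_TABLE.get(message_id, [])


for res in resources_for(IndexCode("DBMS", "E01")):
    print(f"instruct {res.server}: save {res.data_file} to {res.output_file}")
```

Keying the table on the pair of software identifier and error type means that a message carrying an unregistered error type for a known program returns no resource sets, which corresponds to saving only for error types designated in advance.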
In accordance with the present invention, provided is a monitoring terminal apparatus connected to a plurality of servers via a communication network, comprising: a management table containing plural sets of resource definition data defining resources related to saving operation of error analysis information corresponding to a plurality of index codes each including a software identifier of a software program executed on each of the servers; means for, when an error notifying message including the software identifier is received from any one of the servers, searching the management table for at least one set of resource definition data corresponding to the software identifier included in the error notifying message; and means for transmitting a control message instructing to save the error analysis information to a server defined by the searched resource definition data as one of the resources.
Further, in accordance with the present invention, provided is an error analysis information saving method in a distributed data processing system comprising a plurality of servers which perform data processing in cooperation with each other via a communication network and a monitoring terminal apparatus connected to the communication network, wherein the monitoring terminal apparatus has a management table for storing a plurality of data records each comprising a software identifier of a software program and at least one set of resource definition data defining resources related to saving operation of error analysis information, the method comprising the steps of: transmitting an error notifying message including the software identifier of a software program being executed, from one of the plurality of servers where a software error has been detected during data processing, to the monitoring terminal apparatus; upon reception of the error notifying message, referring to a data record corresponding to the software identifier of the error notifying message stored in the management table, by the monitoring terminal apparatus; instructing a server, defined by the referred data record as one of the resources, to save the error analysis information, from the monitoring terminal apparatus; and performing saving operation on the error analysis information by the server instructed to save the information.
More specifically, the saving instruction to save the error analysis information is made by designating a source file and an output file defined by the data record as one of the resources, and instructing the server to save the error analysis information, in response to reception of the error notifying message, by the monitoring terminal apparatus. In the preferred embodiment of the present invention, the monitoring terminal apparatus refers to a plurality of data records corresponding to the software identifier of the error notifying message stored in the management table, in response to reception of the error notifying message, and instructs a plurality of servers, defined by the referred plural data records, to save the error analysis information.
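The steps of the method above can be sketched as a small in-process Python simulation. This is an illustration under stated assumptions rather than the disclosed implementation: the classes `Server` and `Monitor`, the method names, the software identifier `"APP"` and all file names are hypothetical, and network transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server holding per-program data files of trace lines."""
    name: str
    data_files: dict = field(default_factory=dict)   # source file -> trace lines
    saved_files: dict = field(default_factory=dict)  # output file -> saved copy

    def save(self, source_file, output_file):
        # Saving operation: copy the designated source file to the output file.
        self.saved_files[output_file] = list(self.data_files.get(source_file, []))


@dataclass
class Monitor:
    """Hypothetical monitoring terminal apparatus with its management table."""
    servers: dict  # server name -> Server
    table: dict    # software identifier -> list of (server, source, output)

    def on_error_message(self, software_id):
        """Refer to the matching data record and instruct each defined server."""
        instructed = []
        for server_name, source_file, output_file in self.table.get(software_id, []):
            self.servers[server_name].save(source_file, output_file)
            instructed.append(server_name)
        return instructed


server_a = Server("server-a", {"app.trc": ["start", "call server-b", "fail"]})
server_b = Server("server-b", {"db.trc": ["query received", "timeout"]})

monitor = Monitor(
    servers={"server-a": server_a, "server-b": server_b},
    table={
        "APP": [
            ("server-a", "app.trc", "app.sav"),
            ("server-b", "db.trc", "db.sav"),
        ]
    },
)

# server-a detects an error in program "APP" and notifies the monitor.
instructed = monitor.on_error_message("APP")
print(instructed)            # ['server-a', 'server-b']
print(server_b.saved_files)  # {'db.sav': ['query received', 'timeout']}
```

Here a single error notifying message from the troubled server causes both that server and a cooperating, normally operating server to save their information without operator intervention.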
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.