Various methods have been proposed for handling abnormal states that occur in information processing apparatuses.
For example, Japanese Unexamined Patent Application Publication No. 7-36721 discloses a control method in which, if a fault occurs in the processing apparatus in the current operation system, data of a process is taken over from that processing apparatus by a processing apparatus in a new operation system, so that the operation continues in the processing apparatus in the new operation system. In the control method, data used for investigating the cause of the fault is stored in a common storage device for fault analysis, and the data is referred to at the time of switching to the new operation system to find the cause of the fault.
Japanese Unexamined Patent Application Publication No. 7-262034 discloses a data handover system in which non-volatile storage connected to a first data processing apparatus and a second data processing apparatus is provided, and handover data is stored in the non-volatile storage to perform the data handover at the time of switching between an active system and a standby system.
In an information processing apparatus configured as an entire computer system, a server that controls the entire system, for example, a service processor, is provided. The service processor unit itself, or the firmware operating in the service processor unit, is hereinafter referred to as an eXtended System Control Facility Unit (XSCFU). As part of controlling the entire system, the XSCFU has a function for monitoring faults that occur in the hardware, including the XSCFU itself.
An asynchronous communication method is used for error notification in the fault monitoring. In asynchronous communication, an information source does not wait for completion of processing at the information destination. The asynchronous communication method therefore has the advantage that multiple processing requests may be submitted concurrently, or other processing may be performed concurrently with the processing requests.
However, when the asynchronous communication method is used for error notification, the following problems may occur. If a requested process is interrupted and not completed because of a fault in the XSCFU, for example, a fault in an information communication path or an internal fault in an information destination, the information source cannot detect that the processing requested of the information destination has failed. In this case, the requested processing may remain uncompleted, and the information source cannot re-request the processing from the information destination.
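The problem described above may be illustrated by the following minimal sketch, which is given for explanation only and does not represent the actual XSCFU firmware. The function `run_demo`, the request names, and the simulated internal fault are all hypothetical. The information source places requests in a queue and returns immediately, while the information destination stops partway through.

```python
import queue
import threading


def run_demo():
    """Fire-and-forget notification: the source never learns the outcome."""
    mailbox = queue.Queue()   # hypothetical communication path
    completed = []            # requests the destination actually finished

    def destination():
        while True:
            request = mailbox.get()
            if request is None:          # normal shutdown marker
                break
            if request == "notify-error-2":
                # Simulated internal fault: processing is interrupted here,
                # and nothing is reported back to the source.
                break
            completed.append(request)

    worker = threading.Thread(target=destination)
    worker.start()

    submitted = []
    for name in ("notify-error-1", "notify-error-2", "notify-error-3"):
        mailbox.put(name)     # returns at once; no wait for completion
        submitted.append(name)
    mailbox.put(None)
    worker.join()
    return submitted, completed


if __name__ == "__main__":
    submitted, completed = run_demo()
    print("submitted:", submitted)
    print("completed:", completed)
```

In this sketch the information source regards all three requests as submitted, yet only the first one is completed. Because no completion result is returned, the source has no basis for re-requesting the second and third requests.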
The XSCFU manages common data including non-volatile data and volatile data in order to control the entire system. Of the common data managed by the XSCFU, the non-volatile data includes a variety of setup information and degeneracy information about portions of the information processing apparatus during the operation of the system, and is held even if the information processing apparatus is turned off. The volatile data includes the latest state of the system and is held only while the information processing apparatus is turned on. If an abnormal state in the XSCFU is detected and the process requested by the information source is interrupted, the update of the common data that should be performed by the information source is undesirably not performed.
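The distinction between the two kinds of common data, and the effect of an interrupted update, may be sketched as follows. This is an illustrative model under stated assumptions only: the class `CommonData`, the function `apply_update`, the `fault` flag, and the keys used are hypothetical and are not taken from the XSCFU.

```python
from dataclasses import dataclass, field


@dataclass
class CommonData:
    # Held even if the apparatus is turned off (setup and degeneracy info).
    nonvolatile: dict = field(default_factory=dict)
    # Held only while the apparatus is turned on (latest system state).
    volatile: dict = field(default_factory=dict)

    def power_off(self):
        """Turning the apparatus off discards only the volatile data."""
        self.volatile.clear()


def apply_update(data, key, value, fault=False):
    """Update the volatile state; a simulated fault interrupts the update."""
    if fault:
        return False          # update is never applied
    data.volatile[key] = value
    return True


def demo():
    data = CommonData(
        nonvolatile={"degraded_units": ["unit-1"]},
        volatile={"system_state": "running"},
    )
    applied = apply_update(data, "system_state", "error", fault=True)
    stale_state = data.volatile["system_state"]   # still the old value
    data.power_off()
    return applied, stale_state, data
```

Here the interrupted update leaves the volatile data showing the old system state, which corresponds to the situation in which the update of the common data is not performed.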