The present invention relates to an apparatus, a recovery method and a program thereof. More particularly, it relates to an apparatus, a recovery method and a program thereof capable of automatically recovering a PCI (peripheral component interconnect) bus when a failure of the PCI bus occurs in a computer system.
In a related technology, there is known a computer system provided with a recovery means for recovering from a failure, such as a PCI bus failure, in the computer system. The recovery of the PCI bus is executed through cooperation among an OS (operating system), which drives the process, an ACPI (advanced configuration and power interface) and a BIOS (basic input/output system).
The failure recovery method in the related technology is described below. The failure recovery method is an OS-driven method.
(1) As a control operation of the OS, the OS first detects that a failure of the PCI bus has occurred.
(2) Next, as a recovery instruction from the OS to the ACPI, the OS that has detected the failure stops the control driver of the PCI card and then issues a PCI recovery request to the ACPI.
(3) Then, as a recovery instruction from the ACPI to the BIOS, the ACPI issues, to the BIOS, a reset request for the PCI bus by using a PMI (platform management interrupt or performance monitoring interrupt).
(4) Subsequently, as a PCI bus reset process in the BIOS, the BIOS resets the PCI bus and executes a recovery of the PCI bus.
(5) Thereafter, as a recovery result reporting process in the ACPI, the BIOS reports the recovery result to the OS via the ACPI.
(6) Lastly, as a driver re-installation process in the OS, upon receipt of the report that the PCI bus has been restored successfully, the OS re-installs the control driver of the PCI bus and use of the device is resumed.
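Steps (1) to (6) above can be pictured with the following minimal sketch. It is an illustration only and not an implementation from the related technology: the class names Os, Acpi and Bios, their method names, and the log strings are all hypothetical, a direct method call stands in for the PMI, and the BIOS reset is assumed to always succeed.

```python
from dataclasses import dataclass, field
from typing import List


class Bios:
    """Hypothetical stand-in for the platform firmware."""

    def reset_pci_bus(self, bus_id: int) -> bool:
        # Step (4): reset the bus. This toy model always reports success.
        return True


class Acpi:
    """Hypothetical stand-in for the ACPI layer between the OS and the BIOS."""

    def __init__(self, bios: Bios) -> None:
        self.bios = bios

    def request_pci_recovery(self, bus_id: int) -> bool:
        # Step (3): pass the reset request on to the BIOS. The text says this
        # is done with a PMI; a direct method call stands in for it here.
        ok = self.bios.reset_pci_bus(bus_id)
        # Step (5): the result travels back to the OS through this layer.
        return ok


@dataclass
class Os:
    """Hypothetical stand-in for the OS side of the recovery sequence."""

    acpi: Acpi
    driver_loaded: bool = True
    log: List[str] = field(default_factory=list)

    def handle_bus_failure(self, bus_id: int) -> bool:
        # Step (1): the OS has detected the PCI bus failure.
        self.log.append("detect")
        # Step (2): stop the control driver, then ask the ACPI for recovery.
        self.driver_loaded = False
        self.log.append("stop_driver")
        self.log.append("request_recovery")
        recovered = self.acpi.request_pci_recovery(bus_id)
        self.log.append("result_ok" if recovered else "result_failed")
        # Step (6): re-install the driver only after a successful report.
        if recovered:
            self.driver_loaded = True
            self.log.append("reinstall_driver")
        return recovered
```

In this sketch the OS is the only party that starts the sequence, which is what makes the method OS-driven: the ACPI and the BIOS act only when called.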
Incidentally, it is well-known that the above-mentioned PMI is an interrupt function caused by the processor, and is an interrupt function for calling platform firmware.
In this field, for example, JP-A 2009-116642 (Patent Document 1) discloses a PCI-bus failure recovery method. In Patent Document 1, a recovery processing method including the following steps is described. PCI bus detection means detects a blocked PCI bus, and requests the OS to detach this blocked PCI bus and a PCI card connected to a PCI bus downstream of the blocked PCI bus. The OS detaches the requested PCI card from control and outputs, to the BIOS, an instruction for turning off the power of the PCI card. In response to this, PCI card detachment accepting means activates PCI bus diagnosis means. The PCI bus diagnosis means determines whether or not the blocked PCI bus can operate normally, and if it can operate normally, PCI bus blockage clearance means clears the blockage of the PCI bus.
Meanwhile, JP-A Sho 60-191353 (Patent Document 2) discloses the following technique as a bus control system for controlling a common device. The system includes multiple processors, bus control circuits corresponding to the respective processors, and a common bus connecting the processors to the common device shared by them. The major points of the technique are: providing each of the bus control circuits with a means for detecting that any of the processors halts control of the common device, and a means for generating a common bus occupancy cancellation signal when the halt continues for a certain time period; and providing a means for cutting off the connection between the processor and the common device in response to the common bus occupancy cancellation signal.
Further, JP-B 2938495 (Patent Document 3) discloses a technique for a network monitoring device in a network including multiple transmission devices, transmission paths connecting the transmission devices, and a monitor control device monitoring these paths. The technique is characterized in that two monitoring lines are provided between the monitor control device and a predetermined transmission device and the monitoring line to be used in polling is separated from the monitoring line to be used when a failure occurs.
However, in the related technology such as Patent Document 1, there is a problem that when the OS issues a recovery instruction at a time when the BIOS is temporarily unable to operate, such as during interrupt masking, the OS is forced to give up the recovery of the PCI bus.
Therefore, in such a case, the related technology cannot automatically recover from the failure of the PCI bus. As a result, when a failure occurs, maintenance personnel need to perform operations such as detachment and re-installation of the PCI device, which increases the failure recovery time.
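The problem described above can be illustrated with the following minimal sketch. It is a toy model under stated assumptions, not the behavior of any actual OS or BIOS: the names MaskableBios, on_pmi and os_driven_recovery are hypothetical, and the text does not say how the OS learns that its request was not serviced, so a False return value is assumed here.

```python
class MaskableBios:
    """Hypothetical BIOS model that cannot act while interrupts are masked."""

    def __init__(self, interrupts_masked: bool = False) -> None:
        self.interrupts_masked = interrupts_masked

    def on_pmi(self, bus_id: int) -> bool:
        # While interrupts are masked, the reset request is not serviced.
        # (Assumption: the failure is reported simply as False.)
        if self.interrupts_masked:
            return False
        # Otherwise the bus is reset; success is assumed in this toy model.
        return True


def os_driven_recovery(bios: MaskableBios, bus_id: int) -> str:
    """Issue one recovery request, as in the OS-driven method."""
    # The OS asks once. If the BIOS cannot respond at that moment,
    # the OS gives up and the device is left for manual maintenance.
    if bios.on_pmi(bus_id):
        return "recovered"
    return "manual maintenance required"
```

With `MaskableBios(interrupts_masked=True)` the function returns "manual maintenance required" even though the bus itself might be resettable a moment later, which is the situation the passage identifies as the problem.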
Note that the techniques disclosed in Patent Document 2 and Patent Document 3 are merely related techniques.