The present invention relates to a computer apparatus and a diagnosing method, and more particularly to a computer apparatus in which a hardware component may be replaced, repaired, or added while the power is on, and to a diagnosing method that uses the computer apparatus.
As described in “PCI Hot-Plug Specification Revision 1.0”, Oct. 6, 1997 (Copyright(c) 1997, PCI Special Interest Group), a PCI card connected to the PCI bus of a present-day computer apparatus may be removed, repaired or replaced, and then connected again so that the PCI card continues operating, all while the OS is running and without turning off the computer power (hereinafter called “hot swapping”).
In recent years, computers, especially servers, have been required to operate non-stop (24 hours a day, 365 days a year). The hot swapping described above meets this requirement by making it possible to repair or replace a hardware component, such as a PCI card, with the OS running and then to put the component back into operation under control of the OS.
However, when a hardware component such as a PCI card is repaired or replaced with the OS running and is then put back into operation under control of the OS, there is no way to check beforehand whether the repaired or replaced PCI card operates properly or whether the card is mounted correctly. Whether the repair or replacement has succeeded is not known until control is passed back to the OS and the component starts operating under control of the OS. For a server that is required to operate non-stop, it is dangerous to put a repaired or replaced PCI card under OS control without first confirming that it operates properly.
Similarly, a PCI card or a peripheral device may be added during non-stop operation (hereinafter called “hot plugging”). However, it is dangerous to put the added PCI card or peripheral device under OS control without first checking its operation.
To avoid such a danger, a PCI card is conventionally repaired, replaced, or added with the power off. After the hardware component is repaired, replaced, or added, an OS environment containing a test program for checking the hardware is started. Once the test program has confirmed that the hardware operates normally, the power is turned off or the system is re-booted to start the main OS and bring the system back into normal operation. The problem with this method is that non-stop operation cannot be maintained, because the main OS must be stopped both while the hardware component is repaired, replaced, or added and while the test program checks the hardware operation.
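The conventional power-off procedure described above can be sketched as an ordered list of steps, each marked with whether the main OS is running. This is only an illustrative model written in Python: the names `Step`, `CONVENTIONAL_STEPS`, and `downtime_steps` are hypothetical and do not come from the PCI Hot-Plug Specification or from any real maintenance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of the maintenance procedure (hypothetical model)."""
    name: str
    main_os_running: bool


# Conventional power-off procedure, in the order given in the text.
CONVENTIONAL_STEPS = [
    Step("main OS in normal operation", True),
    Step("turn the power off", False),
    Step("repair, replace, or add the PCI card", False),
    Step("start the OS environment containing the test program", False),
    Step("run the test program to confirm normal hardware operation", False),
    Step("turn the power off or re-boot", False),
    Step("start the main OS and resume normal operation", True),
]


def downtime_steps(steps):
    """Return the names of the steps during which the main OS is stopped."""
    return [s.name for s in steps if not s.main_os_running]


if __name__ == "__main__":
    for name in downtime_steps(CONVENTIONAL_STEPS):
        print("main OS stopped:", name)
```

In this model, every step between the first power-off and the final restart of the main OS counts as downtime, which is why the conventional method cannot satisfy the non-stop requirement.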