1. Field of the Invention
The present invention relates to a computer system running an operating system (OS) that supports hot-plugging of PCI devices, and more particularly to a computer system that uses a control program to logically insert a PCI device into, and eject it from, an object by means of hot-plugging.
2. Description of Related Art
Coping efficiently with errors in computer systems is now an important issue for realizing nonstop, long-term operation such as 24-hour, 365-day services. As described in “TRANSACTION PROCESSING: CONCEPTS AND TECHNIQUES” by Jim Gray, the rate of software errors has been increasing year by year. Countermeasures that can cope with such software errors are therefore strongly needed.
Memory leaks and application software bugs are regarded as causes of the above-described software errors. They often result from a memory area that remains occupied after it has been used for processing by an operating system (OS) and/or application software programs. Several methods have been proposed to eliminate such software errors. One method periodically restarts the services and the OS by means of a computer system management software program. However, this method must stop the services during the restart, which has been a problem.
To solve this problem, another method has been proposed that forms a cluster of a plurality of physical servers by means of a clustering program and performs fail-over processing for services such as Web servers and DBMSs (Database Management Systems).
The method forms a cluster consisting of an active server that provides services and a standby server that provides no service. The servers exchange messages referred to as heart beats, and time stamps are written periodically to a shared disk, so that both the active and standby servers can be checked for errors. If the heart beat stops or the time stamp on the shared disk is not updated properly, an error is judged to have occurred, and the service that was running on the failed active server is taken over by, and restarted on, the normal standby server (fail-over processing).
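The error check and fail-over decision described above can be sketched as follows. This is a minimal illustration and not the implementation of any particular clustering program: the names ServerStatus, has_failed, and monitor_step and the two timeout values are assumptions introduced here, and only the check of the active server is shown.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    """Last liveness evidence observed for one server (hypothetical structure)."""
    last_heartbeat: float        # time at which the last heart beat message arrived
    last_disk_timestamp: float   # time stamp last read from the shared disk


def has_failed(status: ServerStatus, now: float,
               heartbeat_timeout: float = 3.0,
               timestamp_timeout: float = 10.0) -> bool:
    """Return True if the heart beat has stopped or the disk time stamp is stale.

    The timeout values are illustrative assumptions, not taken from the text.
    """
    heartbeat_stale = now - status.last_heartbeat > heartbeat_timeout
    timestamp_stale = now - status.last_disk_timestamp > timestamp_timeout
    return heartbeat_stale or timestamp_stale


def monitor_step(active: ServerStatus, now: float, service_owner: str) -> str:
    """Return the server that should run the service after this check."""
    if service_owner == "active" and has_failed(active, now):
        # Fail-over: the standby server takes over and restarts the service.
        return "standby"
    return service_owner
```

In this sketch either stale signal alone triggers the fail-over, which mirrors the "heart beat stops or the time stamp is not updated" condition in the description.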
Forming a cluster of physical servers in this way, however, requires (1) preparing a plurality of physical computers, (2) adding a router device and a network interface card for the heart beats so as to form a network dedicated to the heart beats, and (3) preparing a common data disk so that the same services can be executed on the plurality of servers.
If only software errors are targeted, the above problems can be solved by using virtual machines. For the above problem (1), the official gazette of JP-A No.288590/9 discloses a well-known method that forms a cluster using only virtual machines. In that method, a plurality of virtual machines run on one physical computer so that the operating system and application software programs are multiplexed, thereby coping with the above-described software errors.
The official gazette of JP-A No.85547/11 discloses another well-known method that addresses the above problem (2). The method implements communication between processes as communication between virtual machines through main memory. A cluster of virtual machines can thus be formed without using hardware such as a router or a network card for communication among the virtual machines.
To cope with the above problem (3), still another well-known method is used. The method allows a data disk to be shared among clustered computers by using a disk unit (a multi-port disk, a multi-port RAID, or the like) provided with a plurality of interfaces, such as SCSI interfaces, for connecting each server to the disk.
[Patent Document 1]
Official gazette of JP-A No.288590/9
[Patent Document 2]
Official gazette of JP-A No.85547/11
[Non-patent Document 1]
“TRANSACTION PROCESSING: CONCEPTS AND TECHNIQUES” written by Jim Gray and Andreas Reuter, published by Morgan Kaufmann Publishers, pp. 10-103
If a multi-port disk unit is shared among a plurality of virtual machines as in the above conventional examples, however, the manufacturing cost of the system increases, since a multi-port disk unit (or I/O device) is expensive. This has been another problem.
Furthermore, in the above conventional examples, both the active and standby virtual machines can access the shared multi-port disk unit freely, so the standby virtual machine can access the disk unit even when a software error has occurred in it. Consequently, access to the disk unit from the active virtual machine is adversely affected by unnecessary access from the faulty standby virtual machine. This has been still another problem.
Under such circumstances, it is an object of the present invention to provide a computer system that enables fail-over processing while employing a low-cost single-port I/O device, thereby improving the reliability of the system and reducing its manufacturing cost at the same time.