To reduce the cost of Information Technology (IT) infrastructures, virtualization technologies have been in demand as a way to use resources more efficiently and lower operating costs. Virtual machine systems, in which one physical server hosts a plurality of virtual servers, have begun to be adopted.
For example, a business system operating on many physical servers may be virtualized by using virtualization software to consolidate the system onto fewer physical servers. This may reduce the introduction and operating costs for the physical servers.
Such a virtual machine system implements a plurality of virtual server functions with a plurality of virtual machines (guest OSs) controlled by a host OS. However, a failure of a virtual machine in the virtual machine system may terminate the system.
Real machine systems use clustering technologies (or clustering systems) in which, when a system terminates, another equivalent system takes over its jobs, for higher reliability and higher availability. In other words, in order to prevent termination of jobs, a clustering system is used to make the machine systems redundant.
On the other hand, virtual machine systems using clustering systems have been proposed in which virtual machines executing jobs are clustered for redundancy, like physical servers, and a host OS controlling the virtual machines is clustered to recover the virtual machines.
FIG. 10 is an explanatory diagram of a virtual machine system.
As illustrated in FIG. 10, the virtual machine system includes in a real machine system 100 a virtual machine manager (virtualization software) 120, a host operating system (OS) 122, and a plurality of virtual machines (guest operating systems (OSs)) 124 and 126. The host OS 122 is provided for controlling the guest OSs 124 and 126. The virtual machine manager 120 virtualizes and controls the host OS 122 and guest OSs.
In such a virtual machine system, the termination of one guest OS (virtual machine) due to its failure prevents the continuous operation of a business application being executed by the guest OS. In order to avoid the situation, as illustrated in FIG. 10, an equivalently configured real machine system 102 is provided. Then, data are exchanged through a shared disk device 104. This is called a clustering-type hot standby/cluster system.
In other words, the second real machine system 102 having the virtual machine manager 120 and the host OS 122 is connected to the first real machine system 100 over a local area network (LAN) 106 and over a heart beat network 108 for failure notifications.
The LAN 106 connects to a plurality of terminals 110. Thus, even when a guest OS (such as the guest OS 126) in the first real machine system 100 fails and terminates, the guest OS 126 is started under the control of the host OS 122 in the second real machine system 102 to resume the job of the guest OS 126.
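The failover behavior described above can be illustrated with a minimal sketch. It assumes that a failure is detected when no heartbeat has arrived within a timeout; the description only states that the heart beat network carries failure notifications, so the timeout rule, the class names, and the guest and system labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RealMachine:
    """One real machine system whose host OS controls a set of guest OSs."""
    name: str
    running_guests: set = field(default_factory=set)

    def start_guest(self, guest: str) -> None:
        self.running_guests.add(guest)

    def stop_guest(self, guest: str) -> None:
        self.running_guests.discard(guest)


class HeartbeatMonitor:
    """Standby-side record of the last heartbeat seen for each guest OS.

    Assumption: a guest is treated as failed when its last heartbeat is
    older than `timeout` seconds.
    """

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # guest name -> time of last heartbeat

    def record(self, guest: str, now: float) -> None:
        self.last_seen[guest] = now

    def failed_guests(self, now: float) -> list:
        return sorted(g for g, t in self.last_seen.items()
                      if now - t > self.timeout)


def fail_over(monitor, active, standby, now):
    """Restart every failed guest on the standby system; return their names."""
    moved = monitor.failed_guests(now)
    for guest in moved:
        active.stop_guest(guest)
        standby.start_guest(guest)  # the job resumes on the standby system
        del monitor.last_seen[guest]
    return moved


# Usage: guest-126 stops sending heartbeats, guest-124 keeps running.
active = RealMachine("system-100", {"guest-124", "guest-126"})
standby = RealMachine("system-102")
monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("guest-124", now=10.0)
monitor.record("guest-126", now=5.0)
moved = fail_over(monitor, active, standby, now=10.5)
```

Time is passed in explicitly rather than read from a clock so that the sketch stays deterministic. A real clustering system would also have to move the guest's state through the shared disk device, which is omitted here.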
However, such a clustering system makes the host OS and the virtual machines redundant. This requires a standby system and a shared disk device in addition to the active (operational) system, which may increase the costs for the systems. It further requires clustering system software 130 and configuration of that software.
To omit the time for starting the guest OS and the application and thereby reduce the recovery time, the clustering system stores on a disk a snapshot, that is, the contents of the memory and/or disk of a guest OS at the point when its service starts upon completion of the start of a job or application. However, since only the snapshot taken at the start of the service is available, the guest OS can be recovered only to the point at which its job or service started. This prevents users from restarting the job from an arbitrarily designated recovery point.
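The limitation can be made concrete with a small sketch of a store that keeps only the service-start snapshot. The class and method names are hypothetical and are not taken from any actual clustering software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Memory and disk contents of a guest OS captured at one point in time."""
    taken_at: float
    memory_image: bytes
    disk_image: bytes


class ServiceStartSnapshotStore:
    """Holds exactly one snapshot per guest: the one taken at service start."""

    def __init__(self):
        self._snapshots = {}

    def save_at_service_start(self, guest: str, snapshot: Snapshot) -> None:
        self._snapshots[guest] = snapshot

    def recover(self, guest: str, requested_time: float):
        """Return the snapshot used for recovery and whether it matches
        the recovery point that the user asked for."""
        snap = self._snapshots[guest]
        return snap, snap.taken_at == requested_time


# Usage: the user asks for the state at t=120.0, but only t=0.0 exists.
store = ServiceStartSnapshotStore()
store.save_at_service_start("guest-126", Snapshot(0.0, b"mem", b"disk"))
snap, exact = store.recover("guest-126", requested_time=120.0)
```

Whatever recovery point is requested, the store can only hand back the service-start snapshot, so `exact` is false for any later point.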
The use of a data replication cluster, in which data is replicated to a local disk to synchronize files (OS images), may cause differences (or contradictions) in the contents of the files when, for example, one machine system shuts down during the synchronization. This may instead increase the time for recovering the job.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2008-052407
[Patent Document 2] Japanese Laid-open Patent Publication No. 11-134117