A fault tolerant server (FT server) is known as a computer system having high availability. Approaches to realizing an FT server include a hardware approach and a software approach.
In a hardware-based FT server, main hardware components such as a CPU, a memory, and a storage are made redundant. A module including a CPU, a memory, and a chip set is called a CPU subsystem, and a module including various IO devices is called an IO subsystem. The CPU subsystems and the IO subsystems differ in the duplexing method. In the CPU subsystems, operations of the hardware are completely synchronized in clock units. This is called lock-step. Because both systems perform exactly the same operations, when a failure occurs, the failed CPU subsystem is logically separated and the normal CPU subsystem continues operation. As such, the CPU subsystems have no concept of an active system and a standby system. Meanwhile, in the IO subsystems, one is used as an active system and the other is used as a standby system, and duplexing control is performed by software. When a failure occurs in the IO subsystem of the active system, the failure is detected by software, and operation is immediately switched to the standby system. A hardware-based FT server is able to realize extremely high availability. However, because it is configured of special hardware, its introduction cost is higher than that of a PC server of similar performance.
A software-based FT server uses a virtualization technique which enables a plurality of OSs to operate on a physical computer. A computer virtually constructed on a physical computer is called a virtual computer or a virtual machine. In a software-based FT server, redundant physical computers are used, and a virtual computer of an active system and a virtual computer of a standby system are arranged on different physical computers, respectively. When a failure such as a hardware error occurs in the physical computer on which the virtual computer of the active system operates, the processing performed by that virtual computer is continued by the virtual computer of the standby system on the other physical computer. In order to continue the service transparently when viewed from the application and the OS, the software-based FT server performs processing to match the states of the virtual computers of the active system and the standby system with each other, namely, synchronization.
Here, a software-based FT server will be described with reference to FIG. 12. Referring to FIG. 12, a physical computer 1010 and a physical computer 1020 are communicably connected with each other via a communication path 1030.
The physical computer 1010 includes a VMM (Virtual Machine Monitor; also called a hypervisor) 1011 providing a virtual computer environment, a virtual computer 1013 of an active system which operates under the virtual computer environment provided by the VMM 1011, and a guest OS (operating system) which operates on the virtual computer 1013.
The physical computer 1020 includes a VMM 1021 providing a virtual computer environment, and a virtual computer 1023 of a standby system which operates under the virtual computer environment provided by the VMM 1021. It should be noted that as a guest OS does not operate on the virtual computer 1023 of the standby system, it is shown by a dashed line.
As described above, in the FT server, the states of the virtual computers 1013 and 1023 of the active system and the standby system are made to match with each other. As such, when a failure occurs in the physical computer 1010 or in the VMM 1011 so that the virtual computer 1013 of the active system is not able to continue operation, the processing performed by the virtual computer 1013 of the active system can be continued by the virtual computer 1023 of the standby system.
Among the various kinds of processing to synchronize the virtual computers 1013 and 1023, the processing requiring the longest time is the processing to match the content of the memory (guest physical memory). In a typical computer system, a memory is managed in units of a certain size, each of which is called a page. Whether or not writing has been performed can be checked in page units. As such, processing to match memory content is performed in page units.
For example, in Non-Patent Document 1 shown below, contents of memories 1014 and 1024 are made to match with each other by the following method (here, it is called a bulk copy method). Specifically, when a checkpoint comes, the virtual computer 1013 of the active system is suspended so as to interrupt updates to the main memory 1014, and all of the pages on the memory 1014 which have been updated after the previous checkpoint (dirty pages) are copied to a transfer buffer 1012. Then, upon completion of the local copy, the suspended virtual computer 1013 of the active system is resumed, and along with it, the dirty pages copied to the transfer buffer 1012 are transferred to the physical computer 1020 of the standby system. The VMM 1021 in the physical computer 1020 then copies the dirty pages, transferred from the physical computer 1010, to the memory 1024 of the virtual computer 1023 of the standby system so as to make the contents of the memories 1014 and 1024 match with each other.
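The bulk copy sequence described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of Non-Patent Document 1: the names `GuestMemory` and `bulk_copy_checkpoint`, the 4096-byte page size, and the in-process "transfer" to the standby side are all hypothetical, and suspension and resumption of the virtual computer are indicated only by comments.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class GuestMemory:
    """Toy guest physical memory that tracks dirty pages by page number."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()  # pages written since the previous checkpoint

    def write(self, page_no, data):
        self.pages[page_no][:] = data
        self.dirty.add(page_no)


def bulk_copy_checkpoint(active, standby):
    """Run one checkpoint; return the number of pages copied while suspended."""
    # The active virtual computer is suspended here, so memory cannot change
    # while every dirty page is copied into the transfer buffer.
    transfer_buffer = {n: bytes(active.pages[n]) for n in sorted(active.dirty)}
    active.dirty.clear()
    # The active virtual computer is resumed here; the transfer follows.
    for page_no, data in transfer_buffer.items():
        standby.pages[page_no][:] = data  # standby-side VMM applies the page
    return len(transfer_buffer)


active, standby = GuestMemory(8), GuestMemory(8)
active.write(2, b"\x01" * PAGE_SIZE)
active.write(5, b"\x02" * PAGE_SIZE)
copied = bulk_copy_checkpoint(active, standby)
```

In this sketch the suspended period grows with the number of dirty pages, because every dirty page is copied locally before the virtual computer is resumed.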
Meanwhile, a method called copy on write (COW) is also known. In the COW method, when a checkpoint comes, the virtual computer 1013 of the active system is suspended so as to interrupt updates to the main memory 1014, and writing to the dirty pages is inhibited by, for example, setting a write inhibit flag in the page table entry related to each dirty page. When writing to all of the dirty pages is inhibited, the virtual computer 1013 of the active system is resumed, the dirty pages on the memory 1014 are copied to the transfer buffer 1012, and the write inhibit is released. Then, in parallel with the copying to the transfer buffer 1012, the dirty pages copied to the transfer buffer 1012 are transferred to the physical computer 1020. It should be noted that if a write request is made to a dirty page to which writing is inhibited (that is, if a page fault exception occurs), the virtual computer 1013 of the active system is suspended, the dirty page to which the write request has been made is copied to the transfer buffer 1012, and then the write inhibit is released. Then, the virtual computer 1013 of the active system is resumed, and in parallel with it, the dirty pages copied to the transfer buffer 1012 are transferred to the physical computer 1020.
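The COW sequence described above can likewise be sketched as a toy model. The class `CowCheckpoint`, its method names, and the 4096-byte page size are hypothetical and chosen only for illustration. The write inhibit flag of the page table is modeled as a set of page numbers, a page fault is modeled as a check inside `guest_write`, and dirty tracking for the next checkpoint is omitted.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class CowCheckpoint:
    """Toy model of one COW checkpoint over a list of bytearray pages."""

    def __init__(self, memory, dirty_pages):
        self.memory = memory
        # Set while the virtual computer is suspended: one flag per dirty page.
        self.write_inhibited = set(dirty_pages)
        self.transfer_buffer = {}
        self.faults = 0  # each fault suspends the virtual computer briefly

    def _copy_out(self, page_no):
        # Save the checkpoint-time content, then release the write inhibit.
        self.transfer_buffer[page_no] = bytes(self.memory[page_no])
        self.write_inhibited.discard(page_no)

    def background_copy_step(self):
        """Copy one still-inhibited page while the guest runs; False when done."""
        if not self.write_inhibited:
            return False
        self._copy_out(min(self.write_inhibited))
        return True

    def guest_write(self, page_no, data):
        if page_no in self.write_inhibited:
            # Page fault path: copy the old content before allowing the write.
            self.faults += 1
            self._copy_out(page_no)
        self.memory[page_no][:] = data


memory = [bytearray([n]) * PAGE_SIZE for n in range(4)]
snapshot = {n: bytes(memory[n]) for n in (0, 1, 2)}
cp = CowCheckpoint(memory, dirty_pages=[0, 1, 2])
cp.guest_write(1, b"\xff" * PAGE_SIZE)  # hits a write-inhibited page
while cp.background_copy_step():
    pass
```

In this sketch the transfer buffer always receives the content each page had at the checkpoint, even when the guest writes to the page before the background copy reaches it; the cost is one brief suspension per faulting page.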
Non-Patent Document 1: Brendan Cully, and 5 others, “Remus: High Availability via Asynchronous Virtual Machine Replication” [online], [searched on Mar. 5, 2013], Internet, <URL: http://nss.cs.ubc.ca/remus/papers/remus-nsdi08.pdf>
As described above, there are two types of methods of making the memory content of a virtual computer of an active system and the memory content of a virtual computer of a standby system match with each other, namely, a COW method and a bulk copy method.
In some cases the suspended period of a virtual computer is shorter if a COW method is used, and in other cases it is shorter if a bulk copy method is used. As such, in a system in which the method to be used for the entire memory is fixed to either a COW method or a bulk copy method, it is difficult to reduce the suspended period of a virtual computer.
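Why neither method is always better can be illustrated with a simple cost model. The model below is an assumption made for illustration and is not taken from the source: it supposes that the bulk copy method suspends the virtual computer for one local copy per dirty page, while the COW method suspends it for one flag setting per dirty page plus one fault handling and one copy per page that is written before its background copy completes. The function names and the time values (in arbitrary units) are hypothetical.

```python
def bulk_copy_suspend_time(dirty_pages, t_copy):
    """Suspended time when every dirty page is copied while suspended."""
    return dirty_pages * t_copy


def cow_suspend_time(dirty_pages, faulted_pages, t_protect, t_fault, t_copy):
    """Suspended time for setting write inhibits plus handling each fault."""
    return dirty_pages * t_protect + faulted_pages * (t_fault + t_copy)


# Hypothetical per-page costs in arbitrary time units.
T_COPY, T_PROTECT, T_FAULT = 10, 1, 20

bulk = bulk_copy_suspend_time(100, T_COPY)
cow_few_faults = cow_suspend_time(100, 10, T_PROTECT, T_FAULT, T_COPY)
cow_many_faults = cow_suspend_time(100, 100, T_PROTECT, T_FAULT, T_COPY)
```

Under these assumed costs, the COW method suspends the virtual computer for less time than the bulk copy method when few of the dirty pages are rewritten soon after the checkpoint, and for more time when most of them are.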