Technical Field
The present invention relates to the technical field of computers, and in particular to a packet-aware fault-tolerance method and system for virtual machines used in cloud services, as well as a computer-readable recording medium and a computer program product.
Related Art
With the widespread application of virtualization technology, virtual machines hosting various Internet services are deployed in cloud computing pools. Fault-tolerance services based on virtualization technology play an important role in protecting mission-critical services, because they allow those services to keep operating without interruption and without users being aware of any failure.
Fault-tolerance technology for virtual machines is mainly based on backup virtual machines. That is, a backup virtual machine runs on one physical host, while the primary virtual machine hosting the mission-critical services runs on another physical host, and the two are kept continuously synchronized by continuous checkpointing. When the backup virtual machine detects a failure of the primary virtual machine, it takes over the role of the primary virtual machine by performing recovery operations.
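The primary/backup scheme described above can be illustrated with a minimal Python sketch. It is a simplified in-process model under stated assumptions, not the implementation of Kemari or of any hypervisor: the classes `VMState`, `PrimaryVM` and `BackupVM` and the methods `run_step`, `checkpoint`, `receive_checkpoint` and `on_primary_failure` are hypothetical, and real virtual machine state is reduced to a dictionary of memory pages and a device counter. It shows the key property of checkpoint-based fault tolerance: after failover, the backup resumes from the last committed checkpoint, so work performed after that checkpoint is not present in the resumed state.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class VMState:
    """Hypothetical VM snapshot: memory pages plus simple device counters."""
    memory: dict[int, bytes] = field(default_factory=dict)
    devices: dict[str, int] = field(default_factory=dict)


class BackupVM:
    """Backup side: keeps the last committed checkpoint and takes over on failure."""

    def __init__(self) -> None:
        self.committed: VMState | None = None
        self.is_primary = False

    def receive_checkpoint(self, state: VMState) -> bool:
        # Keep a private deep copy so later writes on the primary cannot leak in.
        self.committed = copy.deepcopy(state)
        return True  # acknowledgement returned to the primary

    def on_primary_failure(self) -> VMState:
        # Recovery: promote this VM and resume from the last committed checkpoint.
        if self.committed is None:
            raise RuntimeError("no checkpoint received; cannot fail over")
        self.is_primary = True
        return self.committed


class PrimaryVM:
    """Primary side: runs the guest and pushes its state to the backup."""

    def __init__(self, backup: BackupVM) -> None:
        self.state = VMState()
        self.backup = backup

    def run_step(self, page: int, data: bytes) -> None:
        # Simulated guest execution: write one memory page and bump a device counter.
        self.state.memory[page] = data
        self.state.devices["io_ops"] = self.state.devices.get("io_ops", 0) + 1

    def checkpoint(self) -> None:
        # The primary does not continue until the backup acknowledges the checkpoint.
        if not self.backup.receive_checkpoint(self.state):
            raise RuntimeError("backup did not acknowledge the checkpoint")


if __name__ == "__main__":
    backup = BackupVM()
    primary = PrimaryVM(backup)
    primary.run_step(0, b"committed")
    primary.checkpoint()
    primary.run_step(1, b"not yet checkpointed")
    resumed = backup.on_primary_failure()  # the primary host is assumed to fail here
    print(sorted(resumed.memory))  # [0]
```

In this model the deep copy stands in for transmitting state to a separate physical host; in a real deployment the checkpoint crosses the network and failure detection relies on a mechanism such as heartbeats, which the sketch leaves out.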
Kemari is a conventional open-source project that implements virtualization-based fault tolerance on the Kernel-based Virtual Machine (KVM). It supports continuous execution by means of continuous checkpointing between two virtual machines running on different physical hosts.
By modifying the system architecture of QEMU-KVM live migration, Kemari continuously transmits the memory state and device state of the primary virtual machine to the backup virtual machine to keep the two synchronized. When a failure occurs in the primary virtual machine, the backup virtual machine detects the failure and resumes the operations that were running on the primary virtual machine.
However, because Kemari synchronizes the primary virtual machine and the backup virtual machine by continuous checkpointing triggered by every external event, execution of the primary virtual machine is frequently paused, which seriously degrades the efficiency of the primary virtual machine.
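The efficiency problem described above can be made concrete with a simple timing model of event-triggered checkpointing. This is a sketch under stated assumptions, not a measurement of Kemari: the per-event compute time, the fixed checkpoint cost, and which events count as external (for example, outgoing network packets or disk requests) are all illustrative, and `Event` and `run_event_triggered` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    """One unit of guest work; is_external marks an event that triggers a checkpoint (assumed)."""
    compute_us: float
    is_external: bool


def run_event_triggered(events: list[Event], checkpoint_us: float) -> tuple[float, int]:
    """Return (total wall time in microseconds, number of pauses) when every
    external event forces a synchronous checkpoint before the primary continues."""
    total = 0.0
    pauses = 0
    for ev in events:
        total += ev.compute_us
        if ev.is_external:
            # The primary is stopped until the checkpoint is sent and acknowledged.
            total += checkpoint_us
            pauses += 1
    return total, pauses


if __name__ == "__main__":
    # Assumed workload: 1000 steps of 50 us each, every 4th step is an external event.
    workload = [Event(compute_us=50.0, is_external=(i % 4 == 0)) for i in range(1000)]
    total, pauses = run_event_triggered(workload, checkpoint_us=200.0)
    useful = sum(ev.compute_us for ev in workload)
    print(pauses, useful / total)  # 250 0.5
```

With these assumed numbers the primary virtual machine is paused 250 times and spends half of its wall-clock time stopped for checkpoints; the stopped fraction grows as external events become more frequent or as each checkpoint becomes more expensive.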