Virtual machine fault tolerance is a key technology in the virtualization field. Its purpose is to switch a service to a secondary virtual machine when a primary virtual machine goes down because of a hardware fault, so that key services experience zero outage and zero data loss.
Currently, a fault tolerance method based on virtual machine status synchronization is usually used. In this method, the status of a primary virtual machine is synchronized to a secondary virtual machine at intervals so that the two virtual machines remain synchronized. The method must also ensure that the data presented externally by the primary-end host and the secondary-end host is consistent. Therefore, network output packets of the primary virtual machine are buffered first, and the primary-end host releases them only after status synchronization between the primary virtual machine and the secondary virtual machine is completed. Because buffered packets must wait for the next synchronization, a higher synchronization frequency yields better network performance. However, on one hand, the time overhead of each status synchronization sets an upper limit on the synchronization frequency. On the other hand, frequent synchronization requires the primary virtual machine to be paused and resumed frequently, which substantially degrades the computing performance of the primary virtual machine.
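The buffering behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `BufferedPrimary` and its methods `emit` and `checkpoint` are hypothetical names, and the actual state transfer to the secondary virtual machine is reduced to a counter.

```python
from collections import deque


class BufferedPrimary:
    """Toy model (hypothetical) of a primary-end host that holds output packets
    until the next status synchronization completes."""

    def __init__(self):
        self.pending = deque()   # packets produced since the last synchronization
        self.released = []       # packets actually visible to the outside world
        self.checkpoints = 0     # number of completed synchronizations

    def emit(self, packet: bytes) -> None:
        # The primary virtual machine produced an output packet; buffer it.
        self.pending.append(packet)

    def checkpoint(self) -> None:
        # Stand-in for pausing the VM, copying its status to the secondary,
        # and resuming it. The real transfer is not modeled here.
        self.checkpoints += 1
        # Only after synchronization completes are buffered packets released.
        while self.pending:
            self.released.append(self.pending.popleft())


primary = BufferedPrimary()
primary.emit(b"reply-1")
primary.emit(b"reply-2")
held_before_sync = list(primary.released)  # snapshot taken before synchronization
primary.checkpoint()
```

In this sketch every packet waits for the next call to `checkpoint`, which is why output latency is tied to the synchronization interval.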
To resolve the conflict between network performance and computing performance in the fault tolerance method based on virtual machine status synchronization, INTEL Corporation has proposed a virtual machine fault tolerance method based on coarse-grain lock-stepping (COLO). In this method, the network responses of the primary virtual machine and the secondary virtual machine to a same network request are compared to determine whether the statuses of the two virtual machines need to be synchronized. When the response data packets of the primary virtual machine and the secondary virtual machine are the same, the network output packet of the primary-end host is released immediately. When the response data packets are different, the statuses of the primary virtual machine and the secondary virtual machine are synchronized immediately. In this manner, the primary-end host can release network output packets in a timely manner and obtain better network performance. In addition, as long as the output of the primary virtual machine and the output of the secondary virtual machine remain the same, the interval between status synchronizations may be prolonged, which reduces the impact on the computing performance of the virtual machine.
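The comparison rule described above can be sketched roughly as follows. This is a simplified illustration, not the COLO implementation: `ColoComparator` and `on_response` are hypothetical names, synchronization is reduced to a counter, and releasing the primary packet after a forced synchronization is an assumption carried over from the buffering scheme.

```python
from dataclasses import dataclass, field


@dataclass
class ColoComparator:
    """Toy model (hypothetical) of a per-packet response comparison."""

    released: list = field(default_factory=list)  # packets sent to the client
    sync_count: int = 0                           # forced status synchronizations

    def on_response(self, primary_pkt: bytes, secondary_pkt: bytes) -> str:
        if primary_pkt == secondary_pkt:
            # Identical responses: release at once, no synchronization needed.
            self.released.append(primary_pkt)
            return "release"
        # Divergent responses: synchronize statuses immediately.
        self.sync_count += 1
        # Assumption: once synchronized, the primary packet is safe to release.
        self.released.append(primary_pkt)
        return "sync"


colo = ColoComparator()
actions = [
    colo.on_response(b"HTTP 200", b"HTTP 200"),  # outputs agree
    colo.on_response(b"seq=7", b"seq=9"),        # outputs diverge
]
```

As long as the two outputs keep matching, `sync_count` does not grow, which corresponds to the prolonged synchronization interval mentioned above.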
However, the virtual machine fault tolerance method based on COLO uses a network response data packet as the basic unit of comparison. To make the network response data packets output by the primary virtual machine and the secondary virtual machine as consistent as possible, the network protocol stacks in the kernels of the primary virtual machine and the secondary virtual machine need to be modified. As a result, the method cannot improve the network performance and computing performance of a virtual machine whose kernel cannot be modified.
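A small illustration of why per-packet comparison is fragile is given below. The text above does not name specific causes of packet mismatch. The example assumes one plausible cause, namely that two unmodified protocol stacks split the same response bytes into packets at different boundaries. The helper `packets_match` and the sample payload are hypothetical.

```python
def packets_match(primary: list, secondary: list) -> bool:
    """Per-packet comparison: both sides must emit identical packet sequences."""
    return primary == secondary


payload = b"HTTP/1.1 200 OK\r\n\r\nhello"

# Assumption: each stack segments the same payload at a different offset.
primary_packets = [payload[:10], payload[10:]]
secondary_packets = [payload[:16], payload[16:]]

# The packets differ one by one, although the delivered bytes are identical.
same_packets = packets_match(primary_packets, secondary_packets)
same_bytes = b"".join(primary_packets) == b"".join(secondary_packets)
```

Under this assumption a per-packet comparison reports a mismatch and forces a synchronization even though the client would receive the same data from either virtual machine.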