The use of virtualization is becoming widespread. Virtualization describes a software abstraction that separates a computer resource and its use from an underlying physical device. Generally, a virtual machine (VM) provides a software execution environment and may have a virtual processor, virtual system memory, virtual storage, and various virtual devices. Virtual machines have the ability to accomplish tasks independently of particular hardware implementations or configurations.
Virtualization permits multiplexing of an underlying host computer between different virtual machines. The host computer allocates a certain amount of its resources to each of the virtual machines. Each virtual machine is then able to use the allocated resources to execute applications, including operating systems (referred to as guest operating systems (OS)). The software layer providing the virtualization is commonly referred to as a hypervisor and is also known as a virtual machine monitor (VMM), a kernel-based hypervisor, or a host operating system. The hypervisor emulates the underlying hardware of the host computer, making the use of the virtual machine transparent to the guest operating system and the user of the computer.
Virtual machines may be migrated between a source host computing platform (“the source host”) and a destination host computing platform (“the destination host”) connected over a network, which may be a local-area network or a wide-area network that may include the Internet. Migration permits a clean separation between hardware and software, thereby improving fault management, load balancing, and low-level system maintenance.
A brute-force method of migrating virtual machines between a source host and a destination host over a network is to suspend the source virtual machine, copy its state to the destination host, boot the copied virtual machine on the destination host, and remove the source virtual machine. This approach has been shown to be impractical because of the large amount of down-time users may experience. A more desirable approach is to permit a running source virtual machine to continue to run during the migration process, a technique known as live migration. Live migration permits an administrator to move a running virtual machine between different physical machines without disconnecting a running client or application program. For a successful live migration, the memory, storage, and network connectivity of the virtual machine need to be migrated from the source host to the destination host.
Related art methods of performing live migration of virtual machines between hosts generally include a pre-copy memory migration stage, which has a warm-up phase and a stop-and-copy phase, followed by a post-copy memory migration stage. In the pre-copy warm-up phase, a hypervisor copies all of the memory pages associated with the source virtual machine on the source host to the destination host while the source virtual machine is still running on the source host. Memory pages that change during the copy process are known as dirty pages. The dirty pages may be re-copied until the rate of re-copied pages is greater than or equal to the page dirtying rate.
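The iterative re-copying of dirty pages in the warm-up phase can be illustrated with a minimal simulation. The sketch below is not taken from any particular hypervisor; the function name, the constant copy and dirtying rates, and the threshold-based stopping rule are all assumptions chosen for illustration.

```python
def precopy_warmup(total_pages, copy_rate, dirty_rate, stop_threshold, max_rounds=30):
    """Simulate warm-up rounds of pre-copy migration (hypothetical model).

    total_pages    -- pages of guest memory to transfer
    copy_rate      -- pages copied per second (assumed constant)
    dirty_rate     -- pages dirtied per second by the running guest (assumed constant)
    stop_threshold -- stop once this few dirty pages remain (assumed stopping rule)

    Returns (pages_sent_per_round, remaining_dirty_pages).
    """
    to_send = total_pages          # the first round copies every page
    sent_per_round = []
    for _ in range(max_rounds):
        sent_per_round.append(to_send)
        seconds = to_send / copy_rate
        # Pages dirtied while this round was in flight must be re-copied.
        to_send = min(total_pages, int(dirty_rate * seconds))
        if to_send <= stop_threshold:
            break
    return sent_per_round, to_send


if __name__ == "__main__":
    rounds, remaining = precopy_warmup(
        total_pages=1000, copy_rate=100, dirty_rate=50, stop_threshold=100
    )
    print(rounds, remaining)
```

Under this model the number of pages sent per round shrinks only when the copy rate exceeds the dirtying rate. Otherwise the loop runs until `max_rounds` with the full memory still dirty, and a real system would have to fall back to stopping the virtual machine.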
During the stop-and-copy phase, the source virtual machine is stopped, the remaining dirty pages are copied to the destination host, and the virtual machine is resumed on the destination host. The time between stopping the virtual machine on the source host and resuming the virtual machine on the destination host is known as “down-time”. Unfortunately, the down-time of a live migration employing conventional techniques may be on the order of seconds and is approximately proportional to the size of the memory and of the applications running on the source virtual machine.
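A rough sense of why down-time scales with the amount of memory left to copy can be had from a back-of-the-envelope estimate. The function below is a hypothetical sketch: it assumes down-time is dominated by transferring the remaining dirty pages at a fixed network bandwidth, plus an optional fixed resume overhead, and ignores everything else.

```python
def estimate_downtime_seconds(remaining_pages, page_size_bytes,
                              bandwidth_bytes_per_s, resume_overhead_s=0.0):
    """Estimate stop-and-copy down-time (illustrative assumption only).

    Down-time is modeled as the time to send the remaining dirty pages
    over the network plus a fixed overhead for resuming the guest.
    """
    transfer_s = remaining_pages * page_size_bytes / bandwidth_bytes_per_s
    return transfer_s + resume_overhead_s


if __name__ == "__main__":
    # 25,600 dirty pages of 4 KiB (100 MiB) over a link moving 125 MB/s.
    print(estimate_downtime_seconds(25_600, 4096, 125_000_000))
```

Doubling the number of remaining dirty pages doubles the estimate, which is consistent with the approximately proportional relationship described above.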
In the post-copy memory migration stage, the source virtual machine is suspended at the source host. When the source virtual machine is suspended, a minimal execution state of the source virtual machine (CPU state, registers, and non-pageable memory) is transferred to the destination host. The destination virtual machine is then resumed at the destination host, even though the entire memory state of the source virtual machine has not yet been transferred and still resides at the source host. At the destination host, when the destination virtual machine tries to access pages that have not yet been transferred, it generates page faults, which are trapped at the destination host and redirected to the source host over the network. Such faults are referred to as network faults. The source host responds to a network fault by sending the faulted page. Because each page fault of the running destination virtual machine is redirected to the source host, post-copy migration can degrade the performance of applications running inside the destination virtual machine.
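The demand-fetch behavior of post-copy migration can be sketched in a few lines. The classes and method names below (`SourceHost`, `DestinationVM`, `fetch_page`, `read`) are hypothetical and stand in for the hypervisor's fault handling; a dictionary lookup takes the place of the network round trip.

```python
class SourceHost:
    """Holds the memory pages that have not yet been transferred."""

    def __init__(self, pages):
        self.pages = dict(pages)

    def fetch_page(self, page_no):
        # Stand-in for answering a network fault with the requested page.
        return self.pages[page_no]


class DestinationVM:
    """Resumed with only its minimal state; other pages arrive on demand."""

    def __init__(self, source, non_pageable):
        self.source = source
        self.local = dict(non_pageable)   # pages transferred before resume
        self.network_faults = 0

    def read(self, page_no):
        if page_no not in self.local:
            # Page fault on a missing page: redirect it to the source host.
            self.network_faults += 1
            self.local[page_no] = self.source.fetch_page(page_no)
        return self.local[page_no]


if __name__ == "__main__":
    source = SourceHost({0: b"kernel", 1: b"heap", 2: b"stack"})
    vm = DestinationVM(source, non_pageable={0: b"kernel"})
    for page in (0, 1, 1, 2):
        vm.read(page)
    print(vm.network_faults)
```

Each first access to a missing page costs one round trip to the source host, while later accesses to the same page are served locally. The performance degradation noted above comes from those first-access round trips.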
Copying pages over a network is inherently unreliable. If a destination host or the network between the source host and the destination host encounters a problem, migration may fail. In such circumstances, it may be necessary to remove the portion of the virtual machine at the destination host and start again with a new destination host.