This application is generally directed to virtual machines and, more particularly, to managing checkpoint-based high-availability of backup virtual machines in the event of a failure of a primary virtual machine.
Computing is typically thought of in terms of an application and a supporting platform. A supporting platform typically includes a hardware infrastructure of one or more processor cores, input/output, memory, and fixed storage (the combination of which supports an operating system (OS), which in turn supports one or more applications). Applications are typically self-contained bundles of logic relying on little other than core object files and related resource files. As computing has become integral to modern industry, applications have become co-dependent on the presence of other applications. That is, the requisite environment for an application includes not only an underlying OS and supporting hardware platform, but also other key applications. Key applications may include application servers, database management servers, collaboration servers, and communicative logic commonly referred to as middleware.
Given the complexity of application and platform interoperability, different combinations of applications executing in a single hardware platform can demonstrate differing degrees of performance and stability. Virtualization technology aims to interject a layer between a supporting platform and executing applications. From the perspective of business continuity and disaster recovery, virtualization provides the inherent advantage of environment portability. Specifically, to move an entire environment configured with multiple different applications is a matter of moving a virtual image from one supporting hardware platform to another. Further, more powerful computing environments can support the coexistence of multiple different virtual images, all the while maintaining a virtual separation between the images. Consequently, a failure condition in one virtual image typically cannot jeopardize the integrity of other co-executing virtual images in the same hardware platform.
A virtual machine monitor (VMM) or hypervisor manages the interaction between each virtual image and underlying resources provided by a hardware platform. In this regard, a bare metal hypervisor runs directly on the hardware platform, much as an OS runs directly on hardware. By comparison, a hosted hypervisor runs within a host OS. In either case, a hypervisor can support the operation of different guest OS images, known as virtual machine (VM) images. The number of VM images is limited only by the processing resources of a VM container that holds the VM images or the hardware platform. Virtualization has proven especially useful for end-users that require separate computing environments for different types of applications, while being limited to a single hardware platform.
For example, it is well known for a primary OS native to one type of hardware platform to provide a virtualized guest OS native to a different hardware platform (so that applications requiring the presence of the guest OS can co-exist with other applications requiring the presence of the primary OS). In this way, the end-user need not provide separate computing environments to support different types of applications. Regardless of the guest OS, access to underlying resources of the single hardware platform remains static. Virtualized environments have been deployed to aggregate different interdependent applications in different VMs in composing application solutions. For example, an application server can execute within one VM while a database management system executes in a different VM and a web server executes in yet another VM. Each of the VMs can be communicatively coupled to one another in a secure network and any given deployment of the applications can be live migrated to a different deployment without interfering with the execution of the other applications in the other VMs.
In a typical live migration, a VM can be moved from one host server to another host server in order to, for example, permit server maintenance or to permit an improvement in hardware support for the VM. Checkpoint-based high-availability is a technique in which a VM running on a primary host machine mirrors its processor and memory state at a regular interval (e.g., every 25 ms) onto a secondary host machine. The mirroring process involves: tracking changes to the memory and processor state of the primary VM; periodically stopping the primary VM; sending the changes over a network to the secondary host machine; waiting for the secondary host machine to acknowledge receipt of the memory and processor state update; and resuming the primary VM. The mirroring process ensures that the secondary host machine is able to resume the workload with no loss of service should the primary host machine suffer a sudden hardware failure.
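The mirroring steps listed above can be illustrated with a minimal in-memory sketch. It is only an illustration under stated assumptions: the names `PrimaryVM`, `SecondaryHost`, and `checkpoint` are hypothetical, memory is modeled as a dictionary of pages, and the network transfer and acknowledgement are reduced to a direct method call that returns a boolean. A real hypervisor would track dirty pages in hardware-assisted structures and transfer them over a network.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryHost:
    """Hypothetical stand-in for the secondary host machine."""
    memory: dict = field(default_factory=dict)
    cpu_state: dict = field(default_factory=dict)

    def receive(self, dirty_pages, cpu_state):
        # Apply the changed pages and processor state, then acknowledge.
        self.memory.update(dirty_pages)
        self.cpu_state = dict(cpu_state)
        return True  # acknowledgement of receipt


@dataclass
class PrimaryVM:
    """Hypothetical stand-in for the primary VM with change tracking."""
    memory: dict = field(default_factory=dict)
    cpu_state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    running: bool = True

    def write(self, page, value):
        if not self.running:
            raise RuntimeError("VM is paused")
        self.memory[page] = value
        self.dirty.add(page)  # track changes since the last checkpoint


def checkpoint(vm, secondary):
    """Run one checkpoint cycle and return the number of pages sent."""
    vm.running = False                                   # stop the primary VM
    delta = {page: vm.memory[page] for page in vm.dirty}
    # Send the changes and wait for the acknowledgement.
    if not secondary.receive(delta, vm.cpu_state):
        # In this sketch the VM simply stays paused if no acknowledgement arrives.
        raise RuntimeError("secondary did not acknowledge the checkpoint")
    vm.dirty.clear()
    vm.running = True                                    # resume the primary VM
    return len(delta)


if __name__ == "__main__":
    vm = PrimaryVM(cpu_state={"pc": 0})
    secondary = SecondaryHost()
    vm.write(0, "a")
    vm.write(1, "b")
    print(checkpoint(vm, secondary))  # pages sent in the first cycle
    vm.write(1, "c")
    print(checkpoint(vm, secondary))  # only the page changed since then
```

Only the pages written since the previous checkpoint are sent in each cycle, which is why change tracking is the first step of the mirroring process.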
If the secondary host machine either notices that the primary host machine is not responding or receives an explicit notification from the primary host machine, the secondary host machine starts the mirrored version of the VM. To the outside world, the VM appears to have continued executing seamlessly across the failure of the primary host machine. Although this technique provides effective protection against hardware failure, it does not protect against software failure. Because the state of the memory and processor of the primary VM is faithfully reproduced on the secondary host machine, if a software crash (for example, the de-reference of a null pointer) causes a failover to the secondary host machine, the VM resumes execution from the last checkpoint and, if the program execution is deterministic, the same error occurs again.
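The reason a faithfully mirrored state does not help against a deterministic software fault can be shown with a small sketch. The workload below is hypothetical: `step` is an invented deterministic function that fails on its fifth invocation by indexing `None`, which stands in for a null-pointer de-reference, and `copy.deepcopy` stands in for the checkpoint transfer to the secondary host machine.

```python
import copy


def step(state):
    """One deterministic step of a hypothetical workload."""
    state["counter"] += 1
    if state["counter"] == 5:
        record = None            # bug: the lookup yields nothing
        return record["value"]   # raises TypeError, analogous to a null de-reference
    return state["counter"]


def run_until_crash(state, max_steps=100):
    """Run the workload; return the counter value at the crash, or None."""
    for _ in range(max_steps):
        try:
            step(state)
        except TypeError:
            return state["counter"]
    return None


if __name__ == "__main__":
    primary = {"counter": 0}
    for _ in range(3):
        step(primary)
    last_checkpoint = copy.deepcopy(primary)   # state mirrored to the secondary

    crash_on_primary = run_until_crash(primary)

    # Failover: the secondary resumes from the last checkpoint.
    secondary = copy.deepcopy(last_checkpoint)
    crash_on_secondary = run_until_crash(secondary)

    print(crash_on_primary, crash_on_secondary)  # both runs crash at the same point
```

Because the secondary starts from exactly the state that led to the fault and the execution is deterministic, it reaches the same failing step.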
There are some constrained cases in which a VM may not crash again after a software failure triggers a failover. However, these cases are few and far between, and rely more on luck than design. For example, a software bug that manifests as a race condition, in which one processor can access data that is being modified by another processor, might not recur when the workload is resumed on the secondary host machine, because by a fluke of scheduling the data may not end up being concurrently accessed. Implementing checkpoint availability with VMs is known. For example, a publication entitled “IMPLEMENTATION AND EVALUATION OF A SCALABLE APPLICATION-LEVEL CHECKPOINT-RECOVERY SCHEME FOR MPI PROGRAMS”, by Greg Bronevetsky et al., attempts to address the checkpoint availability problem that running times of many computer applications are much longer than the mean-time-to-failure of current high-performance computing platforms.