The advantages of fault-tolerant computing have become widely recognized. Among these advantages is an ability to maintain duplicate sets of data and resources in the event of a system crash or corruption, thereby preventing an entire system from being lost due to failure of one or more components. Such systems are common in medical, navigational, military and real-time processing systems. However, the implementation of fault-tolerant systems in a virtual machine environment creates special challenges. In order to more fully appreciate these challenges, a discussion of virtual machine technology is appropriate.
Virtual machine technology provides an ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete,” isolated computer. As is well known in the field of computer science, a virtual machine (VM) is a software abstraction (a “virtualization”) of an actual physical computer system. FIG. 1 illustrates, in part, a general configuration of virtual machine 200, which is installed as a “guest” on “host” hardware platform 100. As FIG. 1 shows, hardware platform 100 includes one or more processors (CPUs) 110, system memory 130, and a storage device, which will typically be a disk (disk 140). The system memory will typically be some form of high-speed RAM, whereas the disk (one or more) will typically be a non-volatile, mass storage device. Hardware 100 will also typically include other conventional mechanisms such as memory management unit (MMU) 150, various registers 160, and any conventional network connection device 170 (such as a network adapter or network interface card, “NIC”) for transfer of data over a network.
Each VM 200 will typically include at least one virtual CPU 210, virtual disk 240, virtual system memory 230, guest operating system 220 (which may simply be a copy of a conventional operating system), and various virtual devices 235, for which the guest operating system (“guest OS”) will include corresponding drivers 224. All of the components of the VM may be implemented in software using known techniques to emulate the corresponding components of an actual computer.
Typically, it will not be apparent to a user that any applications 260 running within the VM are running indirectly, that is, via the guest OS and virtual processor. Applications 260 running within the VM will act just as they would if run on a “real” computer, except for a decrease in running speed that may be noticeable only in exceptionally time-critical applications. Executable files will be accessed by the guest OS from the virtual disk or virtual memory, which will simply be portions of the actual physical disk or memory allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if they had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is usually required between a VM and the underlying host platform (in particular, the CPU) which is responsible for actually executing VM-issued instructions and transferring data to and from the actual memory and storage devices. A common term for this interface is a “virtual machine monitor” (VMM), shown as component 300. A VMM is usually a thin piece of software that runs directly on top of a host, or directly on the hardware, and virtualizes all the resources of the machine. Among other components, the VMM usually includes device emulators 330 which may constitute the virtual devices (235) that VM 200 addresses. The interface exported to the VM is such that the guest OS cannot determine the presence of the VMM. The VMM also usually tracks and either forwards (to some form of operating system) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts.
Although the VM (and thus the user of applications running in the VM) cannot usually detect the presence of the VMM, the VMM and the VM may be viewed as together forming a single virtual computer. They are shown in FIG. 1 as separate components for the sake of clarity.
In some systems, such as a Workstation product of VMware, Inc., of Palo Alto, Calif., the VMM is co-resident at system level with a host operating system. Both the VMM and the host OS can independently modify the state of the host processor, but the VMM calls into the host OS via a driver and a dedicated user-level application to have the host OS perform certain I/O operations on behalf of the VM. The virtual computer in this configuration is fully hosted in that it runs on an existing host hardware platform and together with an existing host OS. In other implementations, a dedicated kernel takes the place of and performs the conventional functions of the host OS, and virtual computers run on the kernel. FIG. 1 illustrates kernel 600 that serves as the system software for several VM/VMM pairs 200/300, . . . , 200n/300n. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple VMs (for example, for resource management). Compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs. An ESX Server product of VMware, Inc. has such a configuration. A kernel-based virtualization system of the type illustrated in FIG. 1 is described in U.S. patent application Ser. No. 09/877,378 (“Computer Configuration for Resource Management in Systems Including a Virtual Machine”), now U.S. Pat. No. 6,961,941, which is incorporated here by reference. The main components of this system and aspects of their interaction are, however, outlined below.
Except for network 700, the entire multi-VM system shown in FIG. 1 can be implemented in a single physical machine, such as a server. This is illustrated by single functional boundary 1000. Of course devices such as keyboards, monitors, etc., will also be included to allow users to access and use the system, possibly via network 700; these are not shown merely for the sake of simplicity.
For purposes of understanding the above-described virtual machine technology, the following should be borne in mind. First, each VM 200, . . . , 200n has its own state and is an entity that can operate completely independently of other VMs. Second, the user of a VM, in particular, of an application running on the VM, will usually not be able to notice that the application is running on a VM (which is implemented wholly as software) as opposed to a “real” computer. Third, assuming that different VMs have the same configuration and state, the user will not know and would have no reason to care which VM he/she is currently using. Fourth, the entire state (including memory) of any VM is available to its respective VMM, and the entire state of any VM and of any VMM is available to kernel 600. Finally, as a consequence of the above, a VM is “relocatable.”
Co-pending U.S. patent application Ser. No. 09/497,978, filed 4 Feb. 2000 (“Encapsulated Computer System”), now U.S. Pat. No. 6,795,966, which is incorporated herein by reference, discloses a mechanism for checkpointing an entire state of a VM. When a VM is suspended, all of its state (including its memory) is written to a file on disk. A VM can then be migrated by suspending the VM on one server and resuming it, for example, via shared storage on another server.
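By way of illustration only, the suspend-and-resume migration just described may be sketched as follows. This is a minimal Python sketch and not the mechanism of the referenced patent: the `ToyVM` class, the JSON checkpoint format, the `.ckpt` file name, and the function names are all assumptions introduced for the example, and a directory visible to both servers stands in for the shared storage.

```python
import json
import os
from dataclasses import dataclass, field, asdict


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM: a few registers plus memory pages."""
    name: str
    registers: dict = field(default_factory=dict)
    memory: dict = field(default_factory=dict)  # page number -> contents
    running: bool = True


def suspend(vm: ToyVM, path: str) -> None:
    """Stop the VM and write all of its state, memory included, to a file."""
    vm.running = False
    with open(path, "w") as f:
        json.dump(asdict(vm), f)


def resume(path: str) -> ToyVM:
    """Rebuild a VM from a checkpoint file and mark it as running again."""
    with open(path) as f:
        state = json.load(f)
    # JSON stores integer page numbers as strings; convert them back.
    state["memory"] = {int(k): v for k, v in state["memory"].items()}
    vm = ToyVM(**state)
    vm.running = True
    return vm


def migrate_via_shared_storage(vm: ToyVM, shared_dir: str) -> ToyVM:
    """Suspend on the source, then resume on the destination from the same file."""
    path = os.path.join(shared_dir, vm.name + ".ckpt")
    suspend(vm, path)    # performed by the source server
    return resume(path)  # performed by the destination server
```

In this sketch the VM is not executing at any point between `suspend` and `resume`, which is the limitation that migration of a running VM is intended to reduce.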
Note that the execution of a VM is frequently suspended even though it is “running.” A VM may be suspended, for example, to allow execution of another co-running VM to proceed. Suspending the VM long enough to transfer its non-memory state is therefore not inconsistent with the notion that it is still running. Suspension for the purpose of non-memory state transfer contrasts, however, with powering down or “shutting off” the VM, which is a software mechanism that virtualizes the power-down procedure of a physical machine. For example, suspension does not necessarily lead to loss of cached data, whereas powering off typically does. Similarly, resumption of execution after a suspension does not require such time-consuming tasks as rebooting the OS, whereas powering back on (“restarting”) typically does.
As an improvement to the suspend and resume technique cited above, U.S. patent application Ser. No. 10/319,217, entitled “Virtual Machine Migration”, which is commonly assigned, and which is hereby incorporated herein by this reference, describes methods that may be used to allow a running VM to be moved between physical hosts. With the system and techniques described therein, a VM to be moved is allowed to keep running until most of its physical memory has been copied to the destination host. Once the VM's memory is copied, the VM is paused while the rest of its state is saved and sent to the destination host. Once the destination host has received all the VM's state, the VM is resumed on the destination host and terminated on the source host. A product which embodies the functionality described in the above-identified patent application is VMware's VMotion, which is commercially available from VMware, Inc., Palo Alto, Calif. 94304, and is included in VMware Infrastructure Enterprise Edition or can be purchased as an add-on product to the Standard and Starter editions.
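The three phases described above (copy memory while the VM runs, pause and send the remainder, resume on the destination) may be sketched as follows. This is an illustrative Python sketch under stated assumptions and is not the method of the referenced application or of the VMotion product: the repeated copy rounds driven by a set of dirtied pages, the `run_guest_briefly` callback, and the `max_rounds` and `pause_threshold` parameters are all hypothetical and introduced only for the example.

```python
from typing import Callable, Dict, Set, Tuple


def live_migrate(
    src_memory: Dict[int, bytes],
    src_cpu_state: dict,
    run_guest_briefly: Callable[[Dict[int, bytes]], Set[int]],
    max_rounds: int = 5,
    pause_threshold: int = 2,
) -> Tuple[Dict[int, bytes], dict, int]:
    """Copy memory while the guest runs, then pause and send what is left.

    run_guest_briefly is a hypothetical callback: it lets the guest execute
    for a short slice, mutating src_memory, and returns the pages it dirtied.
    """
    dst_memory: Dict[int, bytes] = {}
    pending = set(src_memory)  # first round: every page is still to be copied
    rounds = 0

    # Phase 1: the guest keeps running while pages are copied.
    while len(pending) > pause_threshold and rounds < max_rounds:
        for page in pending:
            dst_memory[page] = src_memory[page]
        pending = run_guest_briefly(src_memory)  # pages written since the copy
        rounds += 1

    # Phase 2: the guest is paused; send remaining pages and non-memory state.
    for page in pending:
        dst_memory[page] = src_memory[page]
    dst_cpu_state = dict(src_cpu_state)

    # Phase 3: the destination would now resume the VM and the source drop it.
    return dst_memory, dst_cpu_state, rounds
```

The pause in phase 2 lasts only as long as it takes to send the few remaining pages and the non-memory state, so its length depends on `pause_threshold` and on how quickly the guest dirties memory.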