1. Field of the Invention
This invention relates generally to the field of virtual machines, and, more specifically, to techniques for increasing the efficiency of the process of migrating a virtual machine from a source host to a destination host.
2. Description of the Related Art
A virtual machine is a virtual embodiment of a computer in which the computer resources are virtual rather than physical hardware. A virtual machine is typically implemented as software executing on a host computer. A virtual machine can often be hosted by a variety of host computers, with the details of the particular host computer remaining at least mostly transparent to the virtual machine.
As is well known in the field of computer science, a virtual machine (VM) is an abstraction—a “virtualization”—of an actual physical computer system. FIG. 1A shows one possible arrangement of a computer system 70 that implements virtualization. A virtual machine (VM) or “guest” 20 is installed on a “host platform,” or simply “host,” which will include system hardware, that is, a hardware platform 10, and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor (see below), or some combination of these. The system hardware typically includes one or more processors 11, memory 13, some form of mass storage 14, and various other devices 17.
Each VM 20, . . . , 20-n will typically have both virtual system hardware 28 and guest system software 29. The virtual system hardware typically includes at least one virtual CPU, virtual memory 23, at least one virtual disk 24, and one or more virtual devices 27. Note that a disk—virtual or physical—is also a “device,” but is usually considered separately because of the important role of the disk. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components. The guest system software includes a guest operating system (OS) 22 and drivers 25 as needed for the various virtual devices 27.
Note that a single VM may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP VMs. FIG. 1A, for example, illustrates multiple virtual processors 21-0, 21-1, . . . , 21-m (VCPU0, VCPU1, VCPUm) within the VM 20.
Yet another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit, or ALU) and each able to execute threads independently; multi-core processors typically share only very limited resources, such as some cache. Still another technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share some resources such as caches, buffers, functional units, etc. This invention may be used regardless of the type (physical and/or logical) or number of processors included in a VM.
If the VM 20 is properly designed, applications 26 running on the VM will function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is via the guest OS 22 and virtual processor(s). Executable files will be accessed by the guest OS from the virtual disk 24 or virtual memory 23, which will be portions of the actual physical disk 14 or memory 13 allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is generally required between the guest software within a VM and the various hardware components and devices in the underlying hardware platform. This interface—which may be referred to generally as “virtualization software”—may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself; however, “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” VM to facilitate the operations of other VMs. Furthermore, specific software support for VMs may be included in the host OS itself; moreover, there may also be specific support for virtualization in the system hardware. Unless otherwise indicated, the invention described below may be used in virtualized computer systems having any type or configuration of virtualization software.
Moreover, FIG. 1A shows virtual machine monitors 30, . . . , 30-n that appear as separate entities from other components of the virtualization software. Furthermore, some software components used to implement one illustrated embodiment of the invention are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware. Moreover, the invention is described and illustrated below primarily as including one or more virtual machine monitors that appear as separate entities from other components of the virtualization software and perform certain functions relating to the invention. This is only for the sake of simplicity and clarity and by way of illustration; as mentioned above, the distinctions are not always so clear-cut, and the use of the term virtual machine monitor or just VMM is meant to encompass whichever component(s) in the virtualization software perform the indicated functions, regardless of what name they are given. Again, unless otherwise indicated or apparent from the description, it is to be assumed that the invention can be implemented anywhere within the overall structure of the virtualization software, and even in systems that provide specific hardware support for virtualization.
The various virtualized hardware components in the VM, such as the virtual CPU(s), the virtual memory 23, the virtual disk 24, and the virtual device(s) 27, are shown as being part of the VM 20 for the sake of conceptual simplicity. In actuality, these “components” are usually implemented as software emulations 33 included in the VMM. One advantage of such an arrangement is that the VMM may (but need not) be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another concept, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the name implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. For example, the guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software.
For some, para-virtualization implies that the guest OS (in particular, its kernel) is specifically designed to support such an interface. According to this view, having, for example, an off-the-shelf version of Microsoft Windows XP as the guest OS would not be consistent with the notion of para-virtualization. Others define para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to any other component of the virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS as such is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system. Unless otherwise indicated or apparent, this invention is not restricted to use in systems with any particular “degree” of virtualization and is not to be limited to any particular notion of full or partial (“para-”) virtualization.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration and a non-hosted configuration (which is shown in FIG. 1A). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., entitled “System and Method for Virtualizing Computer Systems”).
As illustrated in FIG. 1A, in many cases, it may be beneficial to deploy VMMs on top of a software layer—a kernel 60—constructed specifically to provide efficient support for the VMs. This configuration is frequently referred to as being “non-hosted.” Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services (for example, resource management) that extend across multiple virtual machines. Compared with a hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting primarily of VMs/VMMs. The kernel 60 also handles any other applications running on it that can be separately scheduled, as well as an optional console operating system (COS 42) that, in some architectures, is used to boot the system and facilitate certain user interactions, via user-level applications 43 with the virtualization software.
Note that the kernel 60 is not the same as the kernel that will be within the guest OS 22—as is well known, every operating system has its own kernel. Note also that the kernel 60 is part of the “host” platform of the VM/VMM as defined above even though the configuration shown in FIG. 1A is commonly termed “non-hosted;” moreover, the kernel may be both part of the host and part of the virtualization software or “hypervisor.” The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
The address space of the memory 13 is partitioned into pages, regions, or other analogous allocation units. Applications address the memory using virtual addresses (VAs), each of which typically comprises a virtual page number (VPN) and an offset into the indicated page. The VAs are then mapped to physical addresses (PAs), each of which similarly comprises a physical page number (PPN) and an offset, and which is actually used to address the physical memory 13. The same offset is usually used in both a VA and its corresponding PA, so that only the VPN needs to be converted into a corresponding PPN.
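By way of illustration only, the VA-to-PA conversion described above can be sketched as follows. The 4 KiB page size, the dictionary used as a page table, and the function names are assumptions made for this example and are not taken from any particular architecture.

```python
OFFSET_BITS = 12                 # assumed: 4 KiB pages, so 12 offset bits
PAGE_SIZE = 1 << OFFSET_BITS


def split_address(address):
    """Split an address into (page number, offset into the page)."""
    return address >> OFFSET_BITS, address & (PAGE_SIZE - 1)


def translate(va, page_table):
    """Map a virtual address to a physical address.

    page_table is a hypothetical dict mapping VPN -> PPN. Only the page
    number is converted; the offset is carried over unchanged.
    """
    vpn, offset = split_address(va)
    ppn = page_table[vpn]
    return (ppn << OFFSET_BITS) | offset
```

For example, with a page table that maps VPN 1 to PPN 7, the virtual address 0x1234 (VPN 1, offset 0x234) translates to the physical address 0x7234.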
The concepts of VPNs and PPNs, as well as the way in which the different page numbering schemes are implemented and used, are described in many standard texts, such as “Computer Organization and Design: The Hardware/Software Interface,” by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1994, pp. 579-603 (chapter 7.4 “Virtual Memory”). Similar mappings are used in region-based architectures or, indeed, in any architecture where relocatability is possible.
An extra level of addressing indirection is typically implemented in virtualized systems in that a VPN issued by an application running in a VM is remapped twice in order to determine which page of the hardware memory is intended. The first mapping is provided by the guest OS, which translates the guest VPN (GVPN) into a corresponding guest PPN (GPPN) in the conventional manner. The guest OS therefore “believes” that it is directly addressing the actual hardware memory, but in fact it is not.
Of course, a valid address to the actual hardware memory must ultimately be generated. A memory management module, located typically somewhere in the virtualization software (such as in the VMM), therefore performs the second mapping by taking the GPPN issued by the guest OS and mapping it to a hardware (or “machine”) page number PPN that can be used to address the hardware memory 13. This GPPN-to-PPN mapping may instead be done in the main system-level software layer, depending on the implementation. From the perspective of the guest OS 22, the GVPN and GPPN might be virtual and physical page numbers just as they would be if the guest OS were the only OS in the system. From the perspective of the system software, however, the GPPN is a page number that is then mapped into the physical memory space of the hardware memory as a PPN.
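The two mappings described above can be sketched, under simplifying assumptions, as two successive table lookups. The dictionaries standing in for the guest page table and for the mapping kept by the virtualization software, the 4 KiB page size, and all names below are hypothetical and serve only to illustrate the extra level of indirection.

```python
OFFSET_BITS = 12  # assumed: 4 KiB pages


def guest_va_to_machine_address(guest_va, guest_page_table, gppn_to_ppn):
    """Translate a guest virtual address to a hardware (machine) address.

    guest_page_table: hypothetical dict GVPN -> GPPN, maintained by the guest OS.
    gppn_to_ppn:      hypothetical dict GPPN -> PPN, maintained by the
                      virtualization software (for example, in the VMM).
    """
    gvpn = guest_va >> OFFSET_BITS
    offset = guest_va & ((1 << OFFSET_BITS) - 1)

    gppn = guest_page_table[gvpn]   # first mapping, done by the guest OS
    ppn = gppn_to_ppn[gppn]         # second mapping, hidden from the guest

    return (ppn << OFFSET_BITS) | offset
```

In this sketch the guest OS only ever sees the GPPN; the PPN chosen by the second lookup can change (for instance, after a migration) without the guest page table being altered.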
The addressable space of the disk(s), and therefore also of the virtual disk(s), is similarly subdivided into separately identifiable portions such as blocks or sectors, tracks, cylinders, etc. In general, applications do not directly address the disk; rather, disk access and organization are tasks reserved to the operating system, which follows some predefined file system structure. When the guest OS wants to write data to the (virtual) disk, the identifier used for the intended block, etc., is therefore also converted into an identifier into the address space of the physical disk. Conversion may be done within whichever system-level software layer handles memory, disk and/or file system management for the VM and other processes.
From time to time, it may become necessary or desirable to migrate a virtual machine from one host to another. Consider, for example, the situation illustrated in FIG. 1B, where a server 100, embodied as a virtual machine 102 executing on a host 104, is subject to a surge in requests 107 over a network 108 from clients 106a, 106b, 106c, giving rise to contention for access to the physical resources of host 104, such as CPU time and memory space. To alleviate these contention problems, and provide load-balancing across physical hosts, it may be desirable or necessary to replicate the virtual machine 102, by migrating it to a second host 110 over the network 108, and, if necessary, transparently redirecting at least a portion 114 of the client requests to the second host 110 for handling. In FIG. 1B, the migrated copy of the virtual machine 102 as hosted by the second host 110 is identified with numeral 102′.
Another example is the case where, to perform preventative maintenance, it is necessary to take down a physical host to upgrade the hardware and/or software. To accomplish this, it would be useful to migrate all the virtual machines executing on the host to another host during this maintenance time. A third example is where an end user desires to seamlessly carry out work assignments on both work and home computers. To facilitate this, it may be useful to implement both computers as a virtual machine, with the work computer acting as one host and the home computer acting as another. The virtual machine may then be migrated from one host to another to track the mobility patterns of the end user. For example, during a work day, the virtual machine could be hosted by the work computer, and then, at the end of the work day, migrated to the home computer, allowing the end user to seamlessly resume his work assignments upon his arrival at home.
A virtual machine has state, which must be transferred in order to achieve a successful virtual machine migration. The state of a virtual machine includes its guest physical pages.
A problem thus arises because the collection of guest physical pages of a virtual machine typically ranges in size from hundreds of megabytes to several gigabytes, which can take hours to transfer across the Internet between distinct geographic locations at current network transmission rates. Since that is far too long for many applications, efforts have focused on reducing the amount of data that needs to be transmitted to achieve efficient virtual machine migration. For example, in Constantine P. Sapuntzakis et al., “Optimizing the Migration of Virtual Computers,” Proceedings of the Fifth Symposium on Operating System Design and Implementation, December 2002, the authors advocate four optimization techniques for reducing the amount of data that must be transferred in the course of performing virtual machine migration: copy-on-write, demand paging, ballooning, and hashing.
In copy-on-write, a copy-on-write disk is used to capture updates to the state of the virtual machine since the last transfer. The amount of data needed to achieve virtual machine migration is reduced because only data representing updates to the virtual machine state since the last transfer must be transferred to achieve virtual machine migration.
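A minimal sketch of the copy-on-write idea follows. It is not the implementation used in the cited paper; the class name, the block-number keys, and the in-memory dictionaries are assumptions chosen to show why only the updates need to be transferred.

```python
class CowDisk:
    """Hypothetical copy-on-write disk: a fixed base plus a delta of writes."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # state as of the last transfer
        self.delta = {}                # blocks written since the last transfer

    def read(self, block_no):
        # Reads see the newest data: the delta first, then the base.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.base[block_no]

    def write(self, block_no, data):
        # Writes never modify the base; they are captured in the delta.
        self.delta[block_no] = data

    def take_updates(self):
        # Return only the blocks changed since the last transfer, then fold
        # them into the base so that the next delta starts out empty.
        updates = self.delta
        self.base.update(updates)
        self.delta = {}
        return updates


def apply_updates(destination_blocks, updates):
    # The destination is assumed to already hold the previously sent state.
    destination_blocks.update(updates)
```

In this sketch, a disk of many blocks with one modified block yields a one-entry result from take_updates(), which is all that would need to cross the network.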
In demand paging, only those guest physical pages that are actually demanded by active processes in a virtual machine, and are therefore resident in the source host's cache or primary memory, are transferred. These pages, which can be referred to as the resident portion of the virtual machine, are easily identified because they have been paged or faulted into the host's cache or primary memory as the processes execute, while pages that have not been demanded remain in a secondary host location, typically disk storage. The pages remaining in secondary host storage are not transferred, it being assumed that the secondary host storage for the source host will be accessible to the destination host.
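The selection of the resident portion can be illustrated with the following sketch. The dictionaries representing guest pages and shared secondary storage, and both function names, are hypothetical; the sketch assumes, as the technique itself does, that the destination can reach the source's secondary storage.

```python
def select_resident_pages(guest_pages, resident_gppns):
    """Return only the pages that have been faulted into primary memory.

    guest_pages:    hypothetical dict GPPN -> page contents (all pages).
    resident_gppns: set of GPPNs currently resident on the source host.
    """
    return {gppn: guest_pages[gppn] for gppn in sorted(resident_gppns)}


def fetch_page(gppn, transferred_pages, shared_secondary_storage):
    """Destination-side lookup for a page after the migration.

    Transferred (resident) pages are served from memory; any other page is
    read from secondary storage that is assumed to be reachable from the
    destination host.
    """
    if gppn in transferred_pages:
        return transferred_pages[gppn]
    return shared_secondary_storage[gppn]
```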
In ballooning, the host operating system is tricked into reclaiming physical host memory from existing processes by running a balloon program that requests a large number of physical pages. Once reclaimed, the pages are then zeroed out, and then compressed, thus reducing the amount of data that needs to be transferred. Alternatively, the reclaimed pages are swapped out to secondary storage, and only the pages remaining in the primary host memory are transferred. Again, the data needing to be transferred is reduced, albeit through a different mechanism.
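The effect of zeroing reclaimed pages before compression can be sketched as follows. The sketch does not model the balloon program itself; it only assumes that a set of page numbers has already been reclaimed. The 4 KiB page size, the use of zlib as the compressor, and the function names are assumptions for illustration.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def zero_reclaimed(pages, reclaimed):
    """Return a copy of pages in which every reclaimed page is all zeros.

    pages:     hypothetical dict page number -> bytes of length PAGE_SIZE.
    reclaimed: set of page numbers given up to the balloon program.
    """
    return {
        number: (bytes(PAGE_SIZE) if number in reclaimed else data)
        for number, data in pages.items()
    }


def compressed_size(pages):
    """Total number of bytes to transfer if each page is compressed alone."""
    return sum(len(zlib.compress(data)) for data in pages.values())
```

A zeroed page compresses to a few tens of bytes, whereas a page holding high-entropy data barely compresses at all, so the transfer size falls roughly in proportion to the number of pages reclaimed.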
Finally, in hashing, the sender, rather than sending pages, sends hash identifiers, determined by applying a predetermined hash function to the page contents. If the receiver locates a page on local storage that has the same identifier, it copies or utilizes the page from the local storage. Otherwise, it requests the page from the sender. The number of pages needing to be transferred is reduced by the number of pages that are determined, through matching of hash identifiers, to already be present at the destination.
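The exchange described above can be sketched in a single process as follows. The choice of SHA-256 as the predetermined hash function, the dictionary used as the receiver's local storage, and the function names are assumptions made for this example; a real sender and receiver would exchange the identifiers and page requests over a network.

```python
import hashlib


def page_id(page):
    """Hash identifier for a page (SHA-256 is an assumed choice)."""
    return hashlib.sha256(page).hexdigest()


def migrate_with_hashing(sender_pages, receiver_store):
    """Rebuild the sender's pages at the receiver, sending as few as possible.

    sender_pages:   list of page contents (bytes) held by the sender.
    receiver_store: hypothetical dict identifier -> page already present
                    on the receiver's local storage.
    Returns (rebuilt_pages, number_of_pages_actually_transferred).
    """
    rebuilt = []
    transferred = 0
    for page in sender_pages:
        identifier = page_id(page)           # the sender sends only this
        local = receiver_store.get(identifier)
        if local is None:
            # No local match: the receiver requests the full page.
            local = page
            receiver_store[identifier] = local
            transferred += 1
        rebuilt.append(local)
    return rebuilt, transferred
```

Note that this sketch trusts a matching identifier without comparing page contents, so it is exposed to hash collisions in exactly the way the technique is.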
The common thread underlying all of these techniques is a focus on reducing the amount of data that needs to be transferred. However, none address the equally or more significant problem of avoiding or reducing the pressure for memory at the destination host arising from the transfer. For example, regardless of whether the data is compressed or otherwise reduced in some manner, with each of these techniques, the entire resident portion of the virtual machine will still have to be accommodated in a cache or primary host memory at the destination host. If this memory is unavailable, the transfer will induce a lot of memory swapping or reclaiming at the destination host, thereby degrading system performance, no matter how much the data is compressed or reduced for purposes of the transfer.
In addition to this, one or more of these techniques has certain failings, limitations or restrictions so far as compression or data reduction is concerned. For example, copy-on-write requires that updates to the virtual machine state since a prior transfer be identified and distinguished from the remainder of the virtual machine state.
Ballooning assumes that pages in the cache or primary memory are available for reclaiming. If such is not the case, then ballooning will be ineffective for purposes of data compression or reduction.
Finally, hashing assumes that hash collisions, i.e., the situation where distinct pages with different contents map to the same page identifier, can be tolerated. However, the consequences of hash collisions are often severe, not to say catastrophic, as corrupting the contents of a page can cause a virtual machine to crash, lose data, or otherwise fail. Therefore, hashing, unaccompanied by a mechanism to avoid hash collisions, usually involves unacceptable risk. One way to reduce the risk, that is, the probability of a collision, is to use larger hashes, but this solution consumes more space, is more costly to compute, and still requires some additional mechanism to completely eliminate the risk.
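The trade-off between hash size and collision risk can be estimated with the standard birthday-bound approximation, sketched below. The approximation assumes that hash outputs are uniformly distributed, which is an idealization; the function name and parameters are hypothetical.

```python
import math


def collision_probability(num_pages, hash_bits):
    """Approximate probability that any two of num_pages distinct pages
    share a hash identifier of hash_bits bits.

    Uses the birthday-bound approximation
        p ~= 1 - exp(-n * (n - 1) / 2**(b + 1))
    and assumes uniformly distributed hash outputs.
    """
    exponent = num_pages * (num_pages - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)
```

Under this approximation, lengthening the hash makes the probability fall off very rapidly, but for two or more pages it never reaches exactly zero, which is why some additional mechanism is still needed if collisions cannot be tolerated.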
In light of the foregoing, there is a need for a method of more efficiently migrating a virtual machine from a source host to a destination host by overcoming one or more problems of the prior art.