1. Field of the Invention
This invention relates to the sharing of memory pages among virtual machines.
2. Description of the Related Art
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware while still ensuring that each user enjoys the features of a complete computer. Depending on how it is implemented, virtualization also provides greater security, since the virtualization can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
As is well known in the field of computer science, a virtual machine (VM) is a software abstraction (a “virtualization”) of an actual physical computer system. FIG. 1 shows one possible arrangement of a computer system 700 that implements virtualization. A virtual machine (VM) 200, which in this system is a “guest,” is installed on a “host platform,” or simply “host,” which includes system hardware 100, that is, a hardware platform, and one or more layers or co-resident components comprising system-level software, such as an operating system (OS) or similar kernel, a virtual machine monitor or hypervisor (see below), or some combination of these.
As software, the code defining the VM will ultimately execute on the actual system hardware 100. As in almost all computers, this hardware will include one or more CPUs 110, some form of memory 130 (volatile or non-volatile), one or more storage devices such as one or more disks 140, and one or more devices 170, which may be integral or separate and removable.
In many existing virtualized systems, the hardware processor(s) 110 are the same as in a non-virtualized computer with the same platform, for example, the Intel x86 platform. Because of the advantages of virtualization, some hardware vendors have proposed, and are presumably developing, hardware processors that include specific hardware support for virtualization.
Each VM 200 will typically mimic the general structure of a physical computer and as such will usually have both virtual system hardware 201 and guest system software 202. The virtual system hardware typically includes at least one virtual CPU 210, virtual memory 230, at least one virtual disk 240 and one or more virtual devices 270. Note that a storage disk—virtual 240 or physical 140—is also a device, but is usually considered separately because of the important role it plays. All of the virtual hardware components of the VM may be implemented in software to emulate corresponding physical components. The guest system software includes a guest operating system (OS) 220 and drivers 224 as needed, for example, for the various virtual devices 270.
A single VM may (but need not) be configured with more than one virtualized physical and/or logical processor. Each virtualized processor in a VM may also be multi-core, or multi-threaded, or both, depending on the virtualization. This invention may be used to advantage regardless of the number of processors the VMs are configured to have.
If the VM 200 is properly designed, applications 260 running on the VM will function essentially as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is via the guest OS 220 and virtual processor(s). Executable files will be accessed by the guest OS from the virtual disk 240 or virtual memory 230, which will be portions of the actual physical disk 140 or memory 130 allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines in general are known in the field of computer science.
Some interface is generally required between the guest software within a VM and the various hardware components and devices in the underlying hardware platform. This interface—which may be referred to generally as “virtualization software”—may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself; however, “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s)—sometimes including the host OS itself—to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” VM to facilitate the operations of other VMs. Furthermore, specific software support for VMs is sometimes included in the host OS itself.
Unless otherwise indicated, the invention described below may be used in virtualized computer systems having any type or configuration of virtualization software. Moreover, the invention is described and illustrated below primarily as including one or more virtual machine monitors that appear as separate entities from other components of the virtualization software. This is only for the sake of simplicity and clarity and by way of illustration—as mentioned above, the distinctions are not always so clear-cut. Again, unless otherwise indicated or apparent from the description, it is to be assumed that the invention can be implemented anywhere within the overall structure of the virtualization software.
By way of illustration and example only, the figures show a VM (only one of which is shown, for simplicity) running on a corresponding virtual machine monitor. The description's reference to VMMs is also merely by way of common example. A VMM is usually a software component that virtualizes at least one hardware resource of some physical platform, so as to export a hardware interface to the VM corresponding to the hardware the VM “thinks” it is running on. A virtualized computer system may (and usually will) have more than one VM, each of which may be running on its own VMM.
The various virtualized hardware components in the VM, such as the virtual CPU(s) 210, etc., the virtual memory 230, the virtual disk 240, and the virtual device(s) 270, are shown as being part of the VM 200 for the sake of conceptual simplicity. In actuality, these “components” are often implemented as software emulations included in some part of the virtualization software, such as the VMM. One advantage of such an arrangement is that the virtualization software may (but need not) be set up to expose “generic” devices, which facilitate, for example, migration of a VM from one hardware platform to another.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice in respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
Another concept that has yet to achieve a universally accepted definition is that of “para-virtualization.” As the name implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. For example, the guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software. For some, para-virtualization implies that the guest OS (in particular, its kernel) is specifically designed to support such an interface. According to this view, having, for example, an off-the-shelf version of Microsoft Windows XP as the guest OS would not be consistent with the notion of para-virtualization. Others define para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to the other virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS as such is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system. Unless otherwise indicated or apparent, this invention is not restricted to use in systems with any particular “degree” of virtualization and is not to be limited to any particular notion of full or partial (“para-”) virtualization.
With regard to utilization of memory 130, the address space of the memory 130 is conventionally partitioned into pages, regions, or other analogous allocation units. In non-virtualized systems, a single level of addressing indirection is involved. For example, applications address the physical memory 130 using virtual addresses (VAs), each of which typically comprises a virtual page number (VPN) and an offset into the indicated page. The VAs are then mapped to physical addresses (PAs), each of which similarly comprises a physical page number (PPN) and an offset, and which are actually used to address the physical memory 130. The same offset is usually used in both a virtual address and its corresponding physical address, so that only the VPN needs to be converted into a corresponding PPN.
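By way of illustration only, the single-level translation described above can be sketched as follows. The 4 KiB page size, the dictionary used as a page table, and the names split_address and translate are assumptions made for this example and are not taken from any particular system.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KiB pages


def split_address(addr):
    """Split an address into (page number, offset within the page)."""
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)


def translate(va, page_table):
    """Map a virtual address to a physical address.

    page_table is a hypothetical VPN -> PPN dictionary. Only the page
    number is converted; the offset is carried over unchanged.
    A missing entry raises KeyError, standing in for a page fault.
    """
    vpn, offset = split_address(va)
    ppn = page_table[vpn]
    return (ppn << OFFSET_BITS) | offset


page_table = {0x5: 0x2A}             # hypothetical mapping: VPN 0x5 -> PPN 0x2A
pa = translate(0x5123, page_table)   # VPN 0x5, offset 0x123
```

In this sketch the virtual address 0x5123 resolves to physical address 0x2A123, since the offset 0x123 is the same in both addresses.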
The concepts of VPNs and PPNs, as well as the way in which the different page numbering schemes are implemented and used, are described in many standard texts, such as “Computer Organization and Design: The Hardware/Software Interface,” by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1994, pp. 579-603 (chapter 7.4 “Virtual Memory”). Similar mappings are used in region-based architectures or, indeed, in any architecture where relocatability is possible.
In architectures that provide access control bits, these bits are typically associated with virtual pages in translation lookaside buffer (TLB) entries. The hardware MMU enforces the access control bits when it performs VPN→PPN translation using the TLB.
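As a rough illustration of how access control bits may be checked during translation, the following sketch models a TLB as a dictionary whose entries pair a PPN with a set of permitted access types. The Access flags, the ProtectionFault exception, and the tlb_lookup function are hypothetical names introduced for this example; real hardware encodes and checks these bits in architecture-specific ways.

```python
from enum import Flag, auto


class Access(Flag):
    """Assumed access control bits attached to a TLB entry."""
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


class ProtectionFault(Exception):
    """Raised when a requested access is not permitted for the page."""


def tlb_lookup(tlb, vpn, requested):
    """Translate vpn to a PPN, enforcing the entry's access bits.

    tlb maps VPN -> (PPN, allowed Access flags).
    """
    ppn, allowed = tlb[vpn]
    if (requested & allowed) != requested:
        raise ProtectionFault(f"VPN {vpn:#x}: {requested} not permitted")
    return ppn


tlb = {0x4: (0x9, Access.READ)}           # hypothetical read-only entry
ppn = tlb_lookup(tlb, 0x4, Access.READ)   # permitted
```

A write to the same page through this entry would raise ProtectionFault, which is the behavior a read-only mapping relies on.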
In contrast to non-virtualized systems, virtualized systems, such as virtual machine 200 in FIG. 1, require an extra level of addressing indirection. For virtual machine 200 a virtual page number (VPN) is issued by an application (e.g., APPS 260) running in the VM 200. The VPN is remapped twice in order to determine which page of the hardware memory is intended. The first mapping is provided by the guest OS 220, which translates the guest VPN (GVPN) into a corresponding guest PPN (GPPN) in the conventional manner. The guest OS 220 therefore “believes” that it is directly addressing the actual hardware memory 130, but in fact it is not.
Of course, a valid address to the actual hardware memory must ultimately be generated. A memory management module, located typically in the VMM 300, therefore performs the second mapping by taking the GPPN issued by the guest OS 220 and mapping it to a hardware (or “machine”) page number PPN that can be used to address the hardware memory 130. This GPPN-to-PPN mapping may instead be done in the main system-level software layer, depending on the implementation. From the perspective of the guest OS 220, the GVPN and GPPN might be virtual and physical page numbers just as they would be if the guest OS were the only OS in the system. From the perspective of the system software, however, the GPPN is a page number that is then mapped into the physical memory space of the hardware memory 130 as a PPN.
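The two mappings described above can be sketched, by way of example only, as a pair of table lookups. The dictionaries guest_page_table and vmm_map and the function resolve are assumptions made for this illustration; an actual memory management module would use hardware-defined page table formats.

```python
def resolve(gvpn, guest_page_table, vmm_map):
    """Resolve a guest virtual page number to a hardware page number.

    guest_page_table: GVPN -> GPPN, maintained by the guest OS.
    vmm_map:          GPPN -> PPN, maintained by the VMM (or another
                      system-level software layer).
    """
    gppn = guest_page_table[gvpn]   # first mapping, done by the guest OS
    ppn = vmm_map[gppn]             # second mapping, hidden from the guest
    return ppn


guest_page_table = {7: 3}    # hypothetical: GVPN 7 -> GPPN 3
vmm_map = {3: 91}            # hypothetical: GPPN 3 -> PPN 91
ppn = resolve(7, guest_page_table, vmm_map)
```

The guest OS sees only the first table, so it treats GPPN 3 as if it were a hardware page, while the hardware memory is actually addressed with PPN 91.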
The addressable space of the disk(s), and therefore also of the virtual disk(s), is similarly subdivided into separately identifiable portions such as blocks or sectors, tracks, cylinders, etc. In general, applications do not directly address the disk; rather, disk access and organization are tasks reserved to the operating system, which follows some predefined file system structure. When the guest OS 220 wants to write data to the (virtual) disk, the identifier used for the intended block, etc., is therefore also converted into an identifier into the address space of the physical disk. Conversion may be done within whatever system-level software layer that handles the VM.
Furthermore, as is well known, in most modern computer architectures, system memory is typically divided into individually addressable units or blocks commonly known as “pages,” each of which in turn contains many separately addressable data words, which in turn will usually comprise several bytes. A page is usually (but not necessarily) the minimum amount of memory that the operating system allocates or loads at a time. This invention does not presuppose any particular page size, or even that the page size must be constant. Pages are identified by addresses commonly referred to as “page numbers;” other architectures have identifiers that are analogous to page numbers. Without loss of generality, it is therefore assumed by way of example below that the memory is arranged in pages, each of which is identified by a page number.
As illustrated in FIG. 1, and as described in U.S. Pat. No. 6,496,847, one way for the VMM to have the host OS perform certain tasks (such as I/O) on its behalf is for the VMM to call through a driver 425 (for example, one of the drivers 424) in the host OS to a user-level application VMX, which then can submit the task request to the host OS, possibly via whatever application program interface (API) the host OS interposes. In one embodiment, the driver 425 is installed in the host OS to specifically enable calls from the VMM. The VMM, instead of calling directly into an application that is running in the host OS context, calls through driver 425, up to the VMX, and back down to the host OS via its existing API. This allows the VMM to communicate with the host OS but remain at system level, and without requiring any modification to the host OS other than the installation of a suitable driver.
In some implementations, multiple virtual machines often have memory pages with identical content, particularly for program code and filesystem buffer cache data. For example, if multiple virtual machines are running the same guest OS, the same portions of the OS code may be resident in multiple physical memory pages at the same time for execution within different virtual machines. Thus, for a particular page worth of OS code, there may be multiple copies of the page in memory, with one copy of the page being associated with each of multiple VMs. To reduce memory overhead, a virtual machine monitor can reclaim such redundant memory pages, leaving only a single copy of the memory page to be shared by the multiple virtual machines.
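One possible way to find and reclaim redundant pages is sketched below, purely as an illustration. The text above does not specify how identical pages are detected; hashing page contents with SHA-256 and then confirming with a full byte comparison is an assumption of this example, as are the data structures and the function name share_identical_pages.

```python
import hashlib


def share_identical_pages(machine_pages, mappings):
    """Collapse machine pages with identical contents into one copy.

    machine_pages: PPN -> bytes (page contents).
    mappings:      (vm, gppn) -> PPN.
    Both dictionaries are updated in place. Returns the number of
    machine pages reclaimed.
    """
    canonical = {}   # content digest -> PPN of the copy that is kept
    remap = {}       # redundant PPN -> PPN of the kept copy
    for ppn in sorted(machine_pages):
        content = machine_pages[ppn]
        digest = hashlib.sha256(content).digest()
        kept = canonical.get(digest)
        # Compare the full contents so a hash collision cannot cause
        # two different pages to be merged.
        if kept is not None and machine_pages[kept] == content:
            remap[ppn] = kept
        else:
            canonical[digest] = ppn
    for key, ppn in list(mappings.items()):
        if ppn in remap:
            mappings[key] = remap[ppn]
    for ppn in remap:
        del machine_pages[ppn]
    return len(remap)


machine_pages = {1: b"A" * 4096, 2: b"A" * 4096, 3: b"B" * 4096}
mappings = {("vm1", 0): 1, ("vm2", 0): 2, ("vm2", 1): 3}
reclaimed = share_identical_pages(machine_pages, mappings)
```

After the call, both virtual machines map their page 0 to the same machine page, and one machine page has been freed. Shared pages would then need to be write-protected so that a later write can be intercepted.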
Embodiments relate to mechanisms for sharing pages between virtual machines in which the virtual machine monitor lets the host manage I/O, memory allocation, and paging. In this environment, a virtual machine application may choose to represent the virtual machine's memory as a file in the host filesystem to allow the host to manage the associated memory. This design is simple, portable, and does not require a custom operating system to run the virtual machine monitor. For example, an I/O request initiated by the virtual machine would percolate through various layers of device emulation and finally translate to a read, write, or mmap system call on the host operating system.
The sharing of memory between virtual machines can persist as long as none of the virtual machines chooses to write to the shared memory pages. If a write occurs, the virtual machine must break sharing for the modified page and obtain a private copy of the shared memory. Otherwise, one virtual machine could be operating on what, to it, would be invalid data because another virtual machine had written to the same shared memory. Furthermore, sharing memory between virtual machines when a file backs main memory is challenging. Typically, as the contents of the files are written or otherwise modified, the data is written back to a non-volatile, persistent main memory to preserve the data. With multiple shared memories, it becomes harder to ensure that the data retained in the main memory accurately tracks that of the files. The VMM therefore preferably lets the host device manage all hardware memory. When the VM wants GPPNs, it asks the host to supply them. Maintaining data coherency is especially complex when sharing pages and subsequently breaking sharing are implemented.
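The requirement that a writer break sharing and obtain a private copy is commonly met with a copy-on-write scheme. The following is a minimal sketch of that idea under simplifying assumptions: the HostMemory class, its reference counts, and its method names are hypothetical, and the sketch ignores the file-backing and coherency issues discussed above.

```python
class HostMemory:
    """Toy model of machine pages shared copy-on-write among VMs."""

    def __init__(self):
        self.pages = {}      # PPN -> bytearray holding the page contents
        self.refcount = {}   # PPN -> number of (vm, gppn) mappings to it
        self.mappings = {}   # (vm, gppn) -> PPN
        self._next_ppn = 0

    def allocate(self, content):
        """Allocate a new machine page initialized with a copy of content."""
        ppn = self._next_ppn
        self._next_ppn += 1
        self.pages[ppn] = bytearray(content)
        self.refcount[ppn] = 0
        return ppn

    def map_page(self, vm, gppn, ppn):
        """Point a guest page of one VM at a machine page."""
        self.mappings[(vm, gppn)] = ppn
        self.refcount[ppn] += 1

    def write(self, vm, gppn, offset, data):
        """Write to a guest page, breaking sharing first if needed."""
        ppn = self.mappings[(vm, gppn)]
        if self.refcount[ppn] > 1:
            # The page is shared: give the writer a private copy so the
            # other VMs continue to see the original contents.
            self.refcount[ppn] -= 1
            private = self.allocate(self.pages[ppn])
            self.map_page(vm, gppn, private)
            ppn = private
        self.pages[ppn][offset:offset + len(data)] = data

    def read(self, vm, gppn):
        return bytes(self.pages[self.mappings[(vm, gppn)]])


mem = HostMemory()
shared = mem.allocate(b"\x00" * 16)
mem.map_page("vm1", 0, shared)
mem.map_page("vm2", 0, shared)
mem.write("vm2", 0, 0, b"\xff")   # vm2 writes, so sharing is broken for vm2
```

After the write, vm1 still reads the original contents while vm2 reads its modified private copy.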
The virtualization software can choose to represent guest memory as a file on the host. This allows the virtualization software to use the host OS to access the file (e.g., read( ), write( ), mmap( ), munmap( )). The host OS will have the flexibility to page out unlocked in-memory guest pages to the file, reclaim the memory page for other uses, and retrieve the contents of the guest page later on by accessing the file when the virtualization software asks for the page. The virtualization software will have the flexibility of allowing the host OS to do these operations, and yet get back the exact contents of guest memory when it needs the contents by accessing the file (with mmap( ) or read( )/write( ) operations). The backing file enables the host OS to page guest memory in and out of physical memory, as needed or as appropriate. Thus, the contents of the guest memory do not always have to be stored in physical memory.
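As an illustration of guest memory backed by a host file, the sketch below creates a temporary file, maps it into the process, writes to one "guest page" through the mapping, and reads the same bytes back through the file descriptor. The temporary file, the four-page size, and the function name demo_file_backed_guest_memory are assumptions made for this example; it uses Python's standard mmap, os, and tempfile modules rather than any virtualization product's interface.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE   # host page size as reported by the mmap module
NUM_GUEST_PAGES = 4         # assumed, tiny guest memory for illustration


def demo_file_backed_guest_memory():
    """Write to guest page 2 via a mapping and read it back via the file."""
    fd, path = tempfile.mkstemp()
    try:
        # Size the backing file to hold all guest pages.
        os.ftruncate(fd, PAGE_SIZE * NUM_GUEST_PAGES)
        with mmap.mmap(fd, PAGE_SIZE * NUM_GUEST_PAGES) as guest_mem:
            start = 2 * PAGE_SIZE          # byte offset of guest page 2
            guest_mem[start:start + 5] = b"hello"
            guest_mem.flush()              # push the change to the file
        # The same contents are visible through ordinary file I/O.
        os.lseek(fd, 2 * PAGE_SIZE, os.SEEK_SET)
        return os.read(fd, 5)
    finally:
        os.close(fd)
        os.remove(path)


readback = demo_file_backed_guest_memory()
```

Because the mapping is shared with the file, the host OS is free to write the page out and reclaim the memory, and the contents can still be recovered from the file later.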
The virtual machine monitor requires knowledge of the hardware physical page number (PPN) associated with each guest physical page number (GPPN). Consider the scenario in which a shared hardware physical memory page is in use by several virtual machines and the host operating system supplies a hardware physical memory page associated with one of the backing files. If a virtual machine wants to write to that page, the virtual machine application must contact all other virtual machines sharing that physical memory page and wait for them to evict the physical memory page from their emulated TLBs. This requires co-operation between VMs, weakens isolation guarantees between virtual machines, and would likely be prohibitively expensive.