1. Field of the Invention
The invention relates generally to virtualized computer systems, and specifically to memory management for a virtual machine.
2. Description of Related Art
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete” computer. Depending on how it is implemented, virtualization also provides greater security, since the virtualization can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
As is well known in the field of computer science, a virtual machine (VM) is a software abstraction—a “virtualization”—of an actual physical computer system. FIG. 1 shows one possible arrangement of a computer system 700 that implements virtualization. A virtual machine (VM) 200, which in this system is a “guest,” is installed on a “host platform,” or simply “host,” which will include a system hardware 100, that is, a hardware platform, and one or more layers or co-resident components comprising system-level software, such as an operating system (OS) or similar kernel, a virtual machine monitor or hypervisor (see below), or some combination of these.
As software, the code defining the VM will ultimately execute on the actual system hardware 100. As in almost all computers, this hardware will include one or more CPUs 110, some form of memory 130 (volatile or non-volatile), one or more storage devices such as one or more disks 140, and one or more devices 170, which may be integral or separate and removable.
In many existing virtualized systems, the hardware processor(s) 110 are the same as in a non-virtualized computer with the same platform, for example, the Intel x86 platform. Because of the advantages of virtualization, however, some hardware vendors are producing hardware processors that include specific hardware support for virtualization.
Each VM 200 will typically mimic the general structure of a physical computer and as such will usually have both virtual system hardware 201 and guest system software 202. The virtual system hardware typically includes at least one virtual CPU 210, virtual memory 230, at least one virtual disk 240, and one or more virtual devices 270. Note that a storage disk—virtual 240 or physical 140—is also a “device,” but is usually considered separately because of the important role it plays. All of the virtual hardware components of the VM may be implemented in software to emulate corresponding physical components. The guest system software includes a guest operating system (OS) 220 and drivers 224 as needed, for example, for the various virtual devices 270.
To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs (physical or logical, or a combination) have been developed. One example is a symmetric multi-processor (SMP) system, which is available as an extension of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Yet another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit (ALU)), and each able to execute threads independently; multi-core processors typically share only very limited resources, such as at least some cache. Still another technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share not only one or more caches, but also some functional unit(s) and sometimes also the translation lookaside buffer (TLB).
Similarly, a single VM may (but need not) be configured with more than one virtualized physical and/or logical processor. By way of example, FIG. 1 illustrates multiple virtual processors 210, 211, . . . , 21m (VCPU0, VCPU1, . . . , VCPUm) within the VM 200. Each virtualized processor in a VM may also be multi-core, or multi-threaded, or both, depending on the virtualization. This invention may be used to advantage regardless of the number of processors the VMs are configured to have.
If the VM 200 is properly designed, applications 260 running on the VM will function essentially as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is, via the guest OS 220 and virtual processor(s). Executable files will be accessed by the guest OS from the virtual disk 240 or virtual memory 230, which will be portions of the actual physical disk 140 or memory 130 allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines in general are known in the field of computer science.
Some interface is generally required between the guest software within a VM and the various hardware components and devices in the underlying hardware platform. This interface, which may be referred to generally as “virtualization software” or a “virtualization layer,” may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself; however, “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” VM to facilitate the operations of other VMs. Furthermore, specific software support for VMs is sometimes included in the host OS itself.
Unless otherwise indicated, the invention described below may be used in virtualized computer systems having any type or configuration of virtualization software. Moreover, the invention is described and illustrated below primarily as including one or more virtual machine monitors that appear as separate entities from other components of the virtualization software. This is only for the sake of simplicity and clarity and by way of illustration—as mentioned above, the distinctions are not always so clear-cut. Again, unless otherwise indicated or apparent from the description, it is to be assumed that the invention can be implemented with components residing anywhere within the overall structure of the virtualization software.
By way of illustration and example only, the figures show each VM running on a corresponding virtual machine monitor. The description's reference to VMMs is also merely by way of common example. A VMM is usually a software component that virtualizes at least one hardware resource of some physical platform, so as to export a hardware interface to the VM corresponding to the hardware the VM “thinks” it is running on. As FIG. 1 illustrates, a virtualized computer system may (and usually will) have more than one VM, each of which may be running on its own VMM.
The various virtualized hardware components in the VM, such as the virtual CPU(s) 210, etc., the virtual memory 230, the virtual disk 240, and the virtual device(s) 270, are shown as being part of the VM 200 for the sake of conceptual simplicity. In actuality, these “components” are often implemented as software emulations included in some part of the virtualization software, such as the VMM. One advantage of such an arrangement is that the virtualization software may (but need not) be set up to expose “generic” devices, which facilitate, for example, migration of VMs from one hardware platform to another.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice in respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another concept, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the name implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. For example, the guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software. For some, para-virtualization implies that the guest OS (in particular, its kernel) is specifically designed to support such an interface. According to this view, having, for example, an off-the-shelf version of Microsoft Windows XP as the guest OS would not be consistent with the notion of para-virtualization. Others define para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to the other virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS as such is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system.
Unless otherwise indicated or apparent, this invention is not restricted to use in systems with any particular “degree” of virtualization and is not to be limited to any particular notion of full or partial (“para-”) virtualization.
In addition to the distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration (illustrated in FIG. 2) and a non-hosted configuration (illustrated in FIG. 1). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request and direction of the VMM 300. The host OS 420, which usually includes drivers 424 and supports applications 460 of its own, and the VMM are both able to directly access at least some of the same hardware resources, with conflicts being avoided by a context-switching mechanism. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system.
In addition to device emulators 370, FIG. 2 also illustrates some of the other components that are also often included in the VMM of a hosted virtualization system; many of these components are found in the VMM of a non-hosted system as well. For example, exception handlers 330 may be included to help context-switching, and a direct execution engine 310 and a binary translator 320 with associated translation cache 325 may be included to provide execution speed while still preventing the VM from directly executing certain privileged instructions.
In many cases, it may be beneficial to deploy VMMs on top of a software layer—a kernel 600—constructed specifically to provide efficient support for the VMs. This configuration is frequently referred to as being “non-hosted.” Compared with a system in which VMMs run directly on the hardware platform (such as shown in FIG. 2), use of a kernel offers greater modularity and facilitates provision of services (for example, resource management) that extend across multiple virtual machines. Compared with a hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting primarily of VMs/VMMs. The kernel 600 also handles any other applications running on it that can be separately scheduled, as well as any “console” operating system 420 that, in some systems, is included to boot the system as a whole and for enabling certain user interactions with the kernel. The console OS in FIG. 1 may be of the same type as the host OS in FIG. 2, which is why they are identically numbered—the main difference is the role they play (or are allowed to play, if any) once the virtualized computer system is loaded and running.
This invention may be used to advantage in both a hosted and a non-hosted virtualized computer system, in which the included virtual machine(s) may be fully or para-virtualized, and in which the virtual machine(s) have any number of virtualized processors, which may be of any type (including multi-core, multi-threaded, or some combination). The invention may also be implemented directly in a computer's primary OS, both where the OS is designed to support virtual machines and where it is not. Moreover, the invention may even be implemented wholly or partially in hardware, for example in processor architectures intended to provide hardware support for virtual machines.
To facilitate effective memory management, many operating systems in use today introduce a layer of abstraction between the memory addresses used by the applications and the memory addresses describing physical memory. When an application requests memory, the operating system will allocate memory in a first address space, typically called a virtual memory address space. This first memory address space maps to a second memory address space, typically the physical memory of the computer. A page table organizes the relationships between the two address spaces and maps memory addresses (for example, given as page numbers) of the first memory address space to memory addresses of the second memory address space. It is common for multiple virtual memory address spaces, as well as multiple page tables, to be implemented in modern operating systems. For example, each application may have its own virtual memory address space. In many systems, each application can treat its virtual memory address space as if it had exclusive use of that memory. The operating system organizes these virtual memory address spaces and keeps track of the corresponding physical memory addresses using entries in a page table.
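By way of illustration only, the page-table mapping described above can be sketched as follows. The 4096-byte page size, the dictionary representation of a page table, and the names `PAGE_SIZE`, `translate`, `page_table_a`, and `page_table_b` are assumptions made for this sketch and are not drawn from any particular operating system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical, for illustration)


def translate(page_table, virtual_address):
    """Translate a virtual address to a physical address via a page table.

    page_table maps virtual page numbers to physical page numbers.
    Raises KeyError if the virtual page has no mapping.
    """
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    ppn = page_table[vpn]
    return ppn * PAGE_SIZE + offset


# Each application has its own virtual address space and its own page table,
# so the same virtual page number may map to different physical pages.
page_table_a = {0: 7, 1: 3}
page_table_b = {0: 2}

addr_a = translate(page_table_a, 4100)  # virtual page 1, offset 4, application A
addr_b = translate(page_table_b, 4)     # virtual page 0, offset 4, application B
```

In this sketch each application sees addresses starting at zero, while the page tables keep the underlying physical pages distinct.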
One of the advantages of using virtual memory address spaces is that the amount of virtual memory used by the applications may exceed the amount of physical memory available on the computer. When such a situation occurs, the operating system will use a secondary storage medium, such as a hard disk, to store some of the data contained in virtual memory. When data from some virtual memory pages is actually stored on the secondary storage medium, the page table will map some virtual memory addresses to physical memory addresses, while mapping other virtual memory addresses to locations on the secondary storage medium.
If an application attempts to access a virtual memory address not mapped to physical memory, the operating system will detect a page fault. In response to a page fault, the operating system will retrieve the requested data from the appropriate storage device, store it in physical memory, and update the page table with the address of the location in physical memory. Retrieving a page and storing it in physical memory is commonly described as “paging-in” the requested page. Frequently, in order to page-in some data, the operating system must first make room in the physical memory. One method for making room in the physical memory is by “paging-out” a page presently stored in the physical memory. Paging-out refers to the process of copying a page from the physical memory to another storage device and updating the page table accordingly. Subsequent access to that virtual memory address will then result in another page fault and the paging-in process will repeat. Ideally, the operating system will page-out pages that are inactive so that they will not have to be paged back in for some reasonable amount of time. Various methods for determining which pages are inactive and are good candidates to be paged-out are well known in the art.
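The paging-in and paging-out cycle described above may be sketched, under stated assumptions, as a small simulation. The class name `Pager`, its fields, and the choice of a least-recently-used (LRU) policy for selecting the page to page-out are hypothetical; LRU is only one of the well-known policies and is used here merely as an example.

```python
from collections import OrderedDict


class Pager:
    """Toy model of demand paging with an assumed LRU page-out policy."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> contents, least recently used first
        self.backing_store = {}        # page number -> contents held on secondary storage
        self.faults = 0

    def access(self, page):
        """Access a page; return True if the access caused a page fault."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return False
        self.faults += 1
        if len(self.resident) >= self.num_frames:
            # Page-out: copy the least recently used page to secondary storage.
            victim, data = self.resident.popitem(last=False)
            self.backing_store[victim] = data
        # Page-in: fetch from secondary storage, or create on first touch.
        self.resident[page] = self.backing_store.pop(page, f"page-{page}")
        return True


pager = Pager(num_frames=2)
fault_pattern = [pager.access(p) for p in [1, 2, 1, 3, 2]]
```

With two frames, the access to page 3 pages-out page 2, so the later access to page 2 faults again and pages-out page 1.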
When a guest operating system 220 is run on a virtual machine 200, the guest operating system 220 treats the virtual memory 230 as if it were the physical memory of a computer system. Thus the guest operating system 220 will create virtual memory address spaces (not shown) and map them into the virtual memory 230.
The virtualization layer introduces an additional layer of memory management abstraction. The kernel 600 typically emulates the virtual memory 230 by mapping the virtual memory 230 to the physical memory 130. In many ways, the mapping of the virtual memory 230 to the physical memory 130 is analogous to the mapping of virtual memory addresses to physical memory addresses performed by an operating system. Guest operating systems 220 running on various virtual machines 200 are allowed to treat their virtual memory 230 as if they had exclusive control over that memory, when in fact those virtual memory address spaces are mapped to physical memory 130. Furthermore, as in virtual memory managed by an operating system, the total amount of virtual memory 230 used by the various virtual machines 200 may exceed the total amount of physical memory 130. The virtual machine monitor 300 organizes these virtual memory addresses and keeps track of the corresponding physical memory addresses in the memory 130.
Thus, when a guest operating system 220 implementing virtual memory is run on a virtual machine, typically three levels of memory address spaces are used. The guest operating system 220 organizes some virtual memory address spaces. For the purposes of this application, these address spaces are referred to as “guest virtual memory,” which is addressed using a “guest virtual page number” (GVPN). The guest operating system 220 maintains a page table that maps this guest virtual memory to the virtual memory 230. Typically, the guest operating system 220 treats the virtual memory 230 as if it were physical memory. For the purposes of this application, the address space of the virtual memory 230 is referred to as the “guest physical memory,” which is addressed using a “guest physical page number” (GPPN). The virtual machine monitor 300 maintains a data structure (such as a page table) that maps this guest physical memory to the physical memory 130. The physical memory 130 is addressed using a “physical page number” (PPN), which is sometimes also referred to as a “machine page number” (MPN).
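The three levels of address spaces described above may be illustrated with the following sketch, in which a GVPN is first translated to a GPPN by the guest page table and the GPPN is then translated to a PPN by the mapping maintained by the virtualization software. The dictionary representation and the names `resolve`, `guest_page_table`, and `vmm_map` are assumptions made only for illustration.

```python
def resolve(guest_page_table, vmm_map, gvpn):
    """Resolve a guest virtual page number through both mapping levels.

    guest_page_table: GVPN -> GPPN, maintained by the guest operating system.
    vmm_map:          GPPN -> PPN, maintained by the virtualization software.
    Returns the (GPPN, PPN) pair for the given GVPN.
    """
    gppn = guest_page_table[gvpn]  # first level: guest virtual -> guest physical
    ppn = vmm_map[gppn]            # second level: guest physical -> physical (machine)
    return gppn, ppn


# Hypothetical mappings for a single virtual machine.
guest_page_table = {0x10: 0x2, 0x11: 0x5}
vmm_map = {0x2: 0x9A, 0x5: 0x41}

gppn, ppn = resolve(guest_page_table, vmm_map, 0x11)
```

The guest operating system only ever sees the first mapping; the second mapping is invisible to it, which is why it can treat the guest physical memory as if it were physical memory.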
One approach for allowing the total amount of guest physical memory used by the various virtual machines 200 to exceed the total amount of physical memory is for the kernel 600 to page-out some of the inactive guest physical memory. For example, the kernel 600 can copy pages from the physical memory 130 to the disk 140 and adjust the page table entry for the corresponding guest physical memory accordingly. However, such an approach has several problems. First of all, determining which guest physical memory pages are less active (and therefore suitable for page-out) is a challenging task, and it is difficult to find inactive pages with a high degree of accuracy. Paging-out a page that is actively being used will result in a page fault and require the selection of another page for page-out in the near future. The subsequent paging and repaging cycle can seriously affect the performance of the system and, in the worst case, can bring the virtual machine to an effective halt, a condition commonly referred to as “thrashing.”
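The thrashing condition described above can be illustrated with a small, hypothetical simulation. The function name `count_faults` and the use of an LRU page-out policy are assumptions made for this sketch; the sketch merely shows that when the set of actively used pages exceeds the available memory by even one page, a cyclic access pattern can fault on every access.

```python
from collections import OrderedDict


def count_faults(accesses, num_frames):
    """Count page faults for an access sequence under an assumed LRU policy."""
    resident = OrderedDict()  # page number -> True, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(resident) >= num_frames:
            resident.popitem(last=False)  # page-out the least recently used page
        resident[page] = True
    return faults


workload = [0, 1, 2, 3] * 5                # four active pages, accessed cyclically
faults_constrained = count_faults(workload, num_frames=3)
faults_sufficient = count_faults(workload, num_frames=4)
```

In this sketch, three frames produce a fault on each of the twenty accesses, whereas four frames produce only the four initial faults.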
Furthermore, even if an inactive page of the guest physical memory is determined with a high degree of accuracy, paging-out this page introduces a problem known as “double-paging.” If the guest physical memory is constrained, the guest operating system 220 will be searching for inactive guest virtual memory pages as candidates for page-out. When it finds an inactive guest virtual memory page, it will attempt to read from the corresponding guest physical memory page to perform the page-out. However, since the guest physical memory page is also inactive, it may have already been paged-out by the kernel 600. If the guest physical memory page has already been paged-out by the kernel 600, the attempt to page-out the guest virtual memory will result in a guest physical memory page fault, and the inactive guest physical page will be paged back in. Thus interference between the memory management of the guest operating system 220 and the kernel 600 can significantly reduce the effectiveness of paging-out guest physical memory pages.
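The double-paging interaction described above may be sketched as follows. The class `HostKernel`, the function `guest_page_out`, and the page numbers used are hypothetical and serve only to show how a page-out by the guest can force the kernel to page the same data back in.

```python
class HostKernel:
    """Toy model of a kernel that pages guest physical pages out to disk."""

    def __init__(self):
        self.paged_out = set()  # GPPNs whose contents currently reside on disk
        self.page_ins = 0       # number of page-ins forced by guest accesses

    def page_out(self, gppn):
        self.paged_out.add(gppn)

    def read(self, gppn):
        """Guest reads a guest physical page; page it back in if necessary."""
        if gppn in self.paged_out:
            self.paged_out.remove(gppn)
            self.page_ins += 1


def guest_page_out(kernel, guest_page_table, gvpn):
    """Guest OS pages out a guest virtual page.

    The guest must read the backing guest physical page in order to copy its
    contents to the guest's own swap area, which touches the page.
    """
    gppn = guest_page_table.pop(gvpn)
    kernel.read(gppn)
    return gppn


kernel = HostKernel()
guest_page_table = {0x10: 0x5}  # GVPN 0x10 is backed by GPPN 0x5
kernel.page_out(0x5)            # kernel judges GPPN 0x5 inactive and pages it out
freed = guest_page_out(kernel, guest_page_table, 0x10)  # guest picks the same page
```

Because both levels independently select the same inactive page, the kernel's page-out is undone by a page-in whose only purpose is to let the guest write the data out again.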
Another approach for managing allocations of physical memory to various virtual machines is to dynamically change the size of the guest physical memory. However, most guest operating systems 220 do not provide a mechanism by which the amount of guest physical memory can be increased or decreased during execution of the operating system. Therefore, current techniques for managing allocations of physical memory to various virtual machines are inadequate.