Virtualization technologies are becoming prevalent in the marketplace. At least some of these technologies provide a virtual hardware abstraction to guest operating systems, and allow them to run in virtual machines in a functionally isolated environment on a host computer. Virtualization allows one or more virtual (guest) machines to run on a single physical (host) computer, providing functional and performance isolation for processor, memory, storage, etc.
As is well known in the field of computer science, a virtual machine is an abstraction—a “virtualization”—of an actual physical computer system. FIG. 1 shows one possible arrangement of a computer system (computer system 700) that implements virtualization. As shown in FIG. 1, virtual machine or “guest” 200 is installed on a “host platform,” or simply “host,” which includes system hardware, that is, hardware platform 100, and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor (see below), or some combination of these. The system hardware typically includes one or more processors 110, memory 130, some form of mass storage 140, and various other devices 170.
Each virtual machine 200 will typically have both virtual system hardware 201 and guest system software 202. The virtual system hardware typically includes at least one virtual CPU, virtual memory 230, at least one virtual disk 240, and one or more virtual devices 270. Note that a disk—virtual or physical—is also a “device,” but is usually considered separately because of the important role of the disk. All of the virtual hardware components of the virtual machine may be implemented in software using known techniques to emulate the corresponding physical components. The guest system software includes guest operating system (OS) 220 and drivers 224 as needed for the various virtual devices 270.
Note that a single virtual machine 200 may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP virtual machines. FIG. 1, for example, illustrates multiple virtual processors 210-0, 210-1, . . . , 210-m (VCPU0, VCPU1, . . . , VCPUm) within virtual machine 200.
Yet another configuration is found in a so-called “multi-core” host architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit, or ALU) and each able to execute threads independently; multi-core processors typically share only very limited resources, such as some cache. Still another configuration that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share some resources such as caches, buffers, functional units, etc. One or more embodiments of this invention may be used regardless of the type (physical and/or logical) or number of processors included in a virtual machine.
In many cases, applications 261 running on virtual machine 200 will function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is, via guest OS 220 and virtual processor(s). Executable files will be accessed by the guest OS from virtual disk 240 or virtual memory 230, which will be portions of the actual physical disk 140 or memory 130 allocated to that virtual machine. Once an application is installed within the virtual machine, the guest OS retrieves files from the virtual disk just as if the files had been stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is generally required between the guest software within a virtual machine and the various hardware components and devices in the underlying hardware platform. This interface—which may be referred to generally as “virtualization software” or “virtualization logic”—may include one or more software and/or hardware components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or “virtualization kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, the term “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself; however, the term “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” virtual machine to facilitate the operations of other virtual machines. Furthermore, specific software support for virtual machines may be included in the host OS itself. Unless otherwise indicated, one or more embodiments of the invention described herein may be used in virtualized computer systems having any type or configuration of virtualization logic.
FIG. 1 shows virtual machine monitors that appear as separate entities from other components of the virtualization software. Furthermore, some software components used to implement one or more embodiments of the invention are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware. The illustrated embodiments are given only for the sake of simplicity and clarity and by way of illustration—as mentioned above, the distinctions are not always so clear-cut. Again, unless otherwise indicated or apparent from the description, it is to be assumed that one or more embodiments of the invention can be implemented anywhere within the overall structure of the virtualization software, and even in systems that provide specific hardware support for virtualization.
The various virtualized hardware components in the virtual machine, such as virtual CPU(s) 210-0, 210-1, . . . , 210-m, virtual memory 230, virtual disk 240, and virtual device(s) 270, are shown as being part of virtual machine 200 for the sake of conceptual simplicity. In actuality, these “components” are usually implemented as software emulations 330 included in the VMM.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another term, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the term implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. For example, the guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software.
For some, the term para-virtualization implies that the guest OS (in particular, its kernel) is specifically designed to support such an interface. According to this view, having, for example, an off-the-shelf version of Microsoft Windows XP™ as the guest OS would not be consistent with the notion of para-virtualization. Others define the term para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to any other component of the virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS as such is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system. Unless otherwise indicated or apparent, embodiments of this invention are not restricted to use in systems with any particular “degree” of virtualization and are not to be limited to any particular notion of full or partial (“para-”) virtualization.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration and a non-hosted configuration (which is shown in FIG. 1). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM.
As illustrated in FIG. 1, in many cases, it may be beneficial to deploy VMMs on top of a software layer—kernel 600—constructed specifically to provide support for the virtual machines. This configuration is frequently referred to as being “non-hosted.” Kernel 600 also handles any other applications running on it that can be separately scheduled, as well as a console operating system that, in some architectures, is used to boot the system and facilitate certain user interactions with the virtualization software.
Note that kernel 600 is not the same as the kernel that will be within guest OS 220—as is well known, every operating system has its own kernel. Note also that kernel 600 is part of the “host” platform of the virtual machine/VMM as defined above even though the configuration shown in FIG. 1 is commonly termed “non-hosted;” moreover, the kernel may be both part of the host and part of the virtualization software or “hypervisor.” The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
As will be understood by those of ordinary skill in the relevant art, while a virtual machine 200 is running, the hypervisor (or VMM, etc.) often performs a non-trivial amount of work in order to properly virtualize guest operations. In some cases, such as guest I/O operations, the system architecture and device model allow the guest to continue doing other useful work while the hypervisor processes earlier operations asynchronously. The hypervisor typically notifies the guest OS when the asynchronous operation completes, such as by posting a virtual I/O completion interrupt, emulating the behavior of hardware in a non-virtualized system.
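The asynchronous pattern described above can be illustrated with a toy model. This is only a minimal sketch, assuming a single guest and an in-memory queue; the names `ToyGuest`, `ToyHypervisor`, `submit_io`, and `process_pending` are hypothetical and do not correspond to any real hypervisor interface.

```python
from collections import deque


class ToyGuest:
    """Stands in for a guest OS; records the completion interrupts it receives."""

    def __init__(self):
        self.completed = []    # request ids whose completion interrupt arrived
        self.useful_work = 0   # units of other work done while I/O was pending

    def on_io_completion_interrupt(self, request_id):
        self.completed.append(request_id)

    def do_other_work(self):
        self.useful_work += 1


class ToyHypervisor:
    """Queues guest I/O requests and completes them later (hypothetical model)."""

    def __init__(self, guest):
        self.guest = guest
        self.pending = deque()

    def submit_io(self, request_id):
        # Returns immediately, so the guest is not blocked by the request.
        self.pending.append(request_id)

    def process_pending(self):
        # Finish one earlier request and post a virtual completion interrupt.
        if self.pending:
            self.guest.on_io_completion_interrupt(self.pending.popleft())


def run_async_io_demo():
    guest = ToyGuest()
    hypervisor = ToyHypervisor(guest)
    hypervisor.submit_io("req-1")
    hypervisor.submit_io("req-2")
    guest.do_other_work()          # guest keeps working while I/O is outstanding
    hypervisor.process_pending()
    guest.do_other_work()
    hypervisor.process_pending()
    return guest.completed, guest.useful_work
```

In this sketch the guest performs two units of other work while both requests are outstanding, and it learns of each completion only through the interrupt callback, much as it would from hardware in a non-virtualized system.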
Operating systems, including guest operating systems, are typically designed to tolerate known high-latency operations, such as device I/O, by using asynchronous techniques. For example, a process that issues a disk I/O request that must complete before further execution (e.g., a filesystem metadata write) will be de-scheduled until the I/O completes, allowing other unrelated processes to run.
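The de-scheduling behavior described above can be sketched with a toy round-robin scheduler. This is an illustrative model only, with time measured in abstract ticks; the function name `run_guest_scheduler` and its parameters are hypothetical and are not drawn from any particular operating system.

```python
from collections import deque


def run_guest_scheduler(names, blocking_name, io_latency_ticks, total_ticks):
    """Return the name of the process that ran in each tick ("idle" if none).

    The process called blocking_name issues a blocking I/O the first time it
    runs and is de-scheduled until io_latency_ticks further ticks have passed.
    """
    ready = deque(names)   # runnable processes, in round-robin order
    wake_at = {}           # blocked process name -> tick at which it wakes
    has_issued = False
    timeline = []
    for tick in range(total_ticks):
        # Move processes whose I/O has completed back to the ready queue.
        for name in [n for n, t in wake_at.items() if t <= tick]:
            del wake_at[name]
            ready.append(name)
        if not ready:
            timeline.append("idle")
            continue
        name = ready.popleft()
        timeline.append(name)
        if name == blocking_name and not has_issued:
            # De-schedule the process until its I/O completes.
            has_issued = True
            wake_at[name] = tick + 1 + io_latency_ticks
        else:
            ready.append(name)
    return timeline
```

For example, `run_guest_scheduler(["A", "B", "C"], "A", 4, 8)` yields a timeline in which B and C alternate while A waits for its I/O, so the CPU is never idle even though A is blocked.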
Virtualization may also introduce significant delays that may or may not be detectable by a guest OS. These delays may not be anticipated by the guest OS, because the virtualization-based operations that give rise to the delays may not be visible to the guest OS. For example, the hypervisor may swap out a page of guest “physical” memory to disk in order to free up memory for other VMs. When a guest process later accesses this “physical” memory, the hypervisor must restore its contents from disk before allowing the process to continue execution. In existing systems, the hypervisor de-schedules the entire VM, or at least the VCPU that performed the access. This effectively blocks the guest from making any further progress until the swap-in completes.
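The effect of de-scheduling the VCPU can be sketched with a toy timeline. This is a simplified model under stated assumptions: one VCPU, round-robin guest scheduling, and time measured in abstract ticks. The function name `run_vcpu_with_hidden_fault` and its parameters are hypothetical.

```python
from collections import deque


def run_vcpu_with_hidden_fault(names, faulting_name, swap_in_ticks, total_ticks):
    """Return what the VCPU did in each tick: a process name or "stalled".

    The process called faulting_name touches a swapped-out page the first time
    it runs. The hypervisor then stalls the whole VCPU for swap_in_ticks ticks,
    so no guest process runs, even though others are runnable.
    """
    ready = deque(names)   # runnable guest processes, in round-robin order
    stalled_until = 0      # first tick at which the VCPU may run again
    has_faulted = False
    timeline = []
    for tick in range(total_ticks):
        if tick < stalled_until:
            timeline.append("stalled")
            continue
        name = ready.popleft()
        if name == faulting_name and not has_faulted:
            # The guest cannot see the fault, so it cannot switch processes.
            has_faulted = True
            stalled_until = tick + swap_in_ticks
            ready.appendleft(name)   # the same process resumes after swap-in
            timeline.append("stalled")
            continue
        timeline.append(name)
        ready.append(name)
    return timeline
```

For example, `run_vcpu_with_hidden_fault(["A", "B", "C"], "A", 4, 8)` shows four stalled ticks during which B and C were runnable but could not be scheduled, because the guest OS never learned that A was waiting.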
To provide a specific example in more detail, when an OS is virtualized, the physical pages of the guest (identified by guest physical page numbers, or GPPNs) have to be mapped to real machine pages (identified by machine page numbers, or MPNs) in the physical host. When the hypervisor is under memory pressure, it will typically need to reclaim an MPN from the guest, the reclaimed MPN typically being selected based on some working set algorithm (or randomly). The hypervisor then swaps the contents of the page out to the hypervisor-level swap device and allows the MPN to be used by, for example, another guest. If the first VM were then to access the GPPN, it would take a page fault in the processor, and the hypervisor would have to bring the contents back from the swap device. This process can be quite time-consuming. More specifically, since the swap device is typically a mechanical device such as a disk drive, typical access latencies are several milliseconds, in which time a modern CPU can execute several million instructions. Unfortunately, the guest cannot make forward progress in the meantime, as the hypervisor and its operations are not visible to the guest. This is but one example of many circumstances under which the hypervisor performs a lengthy operation without the guest being able to schedule another process during the period of latency.
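The reclaim and swap-in sequence described above can be sketched as follows. This is a minimal illustration, not an actual hypervisor implementation: the class `ToyHypervisorMemory`, its methods, and the helper `instructions_during_stall` are hypothetical, the swap device is modeled as a dictionary, and the latency and instruction-rate figures in the usage note are assumed values chosen only to illustrate the arithmetic.

```python
class ToyHypervisorMemory:
    """Toy GPPN-to-MPN mapping with hypervisor-level swap (hypothetical model)."""

    def __init__(self, free_mpns):
        self.free_mpns = list(free_mpns)   # machine pages not backing any GPPN
        self.gppn_to_mpn = {}              # resident guest pages
        self.machine_pages = {}            # mpn -> page contents
        self.swap_device = {}              # gppn -> contents swapped out
        self.swap_in_count = 0

    def _back(self, gppn):
        """Return the MPN backing gppn, swapping contents back in if needed."""
        if gppn in self.gppn_to_mpn:
            return self.gppn_to_mpn[gppn]
        mpn = self.free_mpns.pop()         # assumes a free MPN is available
        self.gppn_to_mpn[gppn] = mpn
        if gppn in self.swap_device:
            # The slow path: restore the contents from the swap device.
            self.machine_pages[mpn] = self.swap_device.pop(gppn)
            self.swap_in_count += 1
        else:
            self.machine_pages[mpn] = None
        return mpn

    def guest_write(self, gppn, data):
        self.machine_pages[self._back(gppn)] = data

    def guest_read(self, gppn):
        return self.machine_pages[self._back(gppn)]

    def reclaim(self, gppn):
        """Swap out gppn and free its MPN for use elsewhere."""
        mpn = self.gppn_to_mpn.pop(gppn)
        self.swap_device[gppn] = self.machine_pages.pop(mpn)
        self.free_mpns.append(mpn)
        return mpn


def instructions_during_stall(latency_ms, instructions_per_second):
    """Instructions a CPU could have executed during a stall of latency_ms."""
    return latency_ms * instructions_per_second // 1000


def run_swap_demo():
    memory = ToyHypervisorMemory(free_mpns=[100])
    memory.guest_write(7, "guest data")   # GPPN 7 becomes backed by MPN 100
    freed = memory.reclaim(7)             # memory pressure: swap GPPN 7 out
    data = memory.guest_read(7)           # later access forces a swap-in
    return freed, data, memory.swap_in_count
```

Under the assumed figures of a 5 ms swap-in and a CPU executing one billion instructions per second, `instructions_during_stall(5, 1_000_000_000)` gives 5,000,000 instructions of lost opportunity, which is consistent with the "several million" estimate above.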
It would be desirable to allow the guest to continue performing useful work while the hypervisor is performing virtualization tasks that are not visible to the guest.