1. Field of the Invention
This invention relates generally to a virtualized computer system and, in particular, to a method and system for reducing the latency of virtual interrupt delivery in virtual machines of a virtualized computer system.
2. Description of the Related Art
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete” computer. Depending on how it is implemented, virtualization can also provide greater security, since the virtualization can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
As is well known in the field of computer science, a virtual machine (VM) is an abstraction—a “virtualization”—of an actual physical computer system. FIG. 1 shows one possible arrangement of a computer system 700 that implements virtualization. A virtual machine (VM) or “guest” 200 is installed on a “host platform,” or simply “host,” which will include system hardware, that is, a hardware platform 101, and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor (see below), or some combination of these. The system hardware 101 typically includes one or more processors 110, memory 130, and physical hardware devices 100 including some form of mass storage 140 and various other devices 170.
Each VM 200 will typically have both virtual system hardware 201 and guest system software 202. The virtual system hardware 201 typically includes at least one virtual CPU, virtual memory 230, at least one virtual disk 240, and one or more virtual devices 270. Note that a disk—virtual or physical—is also a “device,” but is usually considered separately because of the important role of the disk. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components. The guest system software includes a guest operating system (OS) 220 and drivers 224 as needed for the various virtual devices 270. Although FIG. 1 illustrates that the virtual system hardware 201 is included in the VMs 200, the virtual system hardware 201 may reside in a gray area between the VMs 200 and the VMMs 300 or in the VMMs 300 themselves, as illustrated in FIG. 2.
Referring back to FIG. 1, note that a single VM may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP VMs. FIG. 1, for example, illustrates multiple virtual processors 210-0, 210-1, . . . , 210-m (VCPU0, VCPU1, . . . , VCPUm) within the VM 200.
Yet another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit (ALU)) and each able to execute threads independently; multi-core processors typically share only very limited resources, such as some cache. Still another technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share some resources such as caches, buffers, functional units, etc. This invention may be used regardless of the type (physical and/or logical) or number of processors included in a VM.
If the VM 200 is properly designed, applications 260 running on the VM will function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is via the guest OS 220 and virtual processor(s). Executable files will be accessed by the guest OS from the virtual disk 240 or virtual memory 230, which will be portions of the actual physical disk 140 or memory 130 allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is generally required between the guest software within a VM and the various hardware components and devices in the underlying hardware platform. This interface, which may be referred to generally as “virtualization software,” may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself. However, “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” VM to facilitate the operations of other VMs. Furthermore, specific software support for VMs may be included in the host OS itself. Unless otherwise indicated, the invention described below may be used in virtualized computer systems having any type or configuration of virtualization software.
Moreover, FIG. 1 shows virtual machine monitors that appear as separate entities from other components of the virtualization software. Furthermore, some software components used to implement one illustrated embodiment of the invention are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware. The illustrated embodiments are given only for the sake of simplicity and clarity and by way of illustration—as mentioned above, the distinctions are not always so clear-cut. Again, unless otherwise indicated or apparent from the description, it is to be assumed that the invention can be implemented anywhere within the overall structure of the virtualization software, and even in systems that provide specific hardware support for virtualization.
The various virtualized hardware components in the VM, such as the virtual CPU(s) 210-0, 210-1, . . . , 210-m, the virtual memory 230, the virtual disk 240, and the virtual device(s) 270, are shown as being part of the VM 200 for the sake of conceptual simplicity. In actuality, these “components” are usually implemented as software emulations 330 included in the VMM. One advantage of such an arrangement is that the VMM may (but need not) be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another concept, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the name implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. For example, the guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software.
For some, para-virtualization implies that the guest OS (in particular, its kernel) is specifically designed to support such an interface. According to this view, having, for example, an off-the-shelf version of Microsoft Windows XP™ as the guest OS would not be consistent with the notion of para-virtualization. Others define para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to any other component of the virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS as such is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system. Unless otherwise indicated or apparent, this invention is not restricted to use in systems with any particular “degree” of virtualization and is not to be limited to any particular notion of full or partial (“para-”) virtualization.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration and a non-hosted configuration (which is shown in FIG. 1). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., “System and Method for Virtualizing Computer Systems,” 17 Dec. 2002).
As illustrated in FIG. 1, in many cases, it may be beneficial to deploy VMMs on top of a software layer—a kernel 600—constructed specifically to provide efficient support for the VMs. This configuration is frequently referred to as being “non-hosted.” Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services (for example, resource management) that extend across multiple virtual machines. Compared with a hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting primarily of VMs/VMMs. The kernel 600 also handles any other applications running on it that can be separately scheduled, as well as a console operating system that, in some architectures, is used to boot the system and facilitate certain user interactions with the virtualization software.
Note that the kernel 600 (also referred to herein as the “VMkernel”) is not the same as the kernel that will be within the guest OS 220—as is well known, every operating system has its own kernel. Note also that the kernel 600 is part of the “host” platform of the VM/VMM as defined above even though the configuration shown in FIG. 1 is commonly termed “non-hosted;” moreover, the kernel may be both part of the host and part of the virtualization software or “hypervisor.” The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
The kernel 600 is responsible for initiating physical input/output (I/O) on behalf of the VMs 200 and communicating the I/O completion events back to the VMs 200. In fully virtualized systems, I/O completion events often take the form of a virtual interrupt delivered to one of the virtual processors (VCPUs) of the requesting VM. The VMM is typically charged with virtualization of the virtual processors and the virtual interrupt system that guides virtual interrupt delivery. In an SMP virtual machine, the guest OS 220 may program the virtual interrupt system for interrupt delivery to an arbitrary subset of VCPUs. On x86 platforms, this information is generally distributed among different pieces of virtual interrupt hardware such as the I/O APIC (Advanced Programmable Interrupt Controller), the local APIC, the MSI (Message Signaled Interrupt) state in device PCI (Peripheral Component Interconnect) configuration space, and the like.
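By way of illustration only, the guest-programmed routing state described above can be modeled as a table that maps an interrupt line to an interrupt vector and a set of destination VCPUs. The following Python sketch is a simplified model and not the interrupt system of any actual product; the names Route, VirtualInterruptController, program, and resolve are hypothetical, and the many registers of a real I/O APIC, local APIC, or MSI capability are collapsed into a single dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One routing entry: an interrupt vector and its destination VCPUs."""
    vector: int
    vcpus: frozenset


class VirtualInterruptController:
    """Hypothetical, simplified stand-in for the virtual interrupt state."""

    def __init__(self, num_vcpus):
        self.num_vcpus = num_vcpus
        self._routes = {}  # IRQ line -> Route

    def program(self, irq, vector, vcpus):
        """Called when the guest OS (re)programs the route for an IRQ line."""
        dests = frozenset(vcpus)
        if not dests or any(v < 0 or v >= self.num_vcpus for v in dests):
            raise ValueError("destination must be a non-empty subset of the VCPUs")
        self._routes[irq] = Route(vector, dests)

    def resolve(self, irq):
        """Return the route currently programmed for an IRQ line."""
        return self._routes[irq]


vic = VirtualInterruptController(num_vcpus=4)
vic.program(irq=10, vector=0x41, vcpus={2})     # deliver to a single VCPU
single = vic.resolve(10)
vic.program(irq=10, vector=0x41, vcpus={0, 3})  # retarget to a subset of VCPUs
subset = vic.resolve(10)
```

In this model only the component that holds the table can answer the question of which VCPU should receive a given interrupt, which mirrors the division of knowledge between the VMM and the kernel discussed in this section.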
FIG. 2 illustrates how I/O is completed in a virtualized computer system. When the guest OS 220 requests a virtual I/O on the virtual system hardware 201, the virtualization software (the VMM 300 and the VMkernel 600) issues a physical I/O that corresponds to the virtual I/O request to the actual hardware device 100 backing the virtual system hardware 201. Once the physical I/O is completed, the hardware device 100 generates a physical (hardware) interrupt to inform the virtualization software of the completion of the physical I/O. In response, the VMM 300 (more specifically, the interrupt system 280 including the virtual interrupt controller 282 and the VMM interrupt router 284) generates a virtual interrupt to the guest OS 220 to inform the guest OS 220 of completion of the I/O.
The virtual interrupt state is dynamic and might be changed by the guest OS 220 after a physical I/O is requested but before it has been completed. Because of the complexity of virtual interrupt systems, the kernel 600 typically does not have information on the details of the guest interrupt system programming. This may cause latency in the delivery of virtual interrupts, as is explained with reference to FIG. 3.
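The consequence of this dynamic state can be shown with a short Python sketch, offered only as an illustration. The dictionary layout and the function names issue_io and route_at_completion are hypothetical. The sketch shows that a route captured when the I/O is requested may be stale by the time the I/O completes, so the destination has to be determined from the state in effect at completion.

```python
def issue_io(routes, irq):
    """Record an outstanding request and snapshot the route at issue time."""
    return {"irq": irq, "route_at_issue": routes[irq]}


def route_at_completion(routes, request):
    """Look the route up again when the physical I/O completes."""
    return routes[request["irq"]]


# Hypothetical guest-programmed state: IRQ line -> (interrupt vector, VCPU).
routes = {10: (0x41, 1)}

request = issue_io(routes, irq=10)
routes[10] = (0x41, 3)  # guest retargets the interrupt while the I/O is in flight

stale = request["route_at_issue"]                # (0x41, 1): no longer valid
current = route_at_completion(routes, request)   # (0x41, 3): what the guest expects
```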
FIG. 3 is an interaction diagram illustrating how I/O is requested and completed in a conventional virtualized computer system. Referring to FIG. 3 together with FIG. 2, the guest OS 220 issues a virtual I/O request 302 to a virtual device 201. The virtual device 201 makes a VMkernel call 304 to the VMkernel 600, and the VMkernel 600 issues a command 306 corresponding to the VMkernel call 304 to the VMkernel driver 288. The VMkernel driver 288 (FIG. 2) provides an interface between the VMkernel 600 and the physical hardware device 100 that corresponds to the virtual device 201. Thus, based on the command 306 corresponding to the VMkernel call 304, the VMkernel driver 288 makes a hardware-specific I/O request 308 to the hardware device 100 for which the I/O request is destined.
Thereafter, typically some time will pass (as indicated by the double dotted lines), until the I/O is actually completed 310 by the hardware device 100. The hardware device 100 makes a hardware interrupt 312 to the VMkernel driver 288 and to the VMkernel 600 to notify the VMkernel 600 that the hardware I/O is complete. The VMkernel driver 288 inspects (314, 316) the device state to determine the specifics of the I/O operation. The VMkernel driver 288 makes the I/O data available 318 to the VMkernel 600.
The privileged domain (the VMkernel 600) is typically charged with communicating I/O completions to virtual processors of VMs. However, in conventional virtualized computer systems, the VMkernel 600 does not have access to all necessary information about the details of the virtual interrupt system configuration of each VM to optimally select a set of destination VCPUs for each I/O completion event. Since the VMkernel 600 does not know the correct destination VCPU 210 responsible for the I/O and virtual interrupt at this time, the VMkernel 600 typically just selects one of the VCPUs of the VM 200 in order to notify it of the I/O completion. The VMkernel 600 then posts an asynchronous action 321 to the selected VCPU 210 (virtual system hardware 201) so that the virtual interrupt can be generated. In response, the virtual device 201 asserts an IRQ (Interrupt Request Line) 322 to obtain the (interrupt vector, VCPU) pair for the virtual interrupt that notifies the guest OS 220 of the completion of the I/O. The term “(interrupt vector, VCPU) pair” refers to a pair of data consisting of the interrupt vector and the VCPU.
If the preliminary target VCPU is already running on a different CPU (physical CPU), the VMkernel 600 dispatches an inter-processor interrupt (IPI) to the target VCPU to ensure that a virtual interrupt 328 is dispatched in a timely manner with the (interrupt vector, VCPU) pair. On the other hand, if the preliminary target VCPU is not currently scheduled, a scheduler intervention might be necessary to ensure that the target VCPU receives an I/O completion event. In this regard, the preliminary target VCPU consults the virtual interrupt system 280 to determine the final destination VCPU, and an action is dispatched from the preliminary target VCPU to the final destination VCPU. If the final destination VCPU is not running, the VMM interrupt system 280 sends a reschedule request 324 to the VMkernel 600 to reschedule that VCPU so that it can process an action. As a result, the VMkernel 600 reschedules 326 that final destination VCPU. The guest-designated final destination VCPU is often different from the preliminary target VCPU selected by the VMkernel 600 without consulting the virtual interrupt system 280. If so, the VMM interrupt system 280 redispatches a virtual interrupt 328 to the final destination VCPU set. This last step might involve IPIs and scheduler invocations similar to those of the original dispatch. In some elaborate cases of Logical Delivery Mode, the target might not be a single VCPU, but an arbitrary subset of VCPUs.
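The conventional two-stage dispatch just described can be traced with a small Python sketch. This is a simplified model under stated assumptions and not an actual VMkernel or VMM implementation: the function deliver_conventional and the step labels are hypothetical, a single final destination VCPU is assumed, and a "running" VCPU is assumed to be running on a physical CPU other than the one handling the completion.

```python
def deliver_conventional(preliminary, final, running):
    """Return the ordered steps taken to deliver one virtual interrupt.

    preliminary: VCPU picked by the kernel without knowing the guest's routing
    final:       VCPU the guest actually programmed as the destination
    running:     set of VCPUs currently scheduled on some physical CPU
    """
    steps = []

    # First hop: the kernel notifies the VCPU it guessed.
    if preliminary in running:
        steps.append(("ipi", preliminary))
    else:
        steps.append(("schedule", preliminary))

    # The preliminary VCPU consults the virtual interrupt system.
    steps.append(("consult_interrupt_system", preliminary))

    # Second hop: needed only when the guess was wrong.
    if final != preliminary:
        if final in running:
            steps.append(("ipi", final))
        else:
            steps.append(("reschedule_request", final))  # step 324
            steps.append(("reschedule", final))          # step 326

    steps.append(("virtual_interrupt", final))           # step 328
    return steps


right_guess = deliver_conventional(preliminary=0, final=0, running={0})
wrong_guess = deliver_conventional(preliminary=0, final=2, running={0})
extra_steps = len(wrong_guess) - len(right_guess)
```

In this model a wrong guess with a descheduled final destination adds two steps (the reschedule request and the reschedule) before the guest sees the interrupt.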
Because the VMkernel 600 attempts to invoke the virtual interrupt without knowing which specific VCPU should receive the virtual interrupt signaling completion of the I/O, the step 324 of sending the reschedule request and the step 326 of rescheduling the VCPU may be necessary if the guest-designated target VCPU is different from the original VCPU target selected by the VMkernel 600. These additional steps 324, 326 increase the latency of virtual interrupt deliveries in virtual machines, which degrades performance of the virtualized computer system.
Therefore, there is a need for a technique for reducing the latency of virtual interrupt deliveries in virtual machines.