1. Field of the Invention
This invention relates generally to computer virtualization and, in particular, to a method and system for avoiding synchronization bugs through virtualization.
2. Description of Related Art
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete” computer. Depending on how it is implemented, virtualization also provides greater stability, since the virtualization can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
As is well known in the field of computer science, a virtual machine (VM) is a software abstraction, or “virtualization,” of an actual physical computer system. FIG. 1(a) shows one possible arrangement of a computer system 101 that implements virtualization. A virtual machine (VM) 102, which in this system is a “guest,” is installed on a “host platform,” or simply “host,” which includes system hardware 100, that is, a hardware platform, and one or more layers or co-resident components comprising system-level software, such as an operating system (OS) or similar kernel, a virtual machine monitor or hypervisor (see below), or some combination of these.
As software, the code defining the VM will ultimately execute on the actual system hardware 100. As in almost all computers, this hardware will include one or more CPUs 120, some form of memory 130 (volatile or non-volatile), one or more storage devices such as one or more disks 140, and one or more devices 170, which may be integral or separate and removable.
In many existing virtualized systems, the CPU(s) 120 are the same as in a non-virtualized computer with the same platform, for example, the Intel x86 platform. Because of the advantages of virtualization, however, some hardware vendors have proposed, and are presumably developing, hardware processors that include specific hardware support for virtualization.
Each VM 102 will typically mimic the general structure of a physical computer and as such will usually have both virtual system hardware 104 and guest system software 106. The virtual system hardware typically includes at least one virtual CPU 108, virtual memory 112, at least one virtual disk 116, and one or more virtual devices 110. Note that a storage disk—virtual 116 or physical 140—is also a “device,” but is usually considered separately because of the important role it plays. A virtual CPU may also sometimes be referred to as a virtual processor. The term ‘virtual processor’ is synonymous with the term ‘virtual CPU’ or ‘VCPU’ for the purposes of reference throughout this text. All of the virtual hardware components of the VM may be implemented in software to emulate corresponding physical components. The guest system software includes a guest operating system (OS) 122 and drivers 124 as needed, for example, for the various virtual devices 110.
To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. Many conventional hardware platforms therefore include more than one hardware processor 120. In many such platforms, each processor is a separate “chip” and may share system resources such as main memory and/or at least one I/O device. “Multi-cored” architectures have also been developed (for example, IBM POWER4 and POWER5 architectures, as well as the Sun UltraSparc IV), in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit (ALU)), and each able to execute threads independently. Multi-cored processors typically share only very limited resources, such as at least some cache.
Still another modern technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share not only one or more caches, but also some functional unit(s) and sometimes also the translation lookaside buffer (TLB). One example of a multi-threaded architecture is Intel Corporation's “Hyper-Threading Technology,” used to improve the performance of its Pentium 4 and Xeon processor lines. It is also possible to have an architecture that is both multi-cored and multi-threaded.
Similarly, a single VM may also have (that is, be exposed to) more than one virtualized processor. Symmetric multi-processor (SMP) systems are commonly available, and may be implemented in both virtualized and non-virtualized systems. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP VMs. FIG. 1(a), for example, illustrates multiple virtual processors 108 within the VM 102. Each virtualized processor in a VM may also be multi-cored, or multi-threaded, or both, depending on the virtualization.
Applications 126 running on the VM will typically function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is, via the guest OS 122 and virtual processor(s). Executable files will be accessed by the guest OS from the virtual disk 116 or virtual memory 112, which will be portions of the actual physical disk 140 or memory 130 allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines in general are known in the field of computer science.
Some interface is usually required between a VM 102 and the underlying host platform (in particular, the hardware CPU(s) 120 and any intermediate system-level software layers), which is (are) responsible for actually submitting and executing VM-issued instructions and for handling I/O operations, including transferring data to and from the hardware memory 130 and storage devices 140. A common term for this interface or virtualization layer is a “virtual machine monitor” (VMM), shown as component 128. A VMM is usually a software component that virtualizes at least some of the resources of the physical host machine, or at least some hardware resource, so as to export a hardware interface to the VM corresponding to the hardware the VM “thinks” it is running on. As FIG. 1(a) illustrates, a virtualized computer system may (and usually will) have more than one VM, each of which may be running on its own VMM.
The various virtualized hardware components in the VM, such as the virtual CPU(s) 108, the virtual memory 112, the virtual disk 116, and the virtual device(s) 110, are shown as being part of the VM 102 for the sake of conceptual simplicity. In actuality, these “components” are often implemented as software emulations included in the VMM.
In contrast to a fully virtualized system, the guest OS 122 in a so-called “para-virtualized” system is modified to support virtualization, such that it not only has an explicit interface to the VMM, but is sometimes also allowed to access at least one host hardware resource directly. In some para-virtualized systems, one of a plurality of VMs plays a “superior” role in that it mediates some requests for hardware resources made by the guest OSs of other VMs. In short, virtualization transparency is sacrificed to gain speed or to make it easier to implement the VMM that supports the para-virtualized machine. In such para-virtualized systems, the VMM is sometimes referred to as a “hypervisor.”
In addition to the distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration (illustrated in FIG. 1(b)) and a non-hosted configuration (illustrated in FIG. 1(a)). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request and direction of the VMM 128. The host OS 132, which usually includes drivers 134 and supports applications 136 of its own, and the VMM are both able to directly access at least some of the same hardware resources, with conflicts being avoided by having the VMM transparently save and restore host state when switching between the host and the VMM. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., “System and Method for Virtualizing Computer Systems,” 17 Dec. 2002), which is incorporated by reference herein in its entirety.
In addition to device emulators 138, FIG. 1(b) also illustrates some of the other components that are also often included in the VMM 128 of a hosted virtualization system; many of these components are found in the VMM 128 of a non-hosted system as well. For example, exception handlers 142 may be included to help execute the virtual machine instruction stream, and a direct execution engine 144 and a binary translator 146 with associated translation cache 148 may be included to provide execution speed while still preventing the VM from directly executing certain privileged instructions. According to one embodiment of the present invention, a translation cache 148 may be included for every virtual CPU 108. The binary translator 146 may be implemented as computer code stored on a computer readable medium.
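By way of illustration only, the relationship between direct execution, binary translation, and a per-virtual-CPU translation cache may be sketched as follows. The sketch is a simplification under stated assumptions: the set PRIVILEGED, the function translate, and the class VirtualCPU are hypothetical names introduced here, and translation is shown per instruction, whereas a practical binary translator would typically operate on larger units of guest code.

```python
from dataclasses import dataclass, field

# Hypothetical set of guest instructions that must not execute directly.
PRIVILEGED = {"cli", "sti", "hlt"}


def translate(instr: str) -> str:
    """Stand-in for a binary translator: replace a privileged
    instruction with a call into a safe emulation routine."""
    return f"emulate_{instr}"


@dataclass
class VirtualCPU:
    vcpu_id: int
    # One translation cache per virtual CPU, as in the embodiment above.
    translation_cache: dict = field(default_factory=dict)
    translations_performed: int = 0

    def execute(self, instr: str) -> str:
        """Return what would actually run on the hardware for this guest instruction."""
        if instr not in PRIVILEGED:
            # Direct execution path: the instruction runs unmodified.
            return instr
        if instr not in self.translation_cache:
            # Cache miss: translate once and remember the result.
            self.translation_cache[instr] = translate(instr)
            self.translations_performed += 1
        # Cache hit (or freshly filled entry): reuse the translation.
        return self.translation_cache[instr]


vcpu0 = VirtualCPU(vcpu_id=0)
trace = [vcpu0.execute(i) for i in ["add", "cli", "mov", "cli"]]
print(trace)                          # ['add', 'emulate_cli', 'mov', 'emulate_cli']
print(vcpu0.translations_performed)   # 1
```

In this sketch the second occurrence of the privileged instruction is served from the cache, which is the source of the execution speed benefit, while the privileged instruction itself never runs directly.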
In many cases, it may be beneficial to deploy VMMs on top of a software layer—a kernel 152—constructed specifically to provide efficient support for the VMs. This configuration is frequently referred to as being “non-hosted.” Compared with a system in which VMMs run directly on the hardware platform (such as shown in FIG. 1(b)), use of a kernel 152 offers greater modularity and facilitates provision of services (for example, resource management) that extend across multiple virtual machines. Compared with a hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM 128 and be optimized for the characteristics of a workload consisting primarily of VMs/VMMs. The kernel 152 also handles any other applications running on it that can be separately scheduled, as well as any temporary “console” operating system 132 that, in some systems, is included to boot the system as a whole and for enabling certain user interactions with the kernel. The console OS 132 in FIG. 1(a) may be of the same type as the host OS in FIG. 1(b), which is why they are identically numbered—the main difference is the role they play (or are allowed to play, if any) once the virtualized computer system is loaded and running.
In practice, the applications 126 and the guest operating system 122 sometimes contain glitches that can result in undesirable consequences. While effort is usually made to produce error-free software, the size and scope of these systems and applications make the occasional appearance of errors, or ‘bugs’, inevitable. Typically, users must work around these errors, or, if a problem is severe enough, the manufacturer of the system or application will release a patch.
Sometimes virtualization exposes errors in the applications 126 or guest operating system 122 that are not apparent in the non-virtualized use of the system or application. For example, synchronization bugs may go undetected when they depend on a statistically unlikely series of events. However, changes to the speed of execution in a multithreaded or multiprocessor environment, such as a virtualized device responding more slowly or more quickly than its physical analogue, sometimes increase the likelihood that a synchronization bug will result in a concurrency error. A concurrency error, generally, is an error that occurs when two threads of execution attempt to access the same data simultaneously, or when an improper interaction between threads of execution results in the corruption of data.
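By way of illustration only, the dependence of a concurrency error on the order of events may be sketched with a deterministic simulation of two threads that each perform an unsynchronized increment (a load followed by a store) on shared data. The names run_schedule and all_schedules are hypothetical and introduced only for this sketch; the schedule stands in for whatever timing the hardware or virtualization layer happens to produce.

```python
from itertools import permutations


def run_schedule(schedule):
    """Simulate two threads, each doing a non-atomic increment of a
    shared counter. Each entry in `schedule` is a thread id (0 or 1)
    and runs that thread's next step: first a load, then a store."""
    counter = 0
    register = [None, None]  # each thread's private copy of the counter
    step = [0, 0]            # 0 = load pending, 1 = store pending, 2 = done
    for tid in schedule:
        if step[tid] == 0:
            register[tid] = counter      # load shared value
        elif step[tid] == 1:
            counter = register[tid] + 1  # store incremented value
        else:
            raise ValueError("thread scheduled after it finished")
        step[tid] += 1
    return counter


def all_schedules():
    """Every distinct interleaving of two steps from each thread."""
    return sorted(set(permutations([0, 0, 1, 1])))


# Thread 0 finishes before thread 1 starts: both increments take effect.
print(run_schedule([0, 0, 1, 1]))  # 2

# Both threads load before either stores: one increment is lost.
print(run_schedule([0, 1, 0, 1]))  # 1

lost = [s for s in all_schedules() if run_schedule(s) != 2]
print(len(lost), "of", len(all_schedules()), "interleavings lose an update")
```

On a given platform, timing may make the interleaved orderings rare enough that the bug is never observed. A change in relative execution speed, such as that introduced by virtualization, can make those orderings far more likely without any change to the code itself.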
Because these bugs may be significantly exposed only under certain conditions, such as when the operating system or application is run in a virtual machine, it is possible that the manufacturer of the operating system or application has already end-of-lifed the product by the time the bug is discovered. In some cases, the error may make the operating system or application entirely unusable.
What is needed is a system and method for reducing the likelihood of concurrency errors in applications and operating systems.