1. Field of the Invention
This invention relates to the field of virtual machines for computer systems.
2. Description of the Related Art
The preferred embodiment of the invention is described relative to a binary translator for a virtual computer system. Consequently, this description begins with an introduction to virtual computing.
Virtualization has brought many advantages to the world of computers. As is well known in the art, a virtual machine (VM) is a software abstraction—a “virtualization”—of an actual physical computer system that runs as a “guest” on an underlying “host” hardware platform. As long as a suitable interface is provided between the VM and the host platform, one advantage is that the operating system (OS) in the guest need not be the same as the OS at the system level in the host. For example, applications that presuppose a Microsoft Windows OS can be run in the VM even though the OS used to handle actual I/O, memory management, etc., on the host might be Linux.
It usually requires less than 10% of the processing capacity of a CPU to run a typical application, although usage may peak briefly for certain operations. Virtualization can more efficiently use processing capacity by allowing more than one VM to run on a single host, effectively multiplying the number of “computers” per “box.” Depending on the implementation, the reduction in performance is negligible, or at least not enough to justify separate, dedicated hardware “boxes” for each user.
Still another advantage is that different VMs can be isolated from and completely transparent to one another. Indeed, the user of a single VM will normally be unaware that he is not using a “real” computer, that is, a system with hardware dedicated exclusively to his use. The existence of the underlying host will also be transparent to the VM software itself. The products of VMware, Inc., of Palo Alto, Calif. provide all of these advantages in that they allow multiple, isolated VMs, which may (but need not) have OSs different from each other's, to run on a common hardware platform.
Example of a Virtualized System
FIG. 1 illustrates the main components of a system that supports a virtual machine as implemented in the Workstation product of VMware, Inc. As in conventional computer systems, both system hardware 100 and system software 200 are included. The system hardware 100 includes CPU(s) 102, which may be a single processor, or two or more cooperating processors in a known multiprocessor arrangement. The system hardware also includes system memory 104, one or more disks 106, and some form of memory management unit (MMU) 108. As is well understood in the field of computer engineering, the system hardware also includes, or is connected to, conventional registers, interrupt-handling circuitry, a clock, etc., which, for the sake of simplicity, are not shown in the figure.
The system software 200 either is or at least includes an operating system OS 220, which has drivers 240 as needed for controlling and communicating with various devices 110, and usually with the disk 106 as well. Conventional applications 260, if included, may be installed to run on the hardware 100 via the system software 200 and any drivers needed to enable communication with devices.
As mentioned above, the virtual machine (VM) 300—also known as a “virtual computer”—is a software implementation of a complete computer system. In the VM, the physical system components of a “real” computer are emulated in software, that is, they are virtualized. Thus, the VM 300 will typically include virtualized (“guest”) system hardware 301, which in turn includes one or more virtual CPUs 302 (VCPU), virtual system memory 304 (VMEM), one or more virtual disks 306 (VDISK), and one or more virtual devices 310 (VDEVICE), all of which are implemented in software to emulate the corresponding components of an actual computer.
The VM's system software 312 includes a guest operating system 320, which may, but need not, simply be a copy of a conventional, commodity OS, as well as drivers 340 (DRVS) as needed, for example, to control the virtual device(s) 310. Of course, most computers are intended to run various applications, and a VM is usually no exception. Consequently, by way of example, FIG. 1 illustrates one or more applications 360 installed to run on the guest OS 320; any number of applications, including none at all, may be loaded for running on the guest OS, limited only by the requirements of the VM.
Note that although the hardware “layer” 301 will be a software abstraction of physical components, the VM's system software 312 may be the same as would be loaded into a hardware computer. The modifier “guest” is used here to indicate that the VM, although it acts as a “real” computer from the perspective of a user, is actually just computer code that is executed on the underlying “host” hardware and software platform 100, 200. Thus, for example, I/O to the virtual device 310 will actually be carried out by I/O to the hardware device 110, but in a manner transparent to the VM.
If the VM is properly designed, then the applications (or the user of the applications) will not “know” that they are not running directly on “real” hardware. Of course, all of the applications and the components of the VM are instructions and data stored in memory, just as any other software. The concept, design and operation of virtual machines are well known in the field of computer science. FIG. 1 illustrates a single VM 300 merely for the sake of simplicity; in many installations, there will be more than one VM installed to run on the common hardware platform; all will have essentially the same general structure, although the individual components need not be identical.
Some interface is usually required between the VM 300 and the underlying “host” hardware 100, which is responsible for actually executing or having executed VM-related instructions and transferring data to and from the actual, physical memory 104. One advantageous interface between the VM and the underlying host system is often referred to as a virtual machine monitor (VMM), also known as a virtual machine “manager.” Virtual machine monitors have a long history, dating back to mainframe computer systems in the 1960s. See, for example, Robert P. Goldberg, “Survey of Virtual Machine Research,” IEEE Computer, June 1974, p. 54-45.
A VMM is usually a relatively thin layer of software that runs directly on top of a host, such as the system software 200, or directly on the hardware, and virtualizes the resources of the (or some) hardware platform. The VMM will typically include at least one device emulator 410, which may also form the implementation of the virtual device 310. The interface exported to the respective VM is usually such that the guest OS 320 cannot determine the presence of the VMM. The VMM also usually tracks and either forwards (to the host OS 220) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts. FIG. 1 therefore illustrates an interrupt (including fault) handler 450 within the VMM. The general features of VMMs are well known and are therefore not discussed in further detail here.
In FIG. 1, a single VMM 400 is shown acting as the interface for the single VM 300. It would also be possible to include the VMM as part of its respective VM, that is, in each virtual system. Although the VMM is usually completely transparent to the VM, the VM and VMM may be viewed as a single module that virtualizes a computer system. The VM and VMM are shown as separate software entities in the figures for the sake of clarity. Moreover, it would also be possible to use a single VMM to act as the interface for more than one VM, although it will in many cases be more difficult to switch between the different contexts of the various VMs (for example, if different VMs use different guest operating systems) than it is simply to include a separate VMM for each VM. This invention works with all such VM/VMM configurations.
In some configurations, the VMM 400 runs as a software layer between the host system software 200 and the VM 300. In other configurations, such as the one illustrated in FIG. 1, the VMM runs directly on the hardware platform 100 at the same system level as the host OS. In such case, the VMM may use the host OS to perform certain functions, including I/O, by calling (usually through a host API—application program interface) the host drivers 240. In this situation, it is still possible to view the VMM as an additional software layer inserted between the hardware 100 and the guest OS 320. Furthermore, it may in some cases be beneficial to deploy VMMs on top of a thin software layer, a “kernel,” constructed specifically for this purpose.
FIG. 2 illustrates yet another implementation, in which the kernel 700 takes the place of and performs the conventional functions of the host OS. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple virtual machines (for example, resource management). Compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs.
As used herein, the “host” OS therefore means either the native OS 220 of the underlying physical computer, or whatever system-level software handles actual I/O operations, takes faults and interrupts, etc. for the VM. The invention may be used in all the different configurations described above.
Speed is a critical issue in virtualization—a VM that perfectly emulates the functions of a given computer but that is too slow to perform needed tasks is obviously of little good to a user. Ideally, a VM should operate at the native speed of the underlying host system. In practice, even where only a single VM is installed on the host, it is impossible to run a VM at native speed, if for no other reason than that the instructions that define the VMM must also be executed. Near native speed, is possible, however, in many common applications.
The highest speed for a VM is found in the special case where every VM instruction executes directly on the hardware processor. This would in general not be a good idea, however, because the VM should not be allowed to operate at the greatest privilege level; otherwise, it might alter the instructions or data of the host OS or the VMM itself and cause unpredictable behavior. Moreover, in cross-architectural systems, one or more instructions issued by the VM may not be included in the instruction set of the host processor. Instructions that cannot (or must not) execute directly on the host are typically converted into an instruction stream that can. This conversion process is commonly known as “binary translation.”
U.S. Pat. No. 6,397,242 (Devine, et al., “Virtualization system including a virtual machine monitor for a computer with a segmented architecture”), which is incorporated herein by reference, describes a system in which the VMM includes a mechanism that allows VM instructions to execute directly on the hardware platform whenever possible, but that switches to binary translation when necessary. This allows for the speed of direct execution combined with the security of binary translation.
A virtualization system of course involves more than executing VM instructions—the VMM itself is also a software mechanism defined by instructions and data of its own. For example, the VMM might be a program written in C, compiled to execute on the system hardware platform. At the same time, an application 360 written in a language such as Visual Basic might be running in the VM, whose guest OS may be compiled from a different language.
There must also be some way for the VM to access hardware devices, albeit in a manner transparent to the VM itself. One solution would of course be to include in the VMM all the required drivers and functionality normally found in the host OS 220 to accomplish I/O tasks. Two disadvantages of this solution are increased VMM complexity and duplicated effort—if a new device is added, then its driver would need to be loaded into both the host OS and the VMM. In systems that include a host OS (as opposed to a dedicated kernel such as shown in FIG. 2), a much more efficient method has been implemented by VMware, Inc., in its Workstation product. This method is also illustrated in FIG. 1.
In the system illustrated in FIG. 1, both the host OS and the VMM are installed at system level, meaning that they both run at the greatest privilege level and can therefore independently modify the state of the hardware processor(s). For I/O to at least some devices, however, the VMM may issue requests via the host OS 220. To make this possible, a special driver VMdrv 242 is installed as any other driver within the host OS 220 and exposes a standard API to a user-level application VMapp 500. When the system is in the VMM context, meaning that the VMM is taking exceptions, handling interrupts, etc., but the VMM wishes to use the existing I/O facilities of the host OS, the VMM calls the driver VMdrv 242, which then issues calls to the application VMapp 500, which then carries out the I/O request by calling the appropriate routine in the host OS.
In FIG. 1, the vertical line 600 symbolizes the boundary between the virtualized (VM/VMM) and non-virtualized (host software) “worlds” or “contexts.” The driver VMdrv 242 and application VMapp 500 thus enable communication between the worlds even though the virtualized world is essentially transparent to the host system software 200.
Execution Modes: Direct Execution, Binary Translation and Interpretation
As described above, the VM 300 is a software implementation of a complete computer system. The guest system software 312, including the guest OS 320, and the applications 360 may be loaded onto the VM 300 and executed, just as if the VM 300 were a “real”, physical computer system. In supporting the VM 300, the VMM 400 must enable the execution of VM instructions or emulate the execution of VM instructions (also referred to as guest instructions), including instructions from the guest system software 312 and the applications 360. The methods by which guest instructions are executed or emulated are referred to as execution modes of the VMM 400. The system described herein includes three distinct execution modes.
One mode—the direct execution mode—is described above and is well known in the art of virtualization: In the direct execution mode, VM instructions are executed directly on the host hardware processor, since they cannot affect any sub-system or access any memory, register, etc., that is off-limits to the VM. Direct execution is fast, because there is no need for intermediate processing in the VMM. The VMM therefore includes any known direct execution engine 460 to perform this function.
Binary translation is also mentioned above: A VM instruction (or instruction stream) cannot be allowed to execute as is for any of several reasons, for example, it attempts to access the VMM, or assumes a privilege level higher than the user level the VM runs at. Although possible, the VMM preferably detects the need for binary translation not by examining each VM instruction before it is to be executed, but rather by detecting exceptions, interrupts, etc., that arise from attempted execution of instructions that cannot be directly executed. The VMM itself establishes many of the mechanisms used to generate these exceptions, for example, using memory tracing; other interrupts will be raised by the underlying system software.
In the binary translation mode, a binary translation engine 462 in the VMM checks a translation cache 463 to determine whether there is an existing translation of the instruction (or instruction stream) into a form that is “safe” to pass to the hardware processor for execution. If there is such a translation, then the translation is executed. If there is not yet a translation, for example, the first time the instruction is encountered, the first time it traps, etc., then the binary translation engine generates one.
Alternatively, the VM instruction (or stream) under consideration can be passed to a conventional interpreter 464, which emulates the execution of the VM instruction(s) in software. Note that interpretation is usually cheaper than binary translation in terms of processing cycles required, but that binary translation will be much faster if a translation already exists in the cache 463—the translation can be used more than once.
CPU Registers and Binary Translation
This invention relates to the use of registers within the CPU 102 during the binary translation mode. As described above, binary translation involves translating one or more guest instructions into one or more target instructions that are safe for execution on the system hardware 100. The target instructions will generally use many of the same registers as the guest instructions. For example, if a guest instruction loads a value from a memory location addressed by the contents of a first register R1 into a second register R2, the target instructions resulting from a binary translation will typically use the same registers for the same purposes. The target instructions will typically expect to find the address for the operand in the register R1 and will typically be expected to load the operand into the register R2. The target instructions may also use additional registers, which are not used by the guest instructions, for intermediate values and for other purposes. These registers are referred to as scratch registers. Typically, a scratch register is used for a short period of time for temporary storage of an intermediate value. In some embodiments of the invention, a scratch register is used only within a single block of translated instructions, so that there will be no dependence on the value in the scratch register outside of that block of translated instructions.
As an example of the use of scratch registers, suppose the guest software includes the following instruction:                (qp) mov r1=cr3The VMM 400 emulates the function of the control register cr3, using a location in memory to store the current value of the register, as seen within the VM 300. Thus, the VMM writes a value to the selected memory location when the register cr3 is to be loaded with a value and reads a value from the selected memory location when the register cr3 is to be read. Suppose further that a base address for the VMM has already been loaded into a scratch register sr0. The base address may be retained in the register sr0 throughout the execution of the block of translated code. Under these circumstances, the above instruction may be translated into the following sequence of instructions:        movl sr1=addr;;        add sr1=sr0,sr1 ;;        ld8 r1=[sr1]The variable “addr” is the virtual address for the memory location that holds the value of the virtual guest register cr3, or the offset from the VMM base address to the virtual guest register cr3. The registers sr0 and sr1 are used as scratch registers. The contents of the scratch register sr1 typically would not be used outside this sequence of instructions.        
A binary translator must be careful in its use of scratch registers, however. In the virtual computing system described herein, the VMM 400 switches back and forth between direct execution mode and binary translation mode. Suppose the VMM executes a first set of guest instructions in direct execution mode. Suppose next that an exception occurs that causes the VMM to switch to binary translation mode, just before a second set of guest instructions was to be executed. Suppose that the second set of guest instructions is translated, using binary translation, into a first set of target instructions, so that the first set of target instructions is executed next. Suppose that, after executing the first set of target instructions, the VMM switches back to direct execution mode for execution of a third set of guest instructions.
Now suppose that the first set of guest instructions writes a first value into the first register R1. Suppose further that an instruction in the third set of guest instructions relies on the first value being loaded into the first register when the third set of guest instructions is executed. Now, when the binary translator generates the first set of target instructions, based on the second set of guest instructions, the translator must ensure that the first register contains the first value after execution of the target instructions. Suppose that the translator needs to use a scratch register for some purpose within the second set of target instructions. If the translator were to use the register R1, without any remedial actions, the contents of the register R1 would typically be overwritten when the target instructions are executed, and the first value would not be available to the third set of guest instructions.
Of course, the translator could choose to use a different register, instead of the register R1. This implies, however, that the translator can determine another register that is available for use as a scratch register. First, there may not be a register that is available for use, and, even if there is a register available for use, it may not be easy for the translator to determine which register(s) are currently available.
Another possible remedial action would be to add instructions to the set of target instructions that save the contents of the first register R1 to memory before using the register as a scratch register, and return the first value from memory to the first register R1 before returning to direct execution mode for execution of the third set of guest instructions.
In some architectures, however, saving the first register R1 to memory in the first place may be problematic. Some architectures permit only indirect memory addressing, instead of allowing immediate addresses to be used to specify a memory location. This means that a second register must be used to save the first register R1 to memory. Now, the same dilemma arises as to which register, if any, can be used for the indirect memory addressing.
Also, as indicated above, speed is a critical issue in virtualization. Of course, it takes time to store register values into memory before executing target instructions that emulate the corresponding guest instructions, and restoring the values into the registers after the target instructions have been executed. In some situations, the string of target instructions required to emulate a set of guest instructions may be small relative to the instructions required to save and restore register values. In addition, such target instructions may be executed frequently. Consequently, saving and restoring register values during the binary translation mode may be relatively expensive in terms of execution time. Also, methods for determining which registers are available for use as scratch registers can also differ substantially in terms of consuming processing resources.
What is needed therefore is an efficient method and apparatus for managing registers in a binary translator, enabling one or more of the registers to be used as scratch registers by target instructions, while maintaining the integrity of the contents of each of the registers during and after the execution of the target instructions. Such a method and apparatus is needed, in particular, for an architecture that permits only indirect memory addressing. This invention provides such a method and apparatus.