1. Field of the Invention
This invention relates to virtualized computer systems, and, in particular, to a system and method for executing multicomponent software applications on a virtualized computer platform.
2. Description of the Related Art
The invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. For example, the computer readable media may comprise one or more CDs (Compact Discs), one or more DVDs (Digital Versatile Discs), some form of flash memory device, a computer hard disk and/or some form of internal computer memory, to name just a few examples. An embodiment of the invention, in which one or more computer program modules is embodied in one or more computer readable media, may be made by writing the computer program modules to any combination of one or more computer readable media. Such an embodiment of the invention may be sold by enabling a customer to obtain a copy of the computer program modules in one or more computer readable media, regardless of the manner in which the customer obtains the copy of the computer program modules. Thus, for example, a computer program implementing the invention may be purchased electronically over the Internet and downloaded directly from a vendor's web server to the purchaser's computer, without any transference of any computer readable media. In such a case, writing the computer program to a hard disk of the web server to make it available over the Internet may be considered a making of the invention on the part of the vendor, and the purchase and download of the computer program by a customer may be considered a sale of the invention by the vendor, as well as a making of the invention by the customer.
The invention generally relates to providing a virtualized computer platform for the execution of software applications that comprise multiple software components that are generally executed concurrently. For example, the virtualized computer platform may be advantageously used for the execution of distributed applications and/or multitier applications. For purposes of this patent, a conventional software application that comprises multiple software modules that are linked together to form a single program, so that generally only one software module is executing at a time, does not constitute a multicomponent software application (or a software application comprising multiple software components). Conversely, for purposes of this patent, a “multicomponent software application” means a collection of multiple software components, a plurality of which is generally executed concurrently, in a coordinated manner. In particular, for purposes of this patent, a multicomponent software application means a distributed application, a multitier application, or a substantially similar software application comprising multiple software components. However, in some embodiments of the invention, the virtualized computer platform can also be used for the execution of conventional software applications that do not comprise multiple software components.
A preferred embodiment of the invention may be derived from existing virtualization products of the assignee of this patent, VMware, Inc. Consequently, the general architectures of two types of products of VMware, a “hosted” virtual computer system and a “kernel-based” virtual computer system, are described below to provide background for the detailed description of the invention. The invention may also be implemented in a wide variety of other virtualized computer systems, however.
Hosted Virtual Computer System
FIG. 1 illustrates the main components of a “hosted” virtual computer system 100A as generally implemented in the Workstation virtualization product of VMware, Inc. The virtual computer system 100A supports a virtual machine (VM) 300A. As is well known in the field of computer science, a VM is a software abstraction or a “virtualization,” often of an actual physical computer system. As in conventional computer systems, both system hardware 102 and system software 150 are included. The system hardware 102 includes one or more processors (CPUs) 104, which may be a single processor, or two or more cooperating processors in a known multiprocessor arrangement. The system hardware also includes system memory 108, one or more disks 110, and some form of memory management unit (MMU) 106. The system memory is typically some form of high-speed RAM (random access memory), whereas the disk is typically a non-volatile, mass storage device. As is well understood in the field of computer engineering, the system hardware also includes, or is connected to, conventional registers, interrupt-handling circuitry, a clock, etc., which, for the sake of simplicity, are not shown in the figure.
The system software 150 typically either is or at least includes an operating system (OS) 152, which has drivers 154 as needed for controlling and communicating with various devices 112, and usually with the disk 110 as well. Conventional applications 160 (APPS), if included, may be installed to run on the hardware 102 via the system software 150 and any drivers needed to enable communication with devices.
The VM 300A—also known as a “virtual computer”—is often a software implementation of a complete computer system. In the VM, the physical system components of a “real” computer are emulated in software, that is, they are virtualized. Thus, the VM 300A will typically include virtualized (“guest”) system hardware 302, which in turn includes one or more virtual CPUs 304 (VCPU), virtual system memory 308 (VMEM), one or more virtual disks 310 (VDISK), and one or more virtual devices 312 (VDEVICE), all of which are implemented in software to emulate the corresponding components of an actual computer. The concept, design and operation of virtual machines are well known in the field of computer science.
The VM 300A also has system software 350, which may include a guest OS 352, as well as drivers 354 as needed, for example, to control the virtual device(s) 312. The guest OS 352 may, but need not, simply be a copy of a conventional, commodity OS. Of course, most computers are intended to run various applications, and a VM is usually no exception. Consequently, by way of example, FIG. 1 illustrates one or more applications 360 (APPS) installed to run on the guest OS 352; any number of applications, including none at all, may be loaded for running on the guest OS, limited only by the requirements of the VM. Software running in the VM 300A, including the guest OS 352 and the guest applications 360, is generally referred to as “guest software.”
Note that although the virtual hardware “layer” 302 is a software abstraction of physical components, the VM's system software 350 may be the same as would be loaded into a hardware computer. The modifier “guest” is used here to indicate that the VM, although it acts as a “real” computer from the perspective of a user, is actually just computer code that is executed on the underlying “host” hardware and software platform 102, 150. Thus, for example, I/O to a virtual device 312 will actually be carried out by I/O to a corresponding hardware device 112, but in a manner transparent to the VM.
Some interface is usually required between the VM 300A and the underlying “host” hardware 102, which is responsible for actually executing VM-related instructions and transferring data to and from the actual physical memory 108, the processor(s) 104, the disk(s) 110 and the other device(s) 112. One advantageous interface between the VM and the underlying host system is often referred to as a virtual machine monitor (VMM), also known as a virtual machine “manager.” Virtual machine monitors have a long history, dating back to mainframe computer systems in the 1960s. See, for example, Robert P. Goldberg, “Survey of Virtual Machine Research,” IEEE Computer, June 1974, pp. 34-45.
A VMM is usually a relatively thin layer of software that runs directly on top of host software, such as the system software 150, or directly on the hardware, and virtualizes the resources of the (or some) hardware platform. FIG. 1 shows virtualization software 200A running directly on the system hardware 102. The virtualization software 200A may be a VMM, for example. Thus, the virtualization software 200A is also referred to herein as a VMM 200A. The VMM 200A will typically include at least one device emulator 252A, which may also form the implementation of the virtual device 312. The VMM 200A may also include a memory manager 254A that maps memory addresses used within the VM 300A (for the virtual memory 308) to appropriate memory addresses that can be applied to the physical memory 108. The VMM also usually tracks and either forwards (to the host OS 152) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts. FIG. 1 therefore illustrates an interrupt (including fault) handler 256A within the VMM. The general features of VMMs are well known and are therefore not discussed in further detail here.
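The address mapping attributed above to a memory manager such as 254A can be pictured with a minimal sketch. The page size, the class name `GuestMemoryMap`, and the dictionary-based table are assumptions made purely for illustration; nothing here describes how the VMM 200A is actually implemented.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class GuestMemoryMap:
    """Hypothetical per-VM table from guest page numbers to machine page numbers."""

    def __init__(self):
        self._table = {}

    def map_page(self, guest_page, machine_page):
        # Record that a guest page is backed by a given machine page.
        self._table[guest_page] = machine_page

    def translate(self, guest_addr):
        # Split the guest address into page number and offset within the page.
        guest_page, offset = divmod(guest_addr, PAGE_SIZE)
        if guest_page not in self._table:
            raise LookupError(f"guest page {guest_page} is not backed")
        return self._table[guest_page] * PAGE_SIZE + offset


mem = GuestMemoryMap()
mem.map_page(0, 7)                  # guest page 0 is backed by machine page 7
machine_addr = mem.translate(0x10)  # offset 0x10 within machine page 7
```

In this sketch an access to an unbacked guest page raises an error; a real memory manager would instead take whatever action its design calls for at that point.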
FIG. 1 illustrates a single VM 300A merely for the sake of simplicity; in many installations, there will be more than one VM installed to run on the common hardware platform; all may have essentially the same general structure, although the individual components need not be identical. Also in FIG. 1, a single VMM 200A is shown acting as the interface for the single VM 300A. It would also be possible to include the VMM as part of its respective VM, that is, in each virtual system. Although the VMM is usually completely transparent to the VM, the VM and VMM may be viewed as a single module that virtualizes a computer system. The VM and VMM are shown as separate software entities in the figures for the sake of clarity. Moreover, it would also be possible to use a single VMM to act as the interface for more than one VM, although it will in many cases be more difficult to switch between the different contexts of the various VMs (for example, if different VMs use different guest operating systems) than it is simply to include a separate VMM for each VM. This invention works with all such VM/VMM configurations.
In all of these configurations, there must be some way for the VM to access hardware devices, albeit in a manner transparent to the VM itself. One solution would of course be to include in the VMM all the required drivers and functionality normally found in the host OS 152 to accomplish I/O tasks. Two disadvantages of this solution are increased VMM complexity and duplicated effort—if a new device is added, then its driver would need to be loaded into both the host OS and the VMM. A third disadvantage is that the use of a hardware device by a VMM driver may confuse the host OS, which typically would expect that only the host's driver would access the hardware device. A different method for enabling the VM to access hardware devices has been implemented by VMware, Inc., in its Workstation virtualization product. This method is also illustrated in FIG. 1.
In the system illustrated in FIG. 1, both the host OS 152 and the VMM 200A are installed at system level, meaning that they both run at the greatest privilege level and can therefore independently modify the state of the hardware processor(s). For I/O to at least some devices, however, the VMM may issue requests via the host OS. To make this possible, a special driver VMdrv 258 is installed as any other driver within the host OS 152 and exposes a standard API (Application Program Interface) to a user-level application VMapp 260. When the system is in the VMM context, meaning that the VMM is taking exceptions, handling interrupts, etc., but the VMM wishes to use the existing I/O facilities of the host OS, the VMM calls the driver VMdrv 258, which then issues calls to the application VMapp 260, which then carries out the I/O request by calling the appropriate routine in the host OS.
In FIG. 1, a vertical line 230 symbolizes the boundary between the virtualized (VM/VMM) and non-virtualized (host software) “worlds” or “contexts.” The driver VMdrv 258 and application VMapp 260 thus enable communication between the worlds even though the virtualized world is essentially transparent to the host system software 150.
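The call chain described above (VMM to VMdrv 258 to VMapp 260 to the host OS 152) can be sketched as a chain of simple delegating objects. The class and method names below are hypothetical stand-ins chosen for illustration, and the sketch ignores privilege levels and context switching entirely.

```python
class HostOS:
    """Stand-in for the host OS 152; records the I/O requests it services."""

    def __init__(self):
        self.log = []

    def do_io(self, request):
        self.log.append(request)
        return f"done:{request}"


class VMapp:
    """User-level application: carries out I/O by calling a host OS routine."""

    def __init__(self, host_os):
        self.host_os = host_os

    def handle(self, request):
        return self.host_os.do_io(request)


class VMdrv:
    """Driver installed in the host OS: relays VMM requests to VMapp."""

    def __init__(self, vmapp):
        self.vmapp = vmapp

    def call(self, request):
        return self.vmapp.handle(request)


class VMM:
    """Virtualized-world side: uses host I/O facilities through the driver."""

    def __init__(self, vmdrv):
        self.vmdrv = vmdrv

    def guest_io(self, request):
        return self.vmdrv.call(request)


host = HostOS()
vmm = VMM(VMdrv(VMapp(host)))
result = vmm.guest_io("read sector 5")
```

The point of the sketch is only that the VMM never calls the host OS directly; each request crosses the boundary through the driver and the user-level application.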
In some cases, however, it may be beneficial to deploy VMMs on top of a thin software layer, a “kernel,” constructed specifically for this purpose. FIG. 2 illustrates an implementation in which a kernel 202B takes the place of and performs the conventional functions of the host OS, including handling actual I/O operations. The kernel-based virtual computer system of FIG. 2 is described in greater detail below. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple virtual machines (for example, resource management). Also, compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs.
As used herein, the “host” OS therefore means either the native OS 152 of the underlying physical computer, a specially constructed kernel 202B as described below, or whatever other system-level software handles actual I/O operations, takes interrupts, etc. for the VM. The invention may be used in all the different configurations mentioned above.
Kernel-Based Virtual Computer System
FIG. 2 illustrates the main components of a “kernel-based” virtual computer system 100B as generally implemented in the ESX Server virtualization product of VMware, Inc. A kernel-based virtualization system of the type illustrated in FIG. 2 is described in U.S. patent application Ser. No. 09/877,378 (“Computer Configuration for Resource Management in Systems Including a Virtual Machine”), which is incorporated here by reference. The main components of this system and aspects of their interaction are, however, outlined below.
The virtual computer system 100B includes one or more VMs, such as a first VM 300B and a second VM 300C. Each VM is installed as a “guest” on a “host” hardware platform, which, as shown in FIG. 2, may be the same as the hardware platform 102 of the virtual computer system 100A of FIG. 1. Thus, FIG. 2 shows the hardware platform 102 as including the one or more processors (CPUs) 104, the system memory 108, one or more disks 110, the MMU 106, and the device(s) 112.
Each VM 300B, 300C may include the same virtualized (“guest”) system hardware 302 as the VM 300A of FIG. 1. Thus, FIG. 2 shows the VM 300B as including the virtual system hardware 302, including the one or more virtual CPUs 304 (VCPU), the virtual system memory 308 (VMEM), the one or more virtual disks 310 (VDISK), and the one or more virtual devices 312 (VDEVICE). Each VM 300B, 300C may also include the guest OS 352, the drivers 354 and the one or more applications 360 (APPS) of the VM 300A of FIG. 1, as shown in FIG. 2 for the VM 300B.
Also as shown in FIG. 2, the virtual computer system 100B includes virtualization software 200B, which includes a VMM 250B that supports the VM 300B and a VMM 250C that supports the VM 300C. The VMMs 250B and 250C may be substantially the same as the virtualization software (VMM) 200A shown in FIG. 1. Thus, FIG. 2 shows the VMM 250B as including one or more device emulators 252B, which may be substantially the same as the device emulators 252A, a memory manager 254B, which may be substantially the same as the memory manager 254A, and an interrupt handler 256B, which may be substantially the same as the interrupt handler 256A.
The device emulators 252B emulate system resources for use within the VM 300B. These device emulators will then typically also handle any necessary conversions between the resources as exported to the VM and the actual physical resources. One advantage of such an arrangement is that the VMM 250B may be set up to expose “generic” devices, which facilitates VM migration and hardware platform-independence. For example, the VMM may be set up with a device emulator 252B that emulates a standard Small Computer System Interface (SCSI) disk, so that the virtual disk 310 appears within the VM 300B to be a standard SCSI disk connected to a standard SCSI adapter, whereas the underlying, actual, physical disk 110 may be something else. In this case, a standard SCSI driver is installed into the guest OS 352 as one of the drivers 354. The device emulator 252B then interfaces with the driver 354 and handles disk operations for the VM 300B. The device emulator 252B then converts the disk operations from the VM 300B to corresponding disk operations for the physical disk 110.
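As a rough illustration of the conversion step just described, the sketch below exposes a block-addressed virtual disk to the guest and converts each operation into a byte-offset operation on a differently addressed backing store. The 512-byte block size and the names `BackingStore` and `EmulatedDisk` are assumptions; the sketch does not implement the SCSI command set or any actual VMware interface.

```python
BLOCK_SIZE = 512  # assumed logical block size of the emulated disk


class BackingStore:
    """Stand-in for the physical disk 110, addressed here by byte offset."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload


class EmulatedDisk:
    """Hypothetical device emulator: block operations in, byte operations out."""

    def __init__(self, backing):
        self.backing = backing

    def read_blocks(self, lba, count):
        return self.backing.read(lba * BLOCK_SIZE, count * BLOCK_SIZE)

    def write_blocks(self, lba, payload):
        if len(payload) % BLOCK_SIZE:
            raise ValueError("payload must be a whole number of blocks")
        if lba * BLOCK_SIZE + len(payload) > len(self.backing.data):
            raise ValueError("write extends past the end of the disk")
        self.backing.write(lba * BLOCK_SIZE, payload)


disk = EmulatedDisk(BackingStore(4 * BLOCK_SIZE))
disk.write_blocks(1, b"\x01" * BLOCK_SIZE)
block = disk.read_blocks(1, 1)
```

Because the guest sees only the block interface, the backing store could be swapped for a different kind of storage without changing the guest driver, which is the platform-independence advantage noted above.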
When the computer system 100B of FIG. 2 is booted up, an existing operating system 152, which may be the same as the host OS 152 of FIG. 1, may be at system level and the kernel 202B may not yet even be operational within the system. In such a case, one of the functions of the OS 152 may be to make it possible to load the kernel 202B, after which the kernel runs on the native hardware 102 and manages system resources. In effect, the kernel, once loaded, displaces the OS 152. Thus, the kernel 202B may be viewed either as displacing the OS 152 from the system level and taking this place itself, or as residing at a “sub-system level.” When interposed between the OS 152 and the hardware 102, the kernel 202B essentially turns the OS 152 into an “application,” which has access to system resources only when allowed by the kernel 202B. The kernel then schedules the OS 152 as if it were any other component that needs to use system resources.
The OS 152 may also be included to allow applications unrelated to virtualization to run; for example, a system administrator may need such applications to monitor the hardware 102 or to perform other administrative routines. The OS 152 may thus be viewed as a “console” OS (COS). In such implementations, the kernel 202B preferably also includes a remote procedure call (RPC) mechanism to enable communication between, for example, the VMMs 250B, 250C and any applications 160 (APPS), which may be the same as the applications 160 of FIG. 1, installed to run on the COS 152.
The kernel 202B handles not only the various VMM/VMs, but also any other applications running on the kernel, as well as the COS 152 and even the hardware CPU(s) 104, as entities that can be separately scheduled. In this disclosure, each schedulable entity is referred to as a “world,” which contains a thread of control, an address space, machine memory, and handles to the various device objects that it is accessing. Worlds are stored in a portion of the memory space controlled by the kernel. More specifically, the worlds are controlled by a world manager, represented in FIG. 2 within the kernel 202B as module 206B. Each world also has its own task structure, and usually also a data structure for storing the hardware state currently associated with the respective world.
There will usually be different types of worlds: 1) system worlds, which are used for idle worlds, one per CPU, and a helper world that performs tasks that need to be done asynchronously; 2) a console world, which is a special world that runs in the kernel and is associated with the COS 152; and 3) virtual machine worlds.
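The notion of a world and the three types of worlds listed above can be summarized in a small data-structure sketch. The field names, the `WorldManager` class, and the `boot_worlds` helper are hypothetical; they only mirror the prose description (one idle world per CPU, one helper world, one console world) and are not taken from the kernel 202B.

```python
from dataclasses import dataclass, field
from enum import Enum


class WorldKind(Enum):
    SYSTEM = "system"
    CONSOLE = "console"
    VIRTUAL_MACHINE = "vm"


@dataclass
class World:
    """One schedulable entity, with assumed fields for its resources."""
    world_id: int
    kind: WorldKind
    name: str
    machine_pages: list = field(default_factory=list)
    device_handles: list = field(default_factory=list)


class WorldManager:
    """Hypothetical counterpart of module 206B: creates and tracks worlds."""

    def __init__(self):
        self.worlds = {}
        self._next_id = 0

    def create(self, kind, name):
        world = World(self._next_id, kind, name)
        self.worlds[world.world_id] = world
        self._next_id += 1
        return world


def boot_worlds(manager, num_cpus):
    # System worlds: one idle world per CPU plus a helper world.
    for cpu in range(num_cpus):
        manager.create(WorldKind.SYSTEM, f"idle-{cpu}")
    manager.create(WorldKind.SYSTEM, "helper")
    # Console world associated with the COS.
    manager.create(WorldKind.CONSOLE, "console")


manager = WorldManager()
boot_worlds(manager, num_cpus=2)
vm_world = manager.create(WorldKind.VIRTUAL_MACHINE, "vm-300B")
```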
The kernel 202B includes a memory management module 204B that manages all machine memory that is not allocated exclusively to the COS 152. When the kernel 202B is loaded, the information about the maximum amount of memory available on the machine is available to the kernel, as well as information about how much of it is being used by the COS. Part of the machine memory is used for the kernel 202B itself and the rest is used for the virtual machine worlds.
Virtual machine worlds use machine memory for two purposes. First, memory is used to back portions of each world's memory region, that is, to store code, data, stacks, etc. For example, the code and data for the VMM 250B is backed by machine memory allocated by the kernel 202B. Second, memory is used for the guest memory of the virtual machine. The memory management module may include any of a variety of algorithms for dynamically allocating memory among the different VMs 300B, 300C.
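The memory accounting described above reduces to simple arithmetic, sketched below. The figures are invented, and the proportional-share split is only one assumed example of the "variety of algorithms" mentioned; the text does not state which algorithm the memory management module 204B uses.

```python
def vm_world_budget(total_mb, cos_mb, kernel_mb):
    """Machine memory left for virtual machine worlds after the COS and kernel."""
    budget = total_mb - cos_mb - kernel_mb
    if budget < 0:
        raise ValueError("COS and kernel exceed total machine memory")
    return budget


def split_by_shares(budget_mb, shares):
    """Assumed policy: divide the budget in proportion to per-VM shares."""
    total_shares = sum(shares.values())
    return {vm: budget_mb * s // total_shares for vm, s in shares.items()}


# Invented example figures, in megabytes.
budget = vm_world_budget(total_mb=8192, cos_mb=512, kernel_mb=256)
allocation = split_by_shares(budget, {"300B": 1, "300C": 3})
```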
The kernel 202B preferably also includes an interrupt/exception handler 208B that is able to intercept and handle interrupts and exceptions for all devices on the machine. However, when a VMM world is running, the VMM's Interrupt Descriptor Table (IDT) is loaded, such that the VMM will handle all interrupts and exceptions.
The VMM will handle some interrupts and exceptions completely on its own. For other interrupts/exceptions, it will be either necessary or at least more efficient for the VMM to call the kernel to have the kernel either handle the interrupts/exceptions itself, or to forward them to some other sub-system such as the COS. The VMM may forward still other interrupts to the corresponding VM.
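The three outcomes described above (handled by the VMM, passed to the kernel, or forwarded to the VM) amount to a routing decision, which can be sketched as a lookup table. The event names and the default-to-kernel rule are assumptions made for illustration; the text does not specify which interrupts take which path.

```python
from enum import Enum


class Handler(Enum):
    VMM = "handled by the VMM itself"
    KERNEL = "passed to the kernel"
    GUEST = "forwarded to the VM"


# Hypothetical routing table; the event names are illustrative only.
ROUTING = {
    "vmm_internal_fault": Handler.VMM,
    "physical_device_irq": Handler.KERNEL,
    "guest_visible_exception": Handler.GUEST,
}


def route(event):
    # Assumption of this sketch: unknown events default to the kernel.
    return ROUTING.get(event, Handler.KERNEL)


decision = route("guest_visible_exception")
```

A kernel that receives an event could make a further decision of its own, such as forwarding it to the COS, which this sketch leaves out.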
In some embodiments of the invention, the kernel 202B is responsible for providing access to all devices on the physical machine. In addition to other modules that the designer may choose to load onto the system for access by the kernel, the kernel will therefore typically load conventional drivers as needed to control access to devices. Accordingly, FIG. 2 shows a module 210B containing loadable kernel modules and drivers. The kernel 202B may interface with the loadable modules and drivers in a conventional manner, using an API or similar interface.
Multicomponent Software Applications
Multitier applications and distributed applications are two different types of multicomponent software applications. Other types of multicomponent software applications are also possible. Existing multicomponent software applications generally comprise multiple software components that are typically executed on separate physical computers.
Thus, for example, suppose that a company wants to run a multitier application comprising three software components, namely a database software component, a financial software component and a user-interface software component. Suppose further that the company purchases three server computers for running the multitier application, one for each of the software components. As is well known, installing and configuring multicomponent applications is often quite complex and time consuming. The IT (Information Technology) department of the company must first install an OS on each of the servers, bring each OS up to the right patch level, and possibly harden each system to guard against security attacks. The IT department can then install each component onto its respective server, and then configure each component. The configuration process is typically complicated by the need for the multiple components to communicate and interact with one another. Thus, each server/component must be configured not only with its own communication settings, such as IP addresses, but also with the communication settings of the other servers/components with which it must communicate.
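The cross-configuration burden in this example can be made concrete with a small sketch. The IP addresses (taken from a range reserved for documentation) and the choice of which components talk to which are assumptions; the text names the three components but does not specify their communication pattern.

```python
def build_configs(addresses, peers):
    """Give each component its own address plus the addresses of its peers."""
    configs = {}
    for name, own_address in addresses.items():
        configs[name] = {
            "own_address": own_address,
            "peer_addresses": {p: addresses[p] for p in peers.get(name, [])},
        }
    return configs


# Hypothetical addresses and communication pattern for the three components.
addresses = {
    "database": "192.0.2.10",
    "financial": "192.0.2.11",
    "user_interface": "192.0.2.12",
}
peers = {
    "database": ["financial"],
    "financial": ["database", "user_interface"],
    "user_interface": ["financial"],
}
configs = build_configs(addresses, peers)
```

Under these assumptions, changing the address of the financial component alone would require updating the stored settings on both of the other servers, which is the kind of coupling that makes such installations laborious.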
Now, with such a configuration, one or more of the servers may be underutilized. In reality, all three servers are typically underutilized because surplus computing resources are typically provided to enable the computing system to handle variations in workloads. Thus, installations of multicomponent software applications are typically inefficient in their utilization of computing resources.
Now, suppose that one of the three server computers fails, such that the software component running on the failed server can no longer operate effectively. Often in such a situation, the operation of the entire multicomponent software application is disrupted until the failed server can be repaired or replaced. Then, the newly repaired server often must be reconfigured, and even the other two servers may need to be reconfigured, depending on what needed to be done with the failed server.
Now, suppose that the workload for one of the software components increases to the point that the computing resources of the component's server are inadequate to keep up with the demands. For example, suppose that the workload of the financial software component is substantially increased during one or more periods of a fiscal year, which is often the case. The IT department of the company will generally need to take some action to increase the computing resources available to the server running the financial software component, such as adding memory to the overloaded server computer or possibly adding an additional server computer to provide additional processing capabilities. In the case of adding an additional server, a second instance of the financial software component may be installed and configured on the new server computer (after an OS is loaded and patched, and possibly after the system is hardened). All of the servers and software components will typically need to be reconfigured to operate in the new four-server configuration.
In any of these scenarios, and in numerous other scenarios, the maintenance of multicomponent software applications is also quite complex and time consuming. Providing other services for multicomponent software applications, such as maintaining a backup of data, can also be more complex and time consuming than for conventional software applications. Overall, the installation, configuration and ongoing operation of multicomponent software applications can be quite complex and time consuming, and can make inefficient use of hardware resources and of the personnel resources of an IT department. What is needed, therefore, is an improved method and system for executing multicomponent software applications.