Virtualization
As is well known in the field of computer science, a virtual machine is an abstraction—a “virtualization”—of an actual physical computer system. FIGS. 1A and 1B show two possible arrangements of virtualization software in a computer system 70 that implements virtualization. A virtual machine or “guest” 20 is installed on a “host platform,” or simply “host,” which will include system hardware 10 and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor as described in more detail below, or some combination of these. The system hardware typically includes one or more processors 11, memory 13, some form of mass storage 14, and various other devices 17.
Each VM 20, . . . , 20-n will typically have both virtual system hardware 28 and guest system software 29. The virtual system hardware typically includes at least one virtual CPU 21-0 to 21-m, virtual memory 23, at least one virtual disk 24, and one or more virtual devices 27. Note that a disk (virtual or physical) is also a “device,” but is often considered separately because of the important role of the disk. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components. The guest system software includes a guest operating system (OS) 22 and drivers 25 as needed for the various virtual devices 27.
A single VM may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP VMs. FIGS. 1A and 1B, for example, illustrate multiple virtual processors 21-0, 21-1, . . . , 21-m (VCPU0, VCPU1, . . . , VCPUm) within the VM 20.
Yet another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each core having its own set of functional units (such as registers, L1 caches, arithmetic/logic units (ALUs), etc.) and being able to execute threads independently. Multi-core processors typically share certain resources, such as L2 and/or L3 caches. Still another technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one hardware thread operates simultaneously on a single processing core.
Each guest VM executes on system hardware 10 and physical CPU(s) 11 in its own “context,” which is provided by an underlying virtualization software layer. A “context” generally includes the state of all virtual address space, as well as the set of registers (including privilege registers), along with all hardware exception and entry points. Thus, although they share system resources, the guest VMs are isolated from one another and from the underlying virtualization software. Furthermore, if the virtualization system is properly designed, applications 26 running on each VM will function as they would if run directly on a physical computer, even though the applications are running at least partially indirectly on virtual system hardware 28. Executable files will be accessed by guest OS 22 from the virtual disk 24 or virtual memory 23, which are mapped to portions of the actual physical disk 14 or memory 13, respectively, which portions are allocated to that VM by the virtualization software layer. The design and operation of virtual machines are well known in the field of computer science.
The virtualization software layer, also referred to herein as “virtualization layer” or “virtualization software,” may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” “host operating systems,” or virtualization “kernels.” Because terminology related to virtualization has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, the term “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself. However, “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. For example, in some systems, some virtualization code is included in at least one “superior” VM or host operating system to facilitate the virtualization.
Some software components are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware. FIGS. 1A and 1B show one or more virtual machine monitors that appear as separate entities from other components of the virtualization software and perform certain functions relating to the invention. Those skilled in the art may recognize that such a representation of these components is provided only for the sake of simplicity and clarity and by way of illustration. As mentioned above, the distinctions between and among the various components of a virtualization system are not always so clear-cut, and the use of the term “virtual machine monitor” or just “VMM” is meant to encompass the component(s) in the virtualization software that perform the indicated functions, regardless of what name they are given.
The various virtualized hardware components of virtual system hardware 28, such as virtual CPU(s) 21-0 to 21-m, virtual memory 23, virtual disk 24, and virtual device(s) 27, are shown as being part of VM 20 for the sake of conceptual simplicity. In reality, these “components” are merely projections of virtual devices that are visible to guest operating system 22; they are usually implemented by device emulators 33 included in the VMM.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line. A particular implementation often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another concept, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the name implies, a “para-virtualized” system is configured in some way to provide certain features that facilitate virtualization. For example, the guest operating system in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations; it may be written so that it avoids certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within the guest that enables explicit calls to other components of the virtualization software.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use: a “non-hosted” configuration, such as that shown in FIG. 1A, and a “hosted” configuration, such as that shown in FIG. 1B. The non-hosted configuration illustrated in FIG. 1A deploys one or more VMMs 30-30n on top of virtualization kernel 60. Kernel 60 is constructed specifically to provide efficient support for VMMs 30-30n. In particular, kernel 60 includes device drivers to manage and control physical system hardware 10, and to assign and distribute resources to VMMs 30-30n. A console operating system 42 and associated applications 43 may be included to provide a user interface that allows a user (e.g., an administrator) to control the operation of kernel 60 as well as to interact with applications executing on each of the virtual machines.
In the hosted configuration shown in FIG. 1B, VMMs 30-30n run directly on the hardware platform along with host operating system 50. In a hosted virtualized computer system, an existing, general-purpose operating system forms “host” operating system 50 that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM. In this configuration, host operating system 50 includes driver 58 and one or more executable applications 56 that serve a number of virtualization functions, including providing an interface between VMMs 30-30n and physical devices, managing and distributing system resources, and providing user interfaces to the virtualization system and to the inputs and outputs of each of the virtual machines. Host operating system 50, installed drivers 54, and VM applications 56, along with other user applications 43, form host system software 52. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., entitled “System and Method for Virtualizing Computer Systems”). Thus, the term “host” in this particular context refers to the host operating system that is used to support a virtual machine, whereas, generally speaking, it refers to the physical host platform on which the virtual machine resides.
With respect to terminology, it should be noted that kernel 60 shown in the non-hosted system in FIG. 1A is not the same as the operating system kernel within the guest operating system 22. As is well known, every operating system has its own kernel. Note also that kernel 60 is part of the “host” platform of the VM/VMM as defined above even though the configuration shown in FIG. 1A is commonly termed “non-hosted.” Kernel 60 may be considered to be both part of the host platform and part of the virtualization software. The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
Regardless of whether the system is configured as a hosted virtualization system or a non-hosted virtualization system, the address space of system memory 13 is generally partitioned into pages, regions, or other analogous allocation units. Applications address the memory using virtual addresses (VAs), each of which typically comprises a virtual page number (VPN) and an offset into the indicated page. The VAs are then mapped to physical addresses (PAs), each of which similarly comprises a physical page number (PPN) and an offset, and which are actually used to address physical system memory 13. The same offset is usually used in both a VA and its corresponding PA, so that only the VPN needs to be converted into a corresponding PPN. The concepts of VPNs and PPNs, as well as the way in which the different page numbering schemes are implemented and used, are described in many standard texts, such as “Computer Organization and Design: The Hardware/Software Interface,” by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1994, pp. 579-603 (chapter 7.4 “Virtual Memory”). Similar mappings are used in region-based architectures or, indeed, in any architecture where relocatability is possible.
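By way of illustration only, the VA-to-PA conversion described above may be sketched as follows. The 4 KiB page size, the dictionary used as a page table, and the names translate and page_table are assumptions made for this example and do not appear in the figures; a missing entry simply raises an error in place of a page fault.

```python
PAGE_SIZE = 4096                           # assumed page size in bytes (illustrative)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4 KiB pages


def translate(va, page_table):
    """Convert a virtual address to a physical address.

    page_table is a hypothetical dict mapping VPN -> PPN. Only the page
    number is converted; the offset is carried over unchanged.
    """
    vpn = va >> OFFSET_BITS            # upper bits select the virtual page
    offset = va & (PAGE_SIZE - 1)      # lower bits are the offset within the page
    ppn = page_table[vpn]              # KeyError stands in for a page fault
    return (ppn << OFFSET_BITS) | offset


page_table = {0x2: 0x7A}               # VPN 0x2 is backed by PPN 0x7A
print(hex(translate(0x2ABC, page_table)))   # prints 0x7aabc
```

In this sketch the offset 0xABC is identical in the VA and the PA, which reflects the observation above that only the VPN needs to be converted.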
An extra level of addressing indirection is typically implemented in virtualized systems in that a VPN issued by an application running in a VM is remapped twice in order to determine which page of system memory 13 is intended. The first mapping is provided by guest operating system 22, which translates the guest VPN (GVPN) into a corresponding guest PPN (GPPN) in the conventional manner. In a manner of speaking, the guest OS therefore “believes” that it is directly addressing the actual hardware memory, but in fact it is not. A memory management module, typically located somewhere in the virtualization software (such as in the VMM), performs the second mapping by taking the GPPN issued by the guest OS and mapping it to a hardware (or “machine”) physical page number PPN that can be used to address physical system memory 13. This GPPN-to-PPN mapping may instead be done in the main system-level software layer, depending on the implementation. From the perspective of guest operating system 22, the GVPN and GPPN appear to be virtual and physical page numbers just as they would be if the guest operating system were the only operating system in the system. From the perspective of the system software, i.e., the virtualization layer, the GPPN is a page number that is then mapped into the physical memory space of the hardware memory as a PPN.
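The double remapping described above may be sketched, purely as an illustration, as two successive table lookups. The dictionaries guest_page_table and vmm_pmap, the function name guest_to_machine, and the 4 KiB page size are hypothetical and chosen only for this example; actual virtualization software may combine or cache the two mappings in ways not shown here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def guest_to_machine(gva, guest_page_table, vmm_pmap):
    """Resolve a guest virtual address to a machine address in two steps."""
    gvpn, offset = divmod(gva, PAGE_SIZE)
    gppn = guest_page_table[gvpn]   # first mapping: guest OS, GVPN -> GPPN
    ppn = vmm_pmap[gppn]            # second mapping: virtualization software, GPPN -> PPN
    return ppn * PAGE_SIZE + offset


guest_page_table = {0x10: 0x3}      # maintained by the guest OS
vmm_pmap = {0x3: 0x1F4}             # maintained by the virtualization software
print(hex(guest_to_machine(0x10123, guest_page_table, vmm_pmap)))   # prints 0x1f4123
```

Because the second table is kept per VM, the same GPPN in two different VMs can be backed by two different machine pages, which is one way the isolation between guests can be maintained in such a sketch.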
The addressable space of the disk(s), and therefore also of the virtual disk(s), is similarly subdivided into separately identifiable portions such as blocks or sectors, tracks, cylinders, etc. In general, applications do not directly address the disk; rather, disk access and organization are tasks reserved to the operating system, which follows some predefined file system structure. When the guest OS wants to write data to the (virtual) disk, the identifier used for the intended block, etc., is therefore also converted into an identifier into the address space of the physical disk. Conversion may be done within whatever system-level software layer handles memory, disk and/or file system management for the VM and other processes.
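One possible form of the block identifier conversion described above is sketched below, under the assumption that a virtual disk is backed by a list of contiguous extents on the physical disk. The extent layout, the class name VirtualDisk, and the method to_physical are hypothetical; the description above does not prescribe any particular allocation scheme.

```python
from bisect import bisect_right


class VirtualDisk:
    """Hypothetical virtual disk backed by extents of a physical disk."""

    def __init__(self, extents):
        # extents: list of (physical_start_block, block_count), in virtual order
        self.extents = extents
        self.starts = []            # first virtual block number of each extent
        total = 0
        for _, count in extents:
            self.starts.append(total)
            total += count
        self.size = total           # total number of virtual blocks

    def to_physical(self, vblock):
        """Convert a virtual block number into a physical block number."""
        if not 0 <= vblock < self.size:
            raise ValueError(f"virtual block {vblock} is out of range")
        i = bisect_right(self.starts, vblock) - 1   # extent containing vblock
        physical_start, _ = self.extents[i]
        return physical_start + (vblock - self.starts[i])


disk = VirtualDisk([(1000, 4), (5000, 8)])
print(disk.to_physical(3), disk.to_physical(4))     # prints 1003 5000
```

The guest OS in this sketch sees twelve consecutively numbered blocks, while the backing blocks occupy two separate regions of the physical disk.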
Viruses
A “virus” is a malicious program or code that surreptitiously enters a computer environment. Viruses often replicate themselves, or cause themselves to be replicated, thereby consuming excessive amounts of computer resources, and causing degradation or disruption of computer operation. A “worm” can be defined as a virus that automatically attaches itself to outgoing email or other network messages. Some viruses are written so that they do not seriously harm the infected system. For example, a virus may be written that merely causes the message “Happy Birthday Ludwig!” to repeat on a monitor screen. Other viruses erase or corrupt disk files, or require that a hard disk be entirely reformatted. A virus may wreak its havoc immediately upon entering a computer environment, or may lie dormant until circumstances cause its code to be executed by the host computer. Regardless of the potential damage that can be caused by a particular virus, all viruses are generally considered malicious, should be prevented from infecting a system, and should be removed if discovered. For present purposes, the term “virus” will refer to any such malicious code.
The threat of viruses is particularly acute in a networked environment, where a computer on the network is accessible to viruses of varying degrees of sophistication and severity created by legions of hackers. These viruses may surreptitiously enter the computer environment through a variety of mechanisms, for example, as attachments to emails or as downloaded files, from a CD or diskette, or through a service program listening to a well-known network port, such as that for the RPC service in Windows. To guard against viruses such as these, there is a need for an anti-virus mechanism that is effective and scales easily in a virtual machine environment. There are generally two types of anti-virus software: system scanners that scan a complete disk drive and memory system for malicious code, and “on-access” scanners that scan a file when it is requested by the operating system. An on-access scanner is generally considered the more secure system since it examines a file at the time the file is requested, so that malicious code is not left free to cause damage until the next complete scan. With the ongoing progress of hardware processing power and the advance of SMP architectures, the number of virtual machines capable of being run on a single hardware host is increasing. With the concomitant proliferation of computer networks, viruses and worms remain a serious threat to the stability, reliability, and performance of applications and operating systems running within virtual machines.
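The difference between the two types of scanner described above may be illustrated with the following sketch. It assumes simple byte-signature matching, an in-memory dictionary standing in for a file system, and the hypothetical names SIGNATURES, is_infected, full_scan, and read_on_access; none of these is taken from the description above, and real anti-virus software uses considerably more elaborate detection.

```python
SIGNATURES = (b"EVIL-PAYLOAD",)   # hypothetical byte signature, for illustration only


def is_infected(data):
    """Return True if any known signature occurs in the data."""
    return any(signature in data for signature in SIGNATURES)


def full_scan(files):
    """System scanner: examine every file and report the infected ones."""
    return sorted(name for name, data in files.items() if is_infected(data))


def read_on_access(files, name):
    """On-access scanner: check a single file at the moment it is requested."""
    data = files[name]
    if is_infected(data):
        raise PermissionError(f"access to infected file blocked: {name}")
    return data


files = {"notes.txt": b"hello", "game.exe": b"xxEVIL-PAYLOADxx"}
print(full_scan(files))                       # prints ['game.exe']
print(read_on_access(files, "notes.txt"))     # prints b'hello'
```

In this sketch a request for "game.exe" through read_on_access is refused immediately, whereas full_scan reports the file only when a complete scan happens to be run.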