Virtualization technologies are becoming prevalent in the marketplace. At least some of these technologies provide a virtual hardware abstraction to guest operating systems, allowing them to run unmodified in virtual machines in a functionally isolated environment on a host computer. Virtualization allows one or more virtual (guest) machines to run on a single physical (host) computer, providing functional and performance isolation for processor, memory, storage, etc.
As is well known in the field of computer science, a virtual machine is an abstraction—a “virtualization”—of a physical computer system. FIG. 1 shows one possible arrangement of a computer system (computer system 700) that implements virtualization. As shown in FIG. 1, virtual machine or “guest” 200 is installed on a “host platform,” or simply “host,” which includes system hardware, that is, hardware platform 100, and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor (see below), or some combination of these. The system hardware typically includes one or more processors 110, memory 130, some form of mass storage 140, and various other devices 170.
Each virtual machine 200 will typically have both virtual system hardware 201 and guest system software 202. The virtual system hardware typically includes at least one virtual CPU, virtual memory 230, at least one virtual disk 240, and one or more virtual devices 270. Note that a disk—virtual or physical—is also a “device,” but is usually considered separately because of the important role of the disk. All of the virtual hardware components of the virtual machine may be implemented in software using known techniques to emulate the corresponding physical components. The guest system software includes guest operating system (OS) 220 and drivers 224 as needed for the various virtual devices 270.
Note that a single virtual machine may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform, or as other hardware architectures. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP virtual machines. FIG. 1, for example, illustrates multiple virtual processors 210-0, 210-1, . . . , 210-m (VCPU0, VCPU1, . . . , VCPUm) within virtual machine 200.
Yet another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit (ALU)) and each able to execute threads independently; multi-core processors typically share only very limited resources, such as some cache. Still another configuration that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share some resources such as caches, buffers, functional units, etc. One or more embodiments of this invention may be used regardless of the type (physical and/or logical) or number of processors included in a virtual machine.
In many cases, applications 260 running on virtual machine 200 will function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is, via guest O/S 220 and virtual processor(s). Executable files will be accessed by the guest O/S from virtual disk 240 or virtual memory 230, which will be portions of the actual physical disk 140 or memory 130 allocated to that virtual machine. Once an application is installed within the virtual machine, the guest O/S retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is generally required between the guest software within a virtual machine and the various hardware components and devices in the underlying hardware platform. This interface—which may be referred to generally as “virtualization software”—may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Unless otherwise indicated, one or more embodiments of the invention described herein may be used in virtualized computer systems having any type or configuration of virtualization software.
FIG. 1 shows virtual machine monitors that appear as separate entities from other components of the virtualization software. Furthermore, some software components used to implement one or more embodiments of the invention are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware. The illustrated embodiments are given only for the sake of simplicity and clarity and by way of illustration. Again, unless otherwise indicated or apparent from the description, it is to be assumed that one or more embodiments of the invention can be implemented anywhere within the overall structure of the virtualization software, and even in systems that provide specific hardware support for virtualization.
The various virtualized hardware components in the virtual machine, such as virtual CPU(s) 210-0, 210-1, . . . , 210-m, virtual memory 230, virtual disk 240, and virtual device(s) 270, are shown as being part of virtual machine 200 for the sake of conceptual simplicity. In actuality, these “components” are usually implemented as software emulations 330 included in the VMM.
Different systems may implement virtualization to different degrees—“virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, “full virtualization” is sometimes used to denote a system in which no software components of any form are included in the guest other than those that would be found in a non-virtualized computer; thus, the guest O/S could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another term, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the term implies, a “para-virtualized” system is not “fully” virtualized, but rather the guest is configured in some way to provide certain features that facilitate virtualization. Unless otherwise indicated or apparent, embodiments of this invention are not restricted to use in systems with any particular “degree” of virtualization and are not to be limited to any particular notion of full or partial (“para-”) virtualization.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration and a non-hosted configuration (which is shown in FIG. 1). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM.
As illustrated in FIG. 1, in many cases, it may be beneficial to deploy VMMs on top of a software layer—kernel 600—constructed specifically to provide support for the virtual machines. This configuration is frequently referred to as being “non-hosted.”
Note that kernel 600 is not the same as the kernel that will be within guest O/S 220—as is well known, every operating system has its own kernel. Note also that kernel 600 is part of the “host” platform of the virtual machine/VMM as defined above even though the configuration shown in FIG. 1 is commonly termed “non-hosted;” moreover, the kernel may be both part of the host and part of the virtualization software or “hypervisor.” The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
FIG. 2 illustrates virtual memory management and address mapping functions performed by a VMM 300 and various other components of the virtualized computer system. As illustrated in FIG. 2, the guest O/S 220 generates a guest O/S page table 292. The guest O/S page table 292 contains mappings from GVPNs (Guest Virtual Page Numbers) to GPPNs (Guest Physical Page Numbers). Suppose that a guest application 260 attempts to access a memory location having a first GVPN, and that the guest O/S 220 has specified in the guest O/S page table 292 that the first GVPN is backed by what it believes to be a physical memory page having a first GPPN. The mapping from the first GVPN to the first GPPN is used by the virtual system hardware 201. The memory management module 350 translates the first GPPN into a corresponding MPN (Machine Page Number), say a first MPN, using a so-called BusMem/PhysMem table including mappings from guest physical addresses to bus addresses and then to machine addresses. The memory management module 350 creates a shadow page table 392 and inserts into it a translation mapping the first GVPN directly to the first MPN. This mapping from the first GVPN to the first MPN is used by the system hardware 100 to access the actual hardware memory that backs the GVPN, and the mapping is also loaded into the TLB (Translation Look-Aside Buffer) 194 to cache the GVPN-to-MPN mapping for future memory accesses.
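The composition of the two mappings can be sketched as follows. This is a minimal Python illustration and not the VMM's actual data structures: the dictionaries `guest_page_table`, `physmem_table`, and `shadow_page_table`, the function name `fill_shadow_entry`, and all page numbers are hypothetical, and the BusMem/PhysMem translation is collapsed into a single GPPN-to-MPN lookup.

```python
# Hypothetical flat tables; real page tables are multi-level structures.
guest_page_table = {0x10: 0x2A}   # GVPN -> GPPN, maintained by the guest O/S
physmem_table = {0x2A: 0x7F3}     # GPPN -> MPN, maintained by the VMM (assumed flat)
shadow_page_table = {}            # GVPN -> MPN, consumed by the system hardware


def fill_shadow_entry(gvpn):
    """Compose GVPN -> GPPN -> MPN and record the result in the shadow table."""
    gppn = guest_page_table[gvpn]   # what the guest believes is a physical page
    mpn = physmem_table[gppn]       # the machine page that actually backs it
    shadow_page_table[gvpn] = mpn
    return mpn


mpn = fill_shadow_entry(0x10)
print(hex(mpn))  # 0x7f3
```

After the call, the shadow table holds the direct GVPN-to-MPN entry, so the hardware never needs to consult the intermediate GPPN.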
Note that the terms “guest virtual page number (GVPN)” and “guest virtual page” are used synonymously herein with the terms “virtual page number” and “virtual page,” respectively, and with the terms “linear page number” and “linear page,” respectively. Also note that the terms “guest physical page number” and “guest physical page” are used synonymously herein with the terms “virtual physical page number” and “virtual physical page,” respectively, because they are not real physical page numbers but what the virtual machine 200 believes to be the physical page numbers. Finally, note that the base address of a page is computed by multiplying the page number of the page by the size of the page.
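As a small worked example of the last point, the following Python sketch converts between page numbers and addresses. The 4 KiB page size and the helper names `page_base` and `split_address` are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages; other page sizes work the same way


def page_base(page_number, page_size=PAGE_SIZE):
    """Base address of a page: page number multiplied by page size."""
    return page_number * page_size


def split_address(address, page_size=PAGE_SIZE):
    """Inverse operation: return (page number, offset within the page)."""
    return divmod(address, page_size)


print(hex(page_base(0x2A)))      # 0x2a000
print(split_address(0x2A123))    # (42, 291), i.e. page 0x2A, offset 0x123
```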
FIG. 3 illustrates the structure of the guest O/S page table 292 and the shadow page table 392 in a virtualized computer system in more detail. The guest O/S page tables 292 include a plurality of tables (G-PT) 292-1, 292-2 each of which includes entries 301-1, 301-2 with page numbers of other guest page tables or a data page. A data page (DATA-PG) 292-3 includes data 301-3 indicating a guest physical address corresponding to a guest virtual address. vCR3 302 is a virtual page directory base pointer that points to the root guest page table 292-1.
In order to find the guest physical address corresponding to a guest virtual address 308 including a plurality of address fields (ADDR) 308-1, 308-2 and an offset (OFST) field 308-3, a page walk on the guest O/S page table 292 is performed by walking through the guest page tables 292-1, 292-2. Specifically, the root guest page table 292-1 is accessed using the address pointed to by vCR3 302. The first address field 308-1 is an index into entry 301-1 of the root guest page table 292-1. The entry 301-1 includes a physical page number of the next guest page table 292-2, and the next address field 308-2 is an index into entry 301-2 of the guest page table 292-2. The entry 301-2 includes a physical page number of the data page 292-3. The physical address pointing to the data 301-3 corresponding to the virtual address 308 is the base address of the data page 292-3 plus the offset field 308-3. In general, a page walk on the guest O/S page tables 292 presents a significant computational burden on the virtualized computer system.
The structure of the shadow page table 392 mimics that of the guest O/S page table 292. The shadow page table 392 also includes a plurality of tables (S-PT) 392-1, 392-2 each of which includes entries 311-1, 311-2 with page numbers of other tables (S-PT) or a data page 392-3. A data page 392-3 includes data 311-3 indicating a machine address corresponding to a guest virtual address. mCR3 352 is a machine page directory base pointer that points to the root table (S-PT) 392-1.
In order to find the machine address corresponding to a guest virtual address 318 including a plurality of address fields (ADDR) 318-1, 318-2 and the offset (OFST) field 318-3, the CPU 110 performs a page walk on the shadow page tables 392 by walking through the shadow page tables 392-1, 392-2. Specifically, the root shadow page table 392-1 is accessed using the address pointed to by mCR3 352. The first address field 318-1 is an index into entry 311-1 of the root shadow page table 392-1. The entry 311-1 includes a machine page number of the next shadow page table 392-2, and the next address field 318-2 is an index into entry 311-2 of the shadow page table 392-2. The entry 311-2 includes a machine page number of the data page 392-3. The machine address pointing to the data 311-3 corresponding to the virtual address 318 is the base address of the data page 392-3 plus the offset field 318-3.
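The guest walk and the shadow walk have the same shape and differ only in the root pointer and in what the stored page numbers mean. The following Python sketch is illustrative only: it assumes a two-level layout with 10-bit address fields and a 12-bit offset field (4 KiB pages), models each table as a dictionary, and uses hypothetical names (`page_walk`, `guest_tables`, `shadow_tables`) and made-up page numbers.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages (12-bit offset field)
INDEX_BITS = 10                      # assumed width of each address field
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def page_walk(tables, root, vaddr):
    """Walk a two-level table; return the translated address or None on a miss."""
    idx1 = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK  # first address field
    idx2 = (vaddr >> PAGE_SHIFT) & INDEX_MASK                 # second address field
    offset = vaddr & OFFSET_MASK                              # offset field

    next_table = tables[root].get(idx1)        # entry in the root table
    if next_table is None:
        return None
    data_page = tables[next_table].get(idx2)   # entry in the next table
    if data_page is None:
        return None
    return (data_page << PAGE_SHIFT) | offset  # page base address plus offset


# Tables keyed by the page number that holds them (all numbers are made up).
guest_tables = {0x1: {1: 0x2}, 0x2: {3: 0x2A}}        # entries hold GPPNs
shadow_tables = {0x50: {1: 0x51}, 0x51: {3: 0x7F3}}   # entries hold MPNs
vcr3, mcr3 = 0x1, 0x50                                # root pointers

vaddr = (1 << 22) | (3 << 12) | 0x123
guest_physical = page_walk(guest_tables, vcr3, vaddr)
machine = page_walk(shadow_tables, mcr3, vaddr)
print(hex(guest_physical), hex(machine))  # 0x2a123 0x7f3123
```

The same guest virtual address resolves to a guest physical address through the tables rooted at vCR3 and to a machine address through the tables rooted at mCR3; only the latter walk is performed by the CPU.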
FIG. 4 is a flowchart illustrating a conventional process for virtual memory access in a virtualized computer system. Referring to FIG. 4, when the guest O/S 220 (or other software within the virtual machine 200) attempts a memory access 402 using a guest virtual address, the system hardware 100 first searches the translation look-aside buffer 194 for the mapping of the guest virtual address to the corresponding machine address and determines whether there is a hardware TLB miss 404. If the corresponding machine address is found, there is no hardware TLB miss 404, and the memory is accessed 421 using the machine address (MA) obtained from the TLB 194. If the corresponding machine address is not found, there is a hardware TLB miss 404, and the system hardware 100 then searches the shadow page table 392 for the mapping of the guest virtual address to the corresponding machine address and determines whether there is a shadow page table (S-PT) miss 406. If the corresponding machine address is found, there is no shadow page table miss 406, and the memory is accessed 421 using the machine address obtained from the shadow page table 392. If the corresponding machine address is not found, there is a shadow page table miss 406, and the system hardware 100 delivers a hardware page fault 408 to the VMM 300, indicating that the corresponding machine page cannot be found in the hardware TLB 194 or the shadow page table 392.
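The lookup order in steps 404 through 408 can be summarized in a short Python sketch. It is a simplification under stated assumptions: the TLB and the shadow page table are modeled as flat dictionaries keyed by guest virtual page number, and the names `translate` and `HardwarePageFault` are hypothetical. Filling the TLB on a shadow page table hit reflects the caching behavior described for the TLB 194.

```python
class HardwarePageFault(Exception):
    """Raised when neither the TLB nor the shadow page table has a mapping (408)."""


def translate(gvpn, tlb, shadow_page_table):
    """Return the machine page number for gvpn, following the order of FIG. 4."""
    if gvpn in tlb:                    # no hardware TLB miss (404)
        return tlb[gvpn]
    if gvpn in shadow_page_table:      # no shadow page table miss (406)
        mpn = shadow_page_table[gvpn]
        tlb[gvpn] = mpn                # cache the mapping for later accesses
        return mpn
    raise HardwarePageFault(gvpn)      # delivered to the VMM (408)


tlb, shadow = {}, {0x10: 0x7F3}
print(hex(translate(0x10, tlb, shadow)))  # 0x7f3, found in the shadow page table
print(0x10 in tlb)                        # True, now cached in the TLB
```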
In the conventional process of virtual memory access in a virtualized computer system, the VMM 300 performs a page walk 410 on the guest O/S page table 292 and determines whether the guest virtual address that caused the hardware page fault 408 has a corresponding mapping to a guest physical address in the guest O/S page table 292. If there is a corresponding guest physical address in the guest O/S page table 292, there is no guest page table (G-PT) miss 412. This type of hardware page fault 408 is referred to herein as a “hidden page fault,” because the guest O/S page table 292 does have the mapping to a corresponding guest physical address for the guest virtual address but the corresponding guest virtual address to machine address mapping has not been added to the shadow page table 392 yet. In a hidden page fault, the VMM 300 uses the found guest physical address to determine the corresponding machine address using its BusMem/PhysMem tables and inserts 418 the guest virtual address to machine address mapping in the shadow page table 392. As a result, during the next memory access 402 resulting from a subsequent attempt at executing the instruction accessing the memory, there will still be a hardware TLB miss 404 but there will not be a shadow page table miss 406, and the memory can be accessed 421 using the corresponding machine address. In addition, the guest virtual address to machine address mapping can also be cached in the hardware TLB 194 for future use.
However, if there is no corresponding guest physical address in the guest O/S page table 292, this means there is a guest page table (G-PT) miss 412. This type of hardware page fault 408 is referred to herein as a “true page fault,” because even the guest O/S page tables 292 do not contain the mapping to a corresponding guest physical address for the guest virtual address. In a true page fault, the VMM 300 delivers a page fault 414 to the guest O/S 220, and the guest O/S 220 creates an appropriate mapping from the guest virtual address to a guest physical address and updates 416 the guest O/S page table 292 using the created mapping. During the next memory access 402 resulting from a subsequent attempt at executing the instruction accessing the memory, there will still be a hardware TLB miss 404, a shadow page table miss 406, and a hardware page fault 408. However, the corresponding guest physical address will be found during the guest page table walk 410, and thus there will be no guest page table miss 412. The VMM 300 will now be able to insert 418 the guest virtual address to machine address mapping in the shadow page table 392. Therefore, during a subsequent memory access 402 resulting from an attempted re-execution of the instruction accessing the memory, the memory can be accessed 421 using the corresponding machine address found in the shadow page table 392 (see step 406), as explained above.
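The two cases described above, the hidden page fault and the true page fault, can be sketched as a single handler. This Python fragment is a simplified model, not the VMM 300's implementation: the guest page walk 410 is reduced to a dictionary lookup, the BusMem/PhysMem tables are reduced to one hypothetical `physmem_table`, and the function name and return strings are invented for illustration.

```python
def handle_hardware_page_fault(gvpn, guest_page_table, physmem_table,
                               shadow_page_table):
    """Classify a hardware page fault and repair the shadow table if possible."""
    gppn = guest_page_table.get(gvpn)    # stands in for the guest page walk (410)
    if gppn is None:                     # guest page table miss (412)
        return "true"                    # fault must be forwarded to the guest (414)
    shadow_page_table[gvpn] = physmem_table[gppn]   # insert the mapping (418)
    return "hidden"


guest_pt, physmem, shadow = {}, {0x2A: 0x7F3}, {}

first = handle_hardware_page_fault(0x10, guest_pt, physmem, shadow)
guest_pt[0x10] = 0x2A    # the guest O/S services the fault and updates its table (416)
second = handle_hardware_page_fault(0x10, guest_pt, physmem, shadow)

print(first, second, hex(shadow[0x10]))  # true hidden 0x7f3
```

Note that the first call performs the lookup only to discover that nothing is there, which is the wasted work the conventional process incurs on every true page fault.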
Note that, in the case of a true page fault, the VMM 300 unnecessarily performs the guest page table walk 410 only to find that the guest virtual address to guest physical address mapping is not present in the guest O/S page table 292 (i.e., guest page table miss 412). Such unnecessary guest page table walks 410 can present a significant computational burden on the virtualized computer system.
U.S. patent application Ser. No. 11/499,125, titled “Bypassing Guest Page Table Walk for Shadow Page Table Entries Not Present in Guest Page Table” (the “Table Walk Bypass Application”), filed on Aug. 4, 2006, has the same assignee as the present application. The Table Walk Bypass Application discloses a method and system for memory access in a virtualized computer system that eliminates the above-described unnecessary guest page table walks. More specifically, the Table Walk Bypass Application discusses a method and system that do not perform an address translation look-up or a page walk on the guest page tables 292 if the shadow page table entry corresponding to the guest virtual address for accessing the virtual memory indicates that a valid, corresponding mapping from the guest virtual address to a guest physical address is absent in the guest page tables 292. Markers or indicators are stored in the shadow page table entries to indicate that a guest virtual address to guest physical address mapping corresponding to the guest virtual address of the shadow page table entry is not present in the guest page table 292.
As discussed in the Table Walk Bypass Application, when a hardware page fault is issued indicating that the translation look-aside buffers and the shadow page tables 392 do not include a valid machine address mapping corresponding to the virtual address used for accessing the virtual memory 230, it is determined whether an indicator of a shadow page table entry corresponding to the guest virtual address is in a first state or a second state. If the indicator is in the first state, an address translation look-up or a page walk is performed on the guest page tables 292 to determine a guest physical address corresponding to the virtual address. If the indicator is in the second state, a page fault is issued, indicating that the guest page tables 292 do not include the guest physical address corresponding to the virtual address, without performing the address translation look-up or the page walk on the guest page tables 292. The indicator may be a predetermined portion, such as a reserved portion, of the shadow page table entry.
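The decision described above can be sketched in Python as follows. The sketch rests on several assumptions: the indicator is modeled as an enum held in a side dictionary rather than in reserved bits of a shadow page table entry, entries with no recorded indicator are treated as being in the first state, and the names `Indicator`, `on_hardware_page_fault`, and `counting_walk` are hypothetical.

```python
from enum import Enum


class Indicator(Enum):
    WALK = 1          # "first state": the guest page tables may hold a mapping
    NOT_PRESENT = 2   # "second state": the guest mapping is known to be absent


def on_hardware_page_fault(gvpn, indicators, walk_guest_page_table):
    """Decide how to service a hardware page fault for gvpn."""
    if indicators.get(gvpn, Indicator.WALK) is Indicator.NOT_PRESENT:
        # Skip the guest page walk and report the fault to the guest directly.
        return ("page_fault_to_guest", None)
    return ("walked", walk_guest_page_table(gvpn))


walks = []  # records which faults actually triggered a guest page walk


def counting_walk(gvpn):
    walks.append(gvpn)
    return {0x10: 0x2A}.get(gvpn)   # made-up guest mapping


indicators = {0x20: Indicator.NOT_PRESENT}
r1 = on_hardware_page_fault(0x10, indicators, counting_walk)
r2 = on_hardware_page_fault(0x20, indicators, counting_walk)
print(r1, r2, walks)  # ('walked', 42) ('page_fault_to_guest', None) [16]
```

Only the fault on the unmarked page triggers a walk; the fault on the marked page is resolved without touching the guest page tables.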
As the Table Walk Bypass Application also discusses, the shadow page tables 392 are managed to maintain the indicator. When a guest page table entry that cannot be used to translate a guest virtual address to a guest physical address is detected, the indicator of the corresponding shadow page table entry for the virtual address is set to the state indicating that the mapping is absent (the second state in the fault-handling description above). Detecting such a guest page table entry may be performed by scanning a subset of entries of the guest page tables 292 to determine whether the entries have valid mappings in the guest page tables 292. Detecting the guest page table entry may also be performed by detecting a change from a state where the guest page table entry has a corresponding valid mapping in the guest page tables 292 to another state where the guest page table entry does not have the corresponding valid mapping in the guest page tables 292. By maintaining the indicators of the shadow page table entries in this manner, it is possible to use the indicator to determine whether a guest page walk or address translation look-up can be skipped when a hardware page fault occurs.
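The two detection strategies mentioned above, scanning and change detection, might be sketched as follows in Python. This is an illustration under assumptions, not the method of the Table Walk Bypass Application: the marker is kept in a separate dictionary, an invalid guest entry is modeled as a missing or `None` value, and clearing the marker when an entry becomes valid again is an added assumption that the text does not spell out. All names are hypothetical.

```python
NOT_PRESENT = "not-present"   # hypothetical marker value


def scan_guest_entries(guest_page_table, gvpns, markers):
    """Mark every scanned GVPN whose guest entry has no valid mapping."""
    for gvpn in gvpns:
        if guest_page_table.get(gvpn) is None:
            markers[gvpn] = NOT_PRESENT


def on_guest_entry_change(gvpn, old_gppn, new_gppn, markers):
    """Update the marker when a guest page table entry is rewritten."""
    if old_gppn is not None and new_gppn is None:
        markers[gvpn] = NOT_PRESENT    # valid -> invalid transition detected
    elif new_gppn is not None:
        markers.pop(gvpn, None)        # assumed: a valid entry clears the marker


markers = {}
scan_guest_entries({0x10: 0x2A}, [0x10, 0x11], markers)
on_guest_entry_change(0x10, 0x2A, None, markers)   # guest unmaps page 0x10
on_guest_entry_change(0x11, None, 0x2B, markers)   # guest maps page 0x11
print(markers)  # {16: 'not-present'}
```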
In some virtualization environments, an enhanced hardware layer performs some of the interfacing functions between the system hardware 100 and the guest O/S 220. For example, the commercially available Intel® Virtualization Technology (Intel® VT) comprises a set of processor enhancements that enable the VMM 300 to offload certain virtualization tasks to the system hardware 100, including the filtering of error codes resulting from error conditions such as page faults.