Modern computers employ virtual memory to decouple processes, e.g., applications running on top of an operating system, from the physical memory addresses backing the address space of the processes. Using virtual memory enables processes to have a large contiguous address space, and allows the computer to run more processes than can fit simultaneously in their entirety in the available physical memory (i.e., to “over-commit” memory). To do this, virtual memory space is divided into pages of a fixed size (for example, x86 architectures use page sizes of 4 KB, 2 MB, or 1 GB), and each page of the virtual memory space may or may not be mapped onto a page within the physical memory of the same page size. Much of the description in this patent will be in terms of x86 architectures. However, a person of skill in the art will understand how to apply the teachings of the invention to other processor architectures.
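By way of illustration only, the division of a virtual address space into fixed-size pages can be sketched in Python as follows. The function name `split_address` and the sample address are assumptions made for this example and are not part of any architecture definition.

```python
# Illustrative only: split a virtual address into (page number, offset)
# for the three x86 page sizes named in the text.
PAGE_SIZES = {"4 KB": 4 * 1024, "2 MB": 2 * 1024**2, "1 GB": 1024**3}


def split_address(vaddr, page_size):
    """Return (virtual page number, offset within the page).

    Assumes page_size is a power of two, as all x86 page sizes are.
    """
    offset_bits = page_size.bit_length() - 1  # 12, 21, or 30
    return vaddr >> offset_bits, vaddr & (page_size - 1)


if __name__ == "__main__":
    vaddr = 0x7F1234567890  # hypothetical sample address
    for name, size in PAGE_SIZES.items():
        vpn, offset = split_address(vaddr, size)
        print(f"{name}: page number {vpn:#x}, offset {offset:#x}")
```

A larger page size leaves more low-order bits for the offset and fewer bits for the page number, so fewer mappings are needed to cover the same address range.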
Translation of a virtual memory address to a physical memory address is done by traversing page tables in memory that contain mapping information. To speed up translation, a translation look-aside buffer (TLB) is typically used. The TLB provides faster translation of virtual addresses to physical addresses than does accessing page tables in memory because the TLB can provide the beginning-to-end mapping in a single step, and because the TLB can be implemented in a small (and, therefore, fast to access) data structure closer to or in the CPU itself. However, the TLB is limited in size and it is possible that a virtual memory page cannot be found in the TLB. Whenever this happens, a “TLB miss” occurs, and the mapping must be performed by a traversal of the page tables, commonly known as a “page walk,” a much slower process than look-ups in the TLB.
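The hit and miss behavior described above can be sketched with a toy model. The class `TinyTLB`, its least-recently-used replacement policy, and the use of a flat dictionary in place of hierarchical page tables are all assumptions made for illustration; real TLBs differ in associativity and replacement policy.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # 4 KB pages


class TinyTLB:
    """Toy fully associative TLB with LRU replacement (an assumption)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # dict: virtual page -> physical page
        self.entries = OrderedDict()  # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # TLB miss: fall back to the slow page walk, modeled here as a
            # dictionary lookup. A KeyError stands in for a page fault.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return (self.entries[vpn] << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tlb = TinyTLB(capacity=2, page_table={0: 7, 1: 3, 2: 9})
    for vaddr in (0x0010, 0x0020, 0x1000, 0x2000, 0x0000):
        print(hex(vaddr), "->", hex(tlb.translate(vaddr)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

With a capacity of two entries, touching three distinct pages forces an eviction, so a later access to the first page misses again even though it was translated earlier.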
In virtualized computer systems, where multiple virtual machines, each having an operating system and applications running therein, can be configured to run on a single hardware platform, memory management for the virtual machines is carried out by emulated memory management units (MMUs). One emulated MMU is provided for each virtual machine, and the emulated MMU manages the mappings of guest virtual addresses directly to physical memory addresses, also referred to as machine memory addresses, using shadow page tables. Shadow page tables have the same structure as conventional page tables and, as with conventional page tables, shadow page tables need not be traversed if the guest virtual address that needs to be mapped has an entry in the TLB.
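Conceptually, a shadow page table holds the composition of two mappings: guest virtual to guest physical, and guest physical to machine. The following sketch shows that composition using flat dictionaries keyed by page number. The function `build_shadow` and the flat representation are assumptions made for illustration; actual shadow page tables are hierarchical, as stated above.

```python
def build_shadow(guest_pt, gpa_to_ma):
    """Compose gVA->gPA and gPA->MA page mappings into a direct gVA->MA map.

    guest_pt:  dict mapping guest virtual page -> guest physical page
    gpa_to_ma: dict mapping guest physical page -> machine page
    Guest pages that have no machine backing are left unmapped.
    """
    shadow = {}
    for gvpn, gppn in guest_pt.items():
        if gppn in gpa_to_ma:
            shadow[gvpn] = gpa_to_ma[gppn]
    return shadow


if __name__ == "__main__":
    guest_pt = {0: 5, 1: 6, 2: 9}  # hypothetical guest mappings
    gpa_to_ma = {5: 100, 6: 200}   # guest physical page 9 is not backed
    print(build_shadow(guest_pt, gpa_to_ma))
```

Because the composed table maps guest virtual addresses directly to machine memory addresses, a single conventional page walk suffices on a TLB miss.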
Memory management for the virtual machines may also be carried out by MMUs configured in CPUs that support nested page walks. In such systems, a first set of page tables, referred to herein as guest page tables (gPTs), map the virtual address space of each application running in the virtual machines, referred to herein as guest virtual address space, to a physical address space that has been emulated for a virtual machine, referred to herein as guest physical address space. Additionally, a second set of page tables, referred to herein as nested page tables (NPTs) (also known as extended page tables), map the guest physical address space to the address space of machine memory, referred to herein as machine memory address space. Both the first and second sets of page tables are hierarchically arranged, and a pointer to the top-level, root table for each set of page tables is stored in a distinct register. In x86 architectures that support nested page walks, the register that stores a pointer to the root table of the gPTs is known as the gCR3 register and the register that stores a pointer to the root table of the NPTs is known as the nCR3 register. It should be recognized that non-x86 architectures employing guest page tables and nested page tables, or the like, may have different structures and may be accessed in a different manner.
FIG. 1 is a schematic diagram that illustrates nested page walks in a virtualized computer system. In the example of FIG. 1, a guest virtual address 100 is being mapped by MMU 101 to a machine memory address of data 150 stored in machine memory 102 using gPTs 110 and NPTs 120, which are also stored in machine memory 102. Contents of gPTs 110 at all levels include pointers, expressed as guest physical addresses, to guest page tables or guest memory pages, and also permission bits, level bits, and other control bits, and in some implementations, accessed and dirty bits. Contents of NPTs 120 at all levels include pointers, expressed as machine memory addresses, to nested page tables or machine memory pages and also permission bits, level bits, and other control bits, and in some implementations, accessed and dirty bits.
The mapping begins with the guest page walker module of MMU 101 retrieving a pointer to the root table of gPTs 110 from the gCR3 register, which is an address in the guest physical address space. Bits [47:39] of guest virtual address 100 and 3 trailing bits of zeros define the index into the root table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL4 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L3) table, which is also an address in the guest physical address space.
Bits [38:30] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L3 table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL3 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L2) table, which is also an address in the guest physical address space.
Bits [29:21] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L2 table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL2 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L1) table, which is also an address in the guest physical address space.
Bits [20:12] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L1 table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL1 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of a data page, which is also an address in the guest physical address space.
Bits [11:0] of guest virtual address 100 define the index into this data page and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gPA address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve the desired content, i.e., data 150.
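The walk described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the nested page walker is represented by a caller-supplied function `nested_translate`, machine memory reads are represented by a caller-supplied function `read_entry`, and permission, level, and other control bits are ignored. Both callback names and the toy memory contents are hypothetical.

```python
def split_gva(gva):
    """Return the four 9-bit table indices (L4 down to L1) and the 12-bit offset."""
    indices = [(gva >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, gva & 0xFFF


def nested_walk(gva, gcr3, nested_translate, read_entry):
    """Map a guest virtual address to a machine memory address.

    gcr3:             page-aligned guest physical address of the root gPT
    nested_translate: gPA -> machine address (stands in for the NPT walk)
    read_entry:       machine address -> page-aligned gPA stored there
    """
    indices, offset = split_gva(gva)
    base = gcr3
    for index in indices:
        # 9 index bits followed by 3 zero bits fill the 12 low-order bits.
        entry_gpa = base | (index << 3)
        base = read_entry(nested_translate(entry_gpa))
    # Final step: the data page's gPA plus offset is itself translated.
    return nested_translate(base | offset)


if __name__ == "__main__":
    # Hypothetical toy setup: the nested mapping is a fixed linear shift.
    def to_machine(gpa):
        return gpa + 0x100000

    memory = {0x101008: 0x2000, 0x102010: 0x3000,
              0x103018: 0x4000, 0x104020: 0x5000}
    gva = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x56
    print(hex(nested_walk(gva, 0x1000, to_machine, memory.__getitem__)))
```

In this model, `nested_translate` is invoked five times per guest virtual address, once each for the gL4, gL3, gL2, gL1, and gPA addresses, which illustrates why a TLB miss is costly when nested page walks are used.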
Copy-on-write (COW) is a commonly used optimization technique that operates at the granularity of pages. As mentioned above, x86 architectures employ page sizes of 4 KB, 2 MB, or 1 GB, and therefore COW may be implemented in x86 architectures using 4 KB page sizes, 2 MB page sizes, or 1 GB page sizes. However, a 4 KB page size may be of finer granularity than is needed in common workloads. A larger page size would be desirable because it would lower the pressure on the TLB. A 2 MB page size, however, is too coarse and cannot be easily shared to realize the benefits of COW optimization techniques. In addition, once a page is allocated as COW, a write to that page, even a single-byte write, would force the entire page to be copied. From a computational efficiency point of view, smaller page sizes for COW optimization would be preferred because the computational overhead associated with such an event increases with larger page sizes. It may be possible to build x86 page tables with page sizes between 4 KB and 2 MB, but this might not be desirable because such a change could affect memory system performance adversely in other ways or be an implementation burden.
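The cost of a single-byte write to a COW page can be illustrated with a toy model. The class `CowRegion` and its byte counter are assumptions made for this example; the model tracks only how many bytes are copied and does not represent page tables or protection faults.

```python
class CowRegion:
    """Toy copy-on-write region: pages stay shared until first written."""

    def __init__(self, shared_pages, page_size):
        self.page_size = page_size
        self.pages = list(shared_pages)  # references only; no data copied
        self.private = set()             # indices of pages already copied
        self.bytes_copied = 0

    def write_byte(self, addr, value):
        index, offset = divmod(addr, self.page_size)
        if index not in self.private:
            # First write to a shared page: the entire page is copied.
            self.pages[index] = bytearray(self.pages[index])
            self.private.add(index)
            self.bytes_copied += self.page_size
        self.pages[index][offset] = value


if __name__ == "__main__":
    for name, size in (("4 KB", 4 * 1024), ("2 MB", 2 * 1024**2)):
        shared = bytes(size)                    # one immutable shared page
        region = CowRegion([shared, shared], size)
        region.write_byte(1, 0xFF)              # a single-byte write
        print(f"{name} pages: {region.bytes_copied} bytes copied")
```

In this model, a one-byte write copies 4,096 bytes with 4 KB pages and 2,097,152 bytes with 2 MB pages, a 512-fold difference, which illustrates the overhead trade-off discussed above.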