Modern computers employ virtual memory to decouple processes, e.g., applications running on top of an operating system, from the physical memory addresses backing the address space of the processes. Using virtual memory enables processes to have a large contiguous address space, and allows the computer to run more processes than can fit simultaneously in their entirety in the available physical memory (i.e., to “over-commit” memory). To do this, virtual memory space is divided into pages of a fixed size (for example, x86 architectures use page sizes of 4 KB, 2 MB, or 1 GB), and each page of the virtual memory space either maps onto a page within the physical memory of the same page size or it maps to nothing. Much of the description in this patent will be in terms of x86 architectures. However, a person of skill in the art will understand how to apply the teachings of the invention to other processor architectures.
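To illustrate how a fixed page size divides an address, the following minimal Python sketch splits a virtual address into a virtual page number and an offset within the page for each of the three x86 page sizes. The function name `split_address` and the example address are hypothetical and chosen only for illustration.

```python
# Page sizes supported by x86-64 paging: 4 KB, 2 MB, and 1 GB.
PAGE_SIZES = {"4KB": 1 << 12, "2MB": 1 << 21, "1GB": 1 << 30}

def split_address(vaddr: int, page_size: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, offset within page)."""
    # Page sizes are powers of two, so the offset is the low log2(page_size) bits.
    assert page_size & (page_size - 1) == 0, "page size must be a power of two"
    offset = vaddr & (page_size - 1)
    vpn = vaddr >> (page_size.bit_length() - 1)
    return vpn, offset

for name, size in PAGE_SIZES.items():
    vpn, off = split_address(0x7F123456789A, size)
    print(f"{name}: page number {vpn:#x}, offset {off:#x}")
```

Larger pages leave more low-order bits for the offset and fewer for the page number, so fewer mappings are needed to cover the same address range.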
Translation of a virtual memory address to a physical memory address is done by traversing page tables in memory that contain mapping information. To speed up translation, a translation look-aside buffer (TLB) is typically used. The TLB provides faster translation of virtual addresses to physical addresses than does accessing page tables in memory because the TLB can provide the beginning-to-end mapping in a single step, and because the TLB can be implemented in a small (and, therefore, fast to access) data structure closer to or in the CPU itself. However, the TLB is limited in size, so the mapping for a given virtual memory page may not be present in the TLB. Whenever this happens, a “TLB miss” occurs, and the mapping must be determined by traversing the page tables, commonly known as a “page walk,” which is much slower than a look-up in the TLB.
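The role of the TLB as a small cache in front of the page walk can be sketched in Python as follows. This is a hypothetical model, not a description of any particular hardware: the TLB is fully associative with least-recently-used replacement, 4 KB pages are assumed, and `page_walk` stands in for the slow page-table traversal.

```python
from collections import OrderedDict
from typing import Callable

class TLB:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity: int, page_walk: Callable[[int], int]):
        self.capacity = capacity
        self.page_walk = page_walk       # slow path: VPN -> PFN via page tables
        self.entries = OrderedDict()     # cached VPN -> PFN mappings, LRU first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> 12, vaddr & 0xFFF   # 4 KB pages assumed
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as most recently used
            pfn = self.entries[vpn]
        else:
            self.misses += 1                       # TLB miss: perform a page walk
            pfn = self.page_walk(vpn)
            self.entries[vpn] = pfn
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        return (pfn << 12) | offset

# Hypothetical page walk that maps every VPN to VPN + 0x100.
tlb = TLB(capacity=2, page_walk=lambda vpn: vpn + 0x100)
print(hex(tlb.translate(0x1234)), tlb.hits, tlb.misses)
```

A second access to the same page is served from the cache without calling `page_walk`; once more distinct pages are touched than the TLB can hold, earlier mappings are evicted and must be walked again.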
In virtualized computer systems, where multiple virtual machines, each having an operating system and applications running therein, can be configured to run on a single hardware platform, memory management for the virtual machines may be carried out by memory management units (MMUs) configured in CPUs that support nested page walks. In such systems, a first set of page tables, referred to herein as guest page tables (gPTs), map the virtual address space of applications running in the virtual machines, referred to herein as guest virtual address space, to the physical address space that has been emulated for the virtual machines, referred to herein as guest physical address space. Additionally, a second set of page tables, referred to herein as nested page tables (NPTs) (also known as extended page tables), map the guest physical address space to the address space of machine memory, referred to herein as machine memory address space. Both the first and second sets of page tables are hierarchically arranged, and a pointer to the top-level, root table for each set of page tables is stored in a distinct register. In x86 architectures that support nested page walks, the register that stores a pointer to the root table of the gPTs is known as the gCR3 register and the register that stores a pointer to the root table of the NPTs is known as the nCR3 register. It should be recognized that non-x86 architectures employing guest page tables and nested page tables, or the like, may structure and access these tables differently.
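The composition of the two mappings can be sketched in Python as follows. For brevity, each set of page tables is modeled as a flat dictionary of page numbers rather than as the hierarchical tables described above; the names `translate_two_stage`, `guest_pt`, and `nested_pt` are hypothetical, and 4 KB pages are assumed.

```python
PAGE_SHIFT = 12                     # 4 KB pages assumed
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate_two_stage(gva: int, guest_pt: dict, nested_pt: dict) -> int:
    """Map gVA -> gPA with the guest tables, then gPA -> MA with the nested tables.

    Both tables are flat dicts of page numbers, a simplification of real
    hierarchical gPTs and NPTs.
    """
    gpa_page = guest_pt.get(gva >> PAGE_SHIFT)
    if gpa_page is None:
        raise LookupError("guest page fault")    # no gVA -> gPA mapping
    ma_page = nested_pt.get(gpa_page)
    if ma_page is None:
        raise LookupError("nested page fault")   # no gPA -> MA mapping
    # The page offset is carried through both stages unchanged.
    return (ma_page << PAGE_SHIFT) | (gva & PAGE_MASK)

print(hex(translate_two_stage(0x10ABC, {0x10: 0x20}, {0x20: 0x99})))
```

In hardware, a missing guest mapping is reported to the guest operating system, whereas a missing nested mapping is handled by the hypervisor; the sketch distinguishes the two cases only by the exception message.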
FIG. 1 is a schematic diagram that illustrates nested page walks in a virtualized computer system. In the example of FIG. 1, a guest virtual address 100 is being mapped by MMU 101 to a machine memory address of data 150 stored in machine memory 102 using gPTs 110 and NPTs 120, which are also stored in machine memory 102. Contents of gPTs 110 at all levels include pointers, expressed as guest physical addresses, to guest page tables or guest memory pages, and also permission bits, present bits, and other control bits, and in some implementations, accessed and dirty bits. Contents of NPTs 120 at all levels include pointers, expressed as machine memory addresses, to nested page tables or machine memory pages and also permission bits, present bits, and other control bits, and in some implementations, accessed and dirty bits.
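The following Python sketch decodes the pointer and control bits of a single 64-bit entry. The bit positions are those of the standard x86-64 4 KB page-table entry format (present bit 0, read/write bit 1, user/supervisor bit 2, accessed bit 5, dirty bit 6, execute-disable bit 63, address in bits [51:12]), which applies to the gPTs; Intel extended page table entries use a different layout, so the sketch should not be read as describing NPT entries on every processor. The names `PTE` and `decode_pte` are hypothetical.

```python
from dataclasses import dataclass

# Bit positions in a standard (non-EPT) x86-64 4 KB page-table entry.
PRESENT, WRITABLE, USER, ACCESSED, DIRTY, NO_EXECUTE = 0, 1, 2, 5, 6, 63
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF    # bits [51:12]: address of next table or page

@dataclass
class PTE:
    present: bool
    writable: bool
    user: bool
    accessed: bool
    dirty: bool
    no_execute: bool
    address: int

def decode_pte(raw: int) -> PTE:
    """Split a raw 64-bit entry into its pointer and control bits."""
    def bit(n: int) -> bool:
        return bool((raw >> n) & 1)
    return PTE(bit(PRESENT), bit(WRITABLE), bit(USER), bit(ACCESSED),
               bit(DIRTY), bit(NO_EXECUTE), raw & ADDR_MASK)

print(decode_pte(0x8000000012345067))
```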
The mapping begins with the guest page walker module of MMU 101 retrieving a pointer to the root table of gPTs 110 from the gCR3 register, which is an address in the guest physical address space. Bits [47:39] of guest virtual address 100 and 3 trailing bits of zeros define the index into the root table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL4 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L3) table, which is also an address in the guest physical address space. Bits [38:30] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L3 table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL3 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L2) table, which is also an address in the guest physical address space. Bits [29:21] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L2 table and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gL2 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of the next lower-level (L1) table, which is also an address in the guest physical address space. Bits [20:12] of guest virtual address 100 and 3 trailing bits of zeros define the index into this L1 table and are copied into the 12 least significant bits of this guest physical address. 
The resulting guest physical address, known as the gL1 address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve an address of a data page, which is also an address in the guest physical address space. Bits [11:0] of guest virtual address 100 define the index into this data page and are copied into the 12 least significant bits of this guest physical address. The resulting guest physical address, known as the gPA address, is translated into a machine memory address using the nested page walker module of MMU 101 and NPTs 120, and the translated address is used to retrieve the desired content, i.e., data 150.
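The nested page walk described above can be modeled in Python on a toy machine memory, represented as a dictionary from machine addresses to 64-bit entries. This is a minimal sketch under stated assumptions: only 4 KB pages with four-level gPTs and NPTs are modeled, present-bit, permission, and large-page handling are omitted, and the helper names, frame addresses, and identity nested mapping used in the example are hypothetical simplifications.

```python
import itertools

LEVEL_SHIFTS = (39, 30, 21, 12)        # index bit positions for L4, L3, L2, L1
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF   # bits [51:12] of an entry hold the next address

def entry_addr(table: int, addr: int, shift: int) -> int:
    # 9 index bits plus 3 trailing zero bits form the 12 LSBs of the entry address.
    return table | (((addr >> shift) & 0x1FF) << 3)

def nested_walk(gpa: int, ncr3: int, mem: dict, refs: list) -> int:
    """Translate a guest physical address to a machine address via the NPTs."""
    table = ncr3
    for shift in LEVEL_SHIFTS:
        ma = entry_addr(table, gpa, shift)
        refs.append(ma)                      # one machine-memory reference per level
        table = mem[ma] & ADDR_MASK          # present/permission checks omitted
    return table | (gpa & 0xFFF)

def guest_walk(gva: int, gcr3: int, ncr3: int, mem: dict):
    """Walk the gPTs; every guest table address is itself nested-translated."""
    refs = []
    table = gcr3                             # guest physical address of the root gPT
    for shift in LEVEL_SHIFTS:
        g_entry = entry_addr(table, gva, shift)   # gL4, gL3, gL2, gL1 addresses
        ma = nested_walk(g_entry, ncr3, mem, refs)
        refs.append(ma)
        table = mem[ma] & ADDR_MASK
    gpa = table | (gva & 0xFFF)              # gPA: bits [11:0] of the gVA are the offset
    return nested_walk(gpa, ncr3, mem, refs), refs

def map_page(mem: dict, root: int, va: int, pa: int, alloc) -> None:
    """Hypothetical helper: install va -> pa (4 KB) in a 4-level table at root."""
    table = root
    for shift in LEVEL_SHIFTS[:-1]:
        e = entry_addr(table, va, shift)
        if e not in mem:
            mem[e] = alloc() | 1             # allocate next-level table, set present bit
        table = mem[e] & ADDR_MASK
    mem[entry_addr(table, va, 12)] = pa | 1

mem = {}
gpt_alloc = itertools.count(0x100000, 0x1000).__next__   # frames for guest tables
npt_alloc = itertools.count(0x800000, 0x1000).__next__   # frames for nested tables
gcr3, ncr3 = gpt_alloc(), npt_alloc()
gva = 0x7F123456789A
map_page(mem, gcr3, gva, 0x200000, gpt_alloc)            # guest: gVA page -> gPA 0x200000
# Simplification: the NPTs identity-map the guest table frames and the data
# frame, so guest tables can be written directly at their guest physical addresses.
for gpa in [*range(0x100000, 0x104000, 0x1000), 0x200000]:
    map_page(mem, ncr3, gpa, gpa, npt_alloc)
mem[0x20089A] = 42                                       # stands in for data 150
ma, refs = guest_walk(gva, gcr3, ncr3, mem)
print(hex(ma), mem[ma], len(refs))                       # 0x20089a 42 24
```

Counting the recorded references shows why nested walks are costly: each of the four guest table accesses, and the final gPA, requires its own four-level nested walk, giving 24 page-table memory references on a walk that misses every cache, compared with 4 for a non-nested four-level walk.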
Bottom-level (L1) tables of gPTs and NPTs have page table entries (PTEs) containing pointers to guest physical or machine memory pages and auxiliary information including an accessed bit (A bit), a dirty bit (D bit), and various other bits. The A bit, if set to one, indicates that the memory page referenced by the entry has been accessed since the A bit was last cleared. The D bit, if set to one, indicates that the memory page referenced by the entry has been modified since the D bit was last cleared. The dirty bit may be cleared, i.e., set to zero, when the contents of the modified memory page are committed to disk.
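The A-bit and D-bit semantics described above can be sketched in Python as operations on an integer entry, using the x86 bit positions (A is bit 5, D is bit 6). The function names are hypothetical; `record_access` models what the MMU does on an access, and `clear_dirty` models software clearing the D bit, for example after the page has been written back to disk.

```python
A_BIT = 1 << 5  # accessed bit in an x86 PTE
D_BIT = 1 << 6  # dirty bit in an x86 PTE

def record_access(pte: int, is_write: bool) -> int:
    """Model the MMU: any access sets A; a write also sets D."""
    pte |= A_BIT
    if is_write:
        pte |= D_BIT
    return pte

def clear_dirty(pte: int) -> int:
    """Model software clearing D, e.g. after the page is committed to disk."""
    return pte & ~D_BIT

pte = 0x1000 | 1                          # hypothetical present entry, A and D clear
pte = record_access(pte, is_write=False)  # read: A set, D still clear
pte = record_access(pte, is_write=True)   # write: D set
pte = clear_dirty(pte)                    # written back: D clear, A unchanged
print(hex(pte))
```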
A bits and D bits are examined by various processes before taking some action. In a virtualized computer system, D bits of PTEs are continuously examined during a process for performing backups and during a process for migrating the executing state of virtual machines, to identify those memory pages that have been modified and to transmit to the backup target machine or the migration target machine only those memory pages that have been modified. Alternatively, an operation known as a “diff” operation may be performed on the memory pages that have been modified to identify the changed portions of the memory pages, and only the changed portions are transmitted to the target machine.
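One round of the dirty-page scan and the optional diff operation can be sketched in Python as follows. This is a hypothetical illustration: the page tables are reduced to a dictionary from page number to entry value, the diff granularity of 8 bytes is an arbitrary assumption, and the function names are not drawn from any particular backup or migration product.

```python
D_BIT = 1 << 6  # dirty bit in an x86 PTE

def collect_dirty_pages(ptes: dict[int, int]) -> list[int]:
    """Return page numbers whose D bit is set, clearing D for the next round.

    A real implementation must also flush stale TLB entries after clearing
    D bits, or later writes may not set D again.
    """
    dirty = [vpn for vpn, pte in sorted(ptes.items()) if pte & D_BIT]
    for vpn in dirty:
        ptes[vpn] &= ~D_BIT
    return dirty

def diff_page(old: bytes, new: bytes, chunk: int = 8) -> list[tuple[int, bytes]]:
    """Return (offset, bytes) for each chunk of new that differs from old."""
    return [(off, new[off:off + chunk])
            for off in range(0, len(new), chunk)
            if old[off:off + chunk] != new[off:off + chunk]]

ptes = {0: 0x1067, 1: 0x2027, 2: 0x3047}   # pages 0 and 2 have D set
print(collect_dirty_pages(ptes))           # pages to transmit this round
old = bytes(4096)
new = bytearray(old)
new[16:24] = b"ABCDEFGH"
print(diff_page(old, bytes(new)))          # only the changed 8-byte chunk
```

Transmitting whole dirty pages needs only the D bits, whereas the diff operation also requires a copy of each page's previous contents in exchange for sending fewer bytes.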
When page sizes are relatively large, the efficiency of processes such as the backup process and the migration process is compromised because any modification of a memory page, regardless of the size of the modification, will cause that entire memory page to be backed up or migrated. For example, if the memory page size is 4 KB and 8 bytes are written to that memory page, the entire 4 KB page will need to be backed up or migrated. It may be possible to build x86 page tables with smaller memory page sizes, but this might not be desirable because such a change could adversely affect memory system performance in other ways or impose an implementation burden.
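The overhead in the example above can be quantified with a short Python calculation of the ratio of bytes transferred to bytes actually modified when every dirty page is copied whole. The function name is hypothetical. For the 8-byte write in the example, the ratio is 512 for a 4 KB page and 262,144 for a 2 MB page.

```python
def write_amplification(page_size: int, bytes_written: int) -> float:
    """Bytes transferred per byte modified when whole dirty pages are copied."""
    return page_size / bytes_written

# 4 KB, 2 MB, and 1 GB pages, each modified by a single 8-byte write.
for size in (4 << 10, 2 << 20, 1 << 30):
    print(f"{size:>10} B page, 8 B write -> {write_amplification(size, 8):,.0f}x")
```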