Modern computers employ virtual memory to decouple processes, e.g., applications running on top of an operating system, from the physical memory addresses backing the address space of the processes. Using virtual memory enables processes to have a large contiguous address space, and allows the computer to run more processes than can fit simultaneously in their entirety in the available physical memory (i.e., to “over-commit” memory). To do this, virtual memory space is divided into pages of a fixed size (for example, x86 architectures use page sizes of 4 KB, 2 MB, or 1 GB), and each page of the virtual memory space either maps onto a page within the physical memory of the same page size or it maps to nothing. Much of the description in this patent will be in terms of x86 architectures. However, a person of skill in the art will understand how to apply the teachings of the invention to other processor architectures.
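By way of illustration only, the mapping of a virtual address onto a physical page, or onto nothing, can be sketched as follows. The sketch assumes a 4 KB page size and uses a flat dictionary in place of real page tables; the names `split_virtual_address`, `translate`, and `page_map` are hypothetical and are not taken from any particular architecture.

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KB pages


def split_virtual_address(vaddr, page_size=PAGE_SIZE):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr // page_size, vaddr % page_size


def translate(vaddr, page_map, page_size=PAGE_SIZE):
    """Return the physical address for vaddr, or None if the page is unmapped.

    page_map is a hypothetical dict of virtual page number -> physical page
    number that stands in for the page tables.
    """
    vpn, offset = split_virtual_address(vaddr, page_size)
    pfn = page_map.get(vpn)
    if pfn is None:
        return None  # the virtual page maps to nothing
    return pfn * page_size + offset
```

In this sketch the offset within the page is carried over unchanged; only the page number is remapped.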
Translation of a virtual memory address to a physical memory address is done by traversing page tables in memory that contain mapping information. To speed up translation, a translation look-aside buffer (TLB) is typically used. The TLB provides faster translation of virtual addresses to physical addresses than does accessing page tables in memory because the TLB can provide the beginning-to-end mapping in a single step, and because the TLB can be implemented in a small (and, therefore, fast to access) data structure closer to or in the central processing unit (CPU) itself. The TLB is limited in size and it is possible that a virtual memory page cannot be found in the TLB. Whenever this happens, a “TLB miss” occurs, and the mapping has to be performed by a traversal of the page tables, commonly known as a “page walk,” a much slower process than look-ups in the TLB.
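The relationship between a TLB hit and a page walk on a TLB miss can be sketched, purely for illustration, as a small cache in front of a slower lookup. The class `TinyTLB`, its least-recently-used replacement policy, and the dictionary standing in for the page tables are assumptions of this sketch; actual TLB organization and replacement policies vary by processor.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: a small cache of virtual page -> physical page mappings."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # dict standing in for in-memory page tables
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            # TLB hit: the mapping is returned in a single step.
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        # TLB miss: fall back to the (much slower) page walk.
        # An unmapped page raises KeyError in this sketch.
        self.misses += 1
        pfn = self.page_table[vpn]
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return pfn
```

Because the capacity is limited, a working set larger than the TLB causes repeated misses, each of which pays the cost of a page walk.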
In virtualized computer systems, where multiple virtual machines, each having an operating system and applications (or processes) running therein, can be configured to run on a single hardware platform, memory management for the virtual machines is carried out by the emulated memory management units (MMUs). One emulated MMU is provided for each virtual machine and the emulated MMU manages the mappings of guest virtual addresses directly to physical memory addresses, also referred to as machine memory addresses, using shadow page tables. Shadow page tables have the same structure as conventional page tables and, as with conventional page tables, shadow page tables need not be traversed if the guest virtual address that needs to be mapped has an entry in the TLB.
Another way to support address translation for a virtualized system is through hardware-assisted virtualization. A CPU can include hardware-assisted virtualization features, such as support for hardware virtualization of the MMU. For example, modern x86 processors commercially available from Intel Corporation include support for MMU virtualization using extended page tables (EPTs). Likewise, modern x86 processors from Advanced Micro Devices, Inc. include support for MMU virtualization using Rapid Virtualization Indexing (RVI). Other processor platforms may support similar MMU virtualization. In general, a CPU can implement hardware MMU virtualization using nested page tables (NPTs). In a virtualized computing system, a guest OS in a VM maintains page tables (referred to as guest page tables) for translating virtual addresses to addresses for a virtual memory provided by the hypervisor (referred to as guest-physical addresses). The hypervisor maintains NPTs that translate guest-physical addresses to physical addresses for system memory (referred to as host-physical addresses or machine addresses). The guest OS and the hypervisor expose the guest page tables and the NPTs, respectively, to the CPU. The MMU translates virtual addresses to host-physical addresses by walking the guest page tables to obtain guest-physical addresses, which are then used to walk the NPTs to obtain host-physical addresses.
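The two-stage translation described above can be sketched, for illustration only, with two flat dictionaries in place of the hierarchical guest page tables and NPTs. The names `walk` and `nested_translate` and the 4 KB page size are assumptions of this sketch. The sketch also omits the fact that, in hardware, the guest-physical pointers read during the guest page walk are themselves translated through the NPTs.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages at both translation stages


def walk(table, addr):
    """Translate addr through one flat table of page number -> page number."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def nested_translate(guest_page_table, nested_page_table, guest_virtual):
    """Guest-virtual -> guest-physical -> host-physical."""
    # Stage 1: guest page tables, maintained by the guest OS.
    guest_physical = walk(guest_page_table, guest_virtual)
    # Stage 2: nested page tables, maintained by the hypervisor.
    host_physical = walk(nested_page_table, guest_physical)
    return host_physical
```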
Both conventional page tables and shadow page tables are hierarchically arranged and a pointer to the top-level, root table is stored in a register. In x86 architectures, this register is known as the CR3 register, and it should be recognized that non-x86 architectures employing page tables may have different structures that are accessed in a different manner. A series of intermediate-level tables is traversed to reach bottom-level (“terminal”) page tables that have page table entries (PTEs) containing pointers to memory pages and auxiliary information including an accessed bit (A bit), a dirty bit (D bit), and various other bits. The A bit, if set to one, indicates that the memory page referenced by the entry has been accessed since the A bit was last cleared. The D bit, if set to one, indicates that the memory page referenced by the entry has been modified since the D bit was last cleared. The dirty bit may be cleared, i.e., set to zero, when the contents of the modified memory page are committed to disk.
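The behavior of the A bit and the D bit can be sketched as simple bit manipulation on a PTE value. The bit positions below follow the x86 page table entry layout (present at bit 0, accessed at bit 5, dirty at bit 6); the helper names are hypothetical, and in a real system the bits are set by the MMU hardware rather than by software.

```python
PTE_PRESENT = 1 << 0   # x86: bit 0
PTE_ACCESSED = 1 << 5  # x86: bit 5 (A bit)
PTE_DIRTY = 1 << 6     # x86: bit 6 (D bit)


def record_access(pte, is_write):
    """Model what the MMU does to a PTE on a memory access."""
    pte |= PTE_ACCESSED
    if is_write:
        pte |= PTE_DIRTY
    return pte


def clear_dirty(pte):
    """Clear the D bit, e.g., after the page contents are committed to disk."""
    return pte & ~PTE_DIRTY


def is_accessed(pte):
    return bool(pte & PTE_ACCESSED)


def is_dirty(pte):
    return bool(pte & PTE_DIRTY)
```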
Various processes examine A bits and D bits before taking action. In a virtualized computer system, D bits of PTEs are continuously examined during a process for performing backups and during a process for migrating the executing state of virtual machines, to identify those memory pages that have been modified and to transmit to the backup target machine or the migration target machine only those memory pages that have been modified. Alternatively, an operation known as a “diff” operation may be performed on the memory pages that have been modified to identify the changed portions of the memory pages, and only the changed portions are transmitted to the target machine.
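One possible form of the dirty-page scan and the “diff” operation is sketched below for illustration. The functions `dirty_pages` and `diff_ranges` are hypothetical, and the sketch assumes that a previously transmitted copy of each page is available for comparison and that both copies have the same length; an actual backup or migration process may identify changed portions differently.

```python
def dirty_pages(dirty_bits):
    """Return the indices of pages whose D bit is set."""
    return [index for index, dirty in enumerate(dirty_bits) if dirty]


def diff_ranges(old, new):
    """Return (offset, changed_bytes) for each run of bytes that differs.

    old is the previously transmitted copy of a page and new is its current
    contents; both are bytes objects of equal length.
    """
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            if start is None:
                start = i  # a changed run begins here
        elif start is not None:
            ranges.append((start, new[start:i]))
            start = None
    if start is not None:
        ranges.append((start, new[start:]))  # run extends to the end of the page
    return ranges
```

Only the pages returned by `dirty_pages` need to be compared, and only the byte ranges returned by `diff_ranges` need to be transmitted.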
When page sizes are large and metadata granularity is coarse, the efficiency of processes that rely on that metadata is compromised. As used herein, the term “metadata” is used to refer to data that describes and/or gives information about other data. As used herein, the term “granularity” refers to the specificity of a metadata bit. For example, a dirty bit granularity of 16 KB for a page table means that a dirty bit denotes whether a change has occurred within 16 KB chunks of page table memory. A granularity that is “finer” than 16 KB is, for example, an 8 KB granularity, in which a single dirty bit denotes whether a change has occurred within 8 KB chunks of page table memory. A granularity that is “coarser” than 16 KB is, for example, a 32 KB granularity, in which a single dirty bit denotes whether a change has occurred within 32 KB chunks of page table memory.
System software is critically dependent on memory metadata to efficiently manage memory. Such metadata, for example per-page access and dirty bits, enables system software to estimate how frequently a page is accessed or modified, which in turn informs subsystems such as transparent huge page support (THP), page reclamation, and kernel same-page merging (KSM). For example, page reclamation handles memory pressure by swapping out pages infrequently accessed by one process, enabling the OS to allocate pages to another process which needs them. The success of such policies depends heavily on the granularity of the metadata used to predict access and modification frequencies. Decisions taken based on metadata can profoundly impact the performance of workloads and utilization of the system.
System-level services such as swapping pages to disk and cache coherence may be inefficient with coarse metadata granularity. A single dirty bit per 2 MB or 1 GB page gives system software a very coarse hint about whether the contents of a page have been updated, forcing software to choose between blindly transferring page contents which may not have been updated, or inducing overheads by using smaller pages. Common computer processes such as the backup process and the migration process are compromised by coarse granularity, because any modification of a memory page regardless of the size of the modification will cause that memory page to be backed up or migrated. For example, if the memory page size is 2 MB and 8 bytes were written to that memory page, the entire 2 MB page may need to be backed up or migrated.
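The cost in the 2 MB example above can be made concrete with a short calculation. The function `transfer_size` is a hypothetical helper that assumes every chunk with a set dirty bit is transmitted in full; the 4 KB figure used for comparison is an illustrative finer granularity, not a value prescribed by this description.

```python
MB = 1024 * 1024


def transfer_size(writes, granularity):
    """Return the bytes transmitted when each dirty chunk is sent in full.

    writes is an iterable of (offset, length) pairs in bytes; granularity
    is the number of bytes covered by a single dirty bit.
    """
    chunks = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // granularity
        last = (offset + length - 1) // granularity
        chunks.update(range(first, last + 1))
    return len(chunks) * granularity
```

Under these assumptions, a single 8-byte write costs 2,097,152 bytes of transfer with one dirty bit per 2 MB page, which is 262,144 times the amount of data actually written, and 4,096 bytes with one dirty bit per 4 KB.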