In a conventional bare-metal computer system that supports memory virtualization, the operating system (OS) running on the machine maintains a set of mappings between virtual memory addresses allocated to processes (e.g., applications) and the physical memory addresses at which the corresponding data are stored, or are to be stored, in physical system memory (e.g., RAM). These mappings are held in one or more data structures known as page tables. When a process wishes to read data from or write data to memory, it issues a memory read/write instruction that identifies the virtual memory address of the data. This virtual memory address is passed to a memory management unit (MMU) of the system's CPU, which translates it in hardware into the corresponding physical memory address based on the page table mappings. The CPU then uses the translated physical memory address to carry out the instruction against the system's physical memory.
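As a rough illustration of the single-level translation described above, the following Python sketch models a page table as a flat mapping from virtual page numbers to physical frame numbers. The 4 KiB page size, the flat (single-level) table, and the names `translate` and `PageFault` are assumptions made for illustration only; real MMUs walk multi-level tables in hardware.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Hypothetical stand-in for the fault an MMU raises on an unmapped page."""


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address into a physical address.

    page_table maps virtual page numbers to physical frame numbers; the
    offset within the page is carried over unchanged.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"no mapping for virtual address {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Virtual page 0x10 is backed by physical frame 0x2A.
page_table = {0x10: 0x2A}
print(hex(translate(page_table, 0x10123)))  # 0x2a123
```

Only the page number is remapped; the low-order offset bits pass through untouched, which is why page tables need one entry per page rather than one per byte.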
In a virtualized host system comprising a hypervisor and one or more virtual machines (VMs), memory virtualization is more complex because the hypervisor usually provisions physical system memory among the VMs for various purposes (e.g., memory over-subscription, VM isolation, live migration, etc.). This memory provisioning by the hypervisor adds another level of indirection (and thus, another level of address translation) for memory operations. For example, consider the scenario shown in FIG. 1. In this example, hypervisor 100 has allocated some regions of the physical memory of the system (shown as host physical memory 102) to a VM 104 in the form of a contiguous guest physical memory 106. From the perspective of guest OS 108 running within VM 104, guest physical memory 106 appears to reflect the actual physical memory of the system, when it is in fact a virtual address range provisioned by hypervisor 100. Guest OS 108 has, in turn, allocated some regions of guest physical memory 106 to a process 110 in the form of a contiguous guest virtual memory 112.
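The extra level of indirection can be pictured with a small Python sketch in which a hypervisor backs a contiguous guest physical range with host frames that need not be contiguous. The function name `provision_guest_memory`, the page-number granularity, and the choice of taking free host frames in list order are hypothetical; no particular hypervisor allocation policy is implied.

```python
def provision_guest_memory(free_host_frames: list[int], num_guest_pages: int) -> dict[int, int]:
    """Back a contiguous guest-physical range with host frames.

    Guest page numbers 0..num_guest_pages-1 are contiguous from the guest's
    point of view, while the host frames backing them may be scattered.
    Returns a GPA-page -> HPA-page mapping.
    """
    if num_guest_pages > len(free_host_frames):
        raise ValueError("not enough free host frames")
    return {gpn: free_host_frames[gpn] for gpn in range(num_guest_pages)}


# Three scattered host frames appear to the guest as guest pages 0, 1, 2.
pmap = provision_guest_memory([7, 3, 42], 3)
print(pmap)  # {0: 7, 1: 3, 2: 42}
```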
With the configuration shown in FIG. 1, when process 110 wishes to access data assigned to, e.g., a guest virtual memory address (GVA) 114, GVA 114 must first be translated into a guest physical memory address (GPA) 116 within guest physical memory 106 that is mapped to GVA 114. GPA 116 must then be translated into a host physical memory address (HPA) 118 within host physical memory 102 that is mapped to GPA 116. Once HPA 118 is determined, the system's CPU can carry out the requested memory instruction against host physical memory 102.
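The two translations can be sketched as the composition of two lookups. In this minimal Python model, assuming flat page-number tables and 4 KiB pages at both levels, `guest_pt` stands for the guest OS's GVA-to-GPA mappings and `host_pt` for the hypervisor's GPA-to-HPA mappings; all names and addresses are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages at both levels


class PageFault(Exception):
    """Hypothetical stand-in for a fault at either translation level."""


def translate(table: dict[int, int], addr: int) -> int:
    """Map addr through a flat page-number table, keeping the page offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise PageFault(f"no mapping for address {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def gva_to_hpa(guest_pt: dict[int, int], host_pt: dict[int, int], gva: int) -> int:
    gpa = translate(guest_pt, gva)  # GVA -> GPA, using mappings kept by the guest OS
    return translate(host_pt, gpa)  # GPA -> HPA, using mappings kept by the hypervisor


guest_pt = {0x5: 0x20}  # guest virtual page 0x5 -> guest physical page 0x20
host_pt = {0x20: 0x7F}  # guest physical page 0x20 -> host physical page 0x7F
print(hex(gva_to_hpa(guest_pt, host_pt, 0x5ABC)))  # 0x7fabc
```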
One known approach for implementing the two-level memory address translation described above is referred to as the shadow page tables (SPT) approach and is illustrated in FIG. 2. In this approach, the guest OS of each VM maintains and updates guest OS page tables 200 (also known as emulated page tables) that comprise GVA-to-GPA mappings for that VM. Guest OS page tables 200 are conceptually similar to the conventional page tables maintained by the OS of a bare-metal (i.e., non-virtualized) computer system; however, unlike conventional page tables, guest OS page tables 200 are not used by the system's MMU to perform address translation. Instead, the MMU uses a set of shadow page tables 202 that are maintained by the hypervisor and that comprise GVA-to-HPA mappings derived from (1) the GVA-to-GPA mappings in guest OS page tables 200 and (2) GPA-to-HPA mappings maintained in, e.g., a “pmap” data structure 204. In order to ensure coherency between guest OS page tables 200 and shadow page tables 202, each time a VM issues an instruction to update a GVA-to-GPA mapping in a guest OS page table 200, the hypervisor traps the instruction, determines the GVA-to-GPA mapping that is being updated, translates the GPA into a corresponding HPA (using the GPA-to-HPA mappings in pmap data structure 204), and then writes the GVA-to-HPA mapping to shadow page tables 202.
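The synchronization step of the SPT approach can be sketched as follows. This is a toy Python model, not a hypervisor implementation: the class name `ShadowPager`, the method `on_guest_pte_write` (standing in for the trap handler), and the flat page-number tables are all hypothetical simplifications.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class ShadowPager:
    """Toy model of the SPT approach; class and method names are hypothetical."""

    def __init__(self, pmap: dict[int, int]) -> None:
        self.pmap = pmap  # GPA page -> HPA page (hypervisor's pmap)
        self.guest_pt: dict[int, int] = {}  # GVA page -> GPA page (guest OS page tables)
        self.shadow: dict[int, int] = {}  # GVA page -> HPA page (walked by the MMU)
        self.traps = 0  # trapped guest page-table updates

    def on_guest_pte_write(self, gva_page: int, gpa_page: int) -> None:
        """Handle a trapped guest update of one GVA-to-GPA mapping."""
        self.traps += 1
        self.guest_pt[gva_page] = gpa_page  # apply the guest's intended update
        self.shadow[gva_page] = self.pmap[gpa_page]  # store the derived GVA-to-HPA mapping

    def mmu_translate(self, gva: int) -> int:
        """One lookup in the shadow tables yields the HPA directly."""
        page, offset = divmod(gva, PAGE_SIZE)
        return self.shadow[page] * PAGE_SIZE + offset


spt = ShadowPager(pmap={0x20: 0x7F})
spt.on_guest_pte_write(0x5, 0x20)  # guest maps GVA page 0x5 to GPA page 0x20
print(hex(spt.mmu_translate(0x5ABC)), spt.traps)  # 0x7fabc 1
```

Each call to `on_guest_pte_write` increments `traps`; in a real system each such trap corresponds to a switch from the VM to the hypervisor and back, which is the cost the SPT approach pays in exchange for single-lookup translation.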
Another known approach for implementing two-level memory address translation is referred to as the nested page tables (NPT) approach. In this approach, each VM maintains a first set of page tables comprising GVA-to-GPA mappings and the hypervisor maintains a second set of page tables comprising GPA-to-HPA mappings. The system's MMU traverses both sets of page tables upon each memory read or write in order to translate a GVA into a corresponding HPA that can be used to access host physical memory.
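Because the guest page tables themselves reside in guest physical memory, every pointer the MMU follows while walking them is a GPA that must in turn be translated through the hypervisor's tables. The sketch below counts the worst-case memory references for one nested walk, assuming multi-level radix page tables at both levels and no caching of intermediate translations; the function name is hypothetical, and real processors cache partial translations to reduce this cost.

```python
def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for one GVA-to-HPA walk under NPT.

    Assumes radix page tables with guest_levels levels in the guest and
    host_levels levels in the hypervisor, and no caching of intermediate
    translations.
    """
    # The guest root pointer, each of the guest_levels - 1 intermediate
    # guest table addresses, and the final GPA are all guest-physical, so
    # each needs a full host walk: guest_levels + 1 host walks in total.
    host_refs = (guest_levels + 1) * host_levels
    # Each guest page-table level also requires one read of a guest entry.
    guest_refs = guest_levels
    return host_refs + guest_refs


print(nested_walk_refs(4, 4))  # 24 (four-level tables at both levels)
print(nested_walk_refs(4, 0))  # 4 (no second level: a native four-level walk)
```

The total equals (n + 1)(m + 1) - 1 for n guest levels and m host levels, so the cost grows multiplicatively rather than additively with table depth.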
The main advantage of the SPT approach shown in FIG. 2 is that, since shadow page tables 202 store direct GVA-to-HPA mappings, the two memory address translations typically needed for each memory read/write instruction (i.e., GVA-to-GPA and GPA-to-HPA) are effectively collapsed into one (i.e., GVA-to-HPA). However, the hypervisor incurs non-negligible overhead each time it traps a VM-initiated modification of guest OS page tables 200 in order to synchronize the modification to shadow page tables 202. This overhead includes direct costs, such as the CPU cycles needed to context switch from the VM to the hypervisor and back, as well as indirect costs arising from, e.g., dirtied CPU caches.
The main advantage of the NPT approach is that the hypervisor does not need to trap changes to the guest OS page tables as in the SPT approach. The drawback is that, because the MMU must access two separate sets of page tables, address translations (i.e., page walks) are generally more time-consuming, as they require more memory accesses. This problem is mitigated to an extent by the MMU's translation lookaside buffer (TLB), which the MMU uses to cache recently used memory address mappings. However, the NPT approach generally puts more pressure on the TLB (i.e., fills it with more entries, causing older entries to be evicted sooner), which increases the likelihood of TLB misses compared with the SPT approach.
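The effect of the TLB can be illustrated with a toy Python model that caches page-to-frame mappings with LRU replacement and charges a fixed number of memory references for the page walk on each miss. The class name `TLB`, the fully associative LRU design (real TLBs are typically set-associative), the access trace, and the walk costs of 4 and 24 references (a native four-level walk versus a worst-case nested walk with four-level tables at both levels, ignoring page-walk caches) are all assumptions.

```python
from collections import OrderedDict
from typing import Callable


class TLB:
    """Toy fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, capacity: int, walk_cost: int) -> None:
        self.capacity = capacity
        self.walk_cost = walk_cost  # memory references charged per page walk
        self.entries: OrderedDict[int, int] = OrderedDict()  # page -> frame, LRU first
        self.hits = 0
        self.misses = 0
        self.walk_refs = 0

    def lookup(self, page: int, walk: Callable[[int], int]) -> int:
        if page in self.entries:
            self.entries.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return self.entries[page]
        self.misses += 1
        self.walk_refs += self.walk_cost  # miss: pay for a full page walk
        frame = walk(page)
        self.entries[page] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return frame


trace = [1, 2, 1, 3, 2]
for cost in (4, 24):  # e.g., a shadow-table walk vs. a worst-case nested walk
    tlb = TLB(capacity=2, walk_cost=cost)
    for page in trace:
        tlb.lookup(page, walk=lambda p: p + 100)  # hypothetical page walker
    print(cost, tlb.hits, tlb.misses, tlb.walk_refs)
# 4 1 4 16
# 24 1 4 96
```

Both runs see the same hits and misses; only the price of each miss differs. The further point made above, that NPT also tends to raise the miss rate itself, is not modeled here.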