In today's computers, as is well known to those of ordinary skill in the art, virtual memory decouples the address space of the running processes from the physical memory addresses. Virtual memory enables processes to have a large contiguous address space that is not limited by an underlying physical memory and allows the computer to run more processes than can fit simultaneously in their entirety in the available physical memory, i.e., to allow for an “over-commit” of memory. To do this, virtual memory space is divided into pages of a fixed size, typically 4 KB, with a size of 2 MB or greater being defined in this patent document as a “large” page. Each page of the virtual memory space maps onto a page within the physical memory.
In general, and not meant to be a complete description of current technology, an application accesses a virtual address (VA) that is translated into a physical address (PA) that is then used to access the physical memory. On architectures that support segmentation, the VA is first translated into what is called a linear address (LA). The LA is then translated to the PA using hardware called a Memory Management Unit (MMU). If the system architecture does not support segmentation, then the LA is the same as the VA, and the VA is used by the MMU to translate to the PA.
As is well known, translation of a virtual memory address to a physical memory address is done by traversing page tables, located in RAM, that contain mapping information. To speed up the translation, a translation lookaside buffer (TLB) is typically used. The TLB provides a faster translation of virtual addresses to physical addresses than does accessing page tables in RAM because the TLB can provide the beginning-to-end mapping in a single step and because the TLB can be implemented in a small (and, therefore, fast to access) data structure closer to, or in, the CPU. The TLB is limited in size, however, and often a virtual memory page cannot be found in the TLB. Whenever this happens, a “TLB miss” occurs, and the mapping has to be performed by a traversal of the page tables, commonly known as a “page walk,” a much slower process when compared to look-ups in the TLB.
The following is meant as a general explanation for background purposes only and may apply to a 64 bit architecture and a 4 KB page size as well as a 32 bit architecture with different page sizes. A more detailed discussion of translation processes can be found in “Intel 64 and IA-32 Architecture Application Note: TLBs, Paging-Structure Caches, and Their Invalidation” available from Intel Corp. of Santa Clara, Calif., the entirety of which is incorporated by reference for all purposes.
Referring to FIG. 1, with respect to the common x86 architecture, an MMU 100 consists of the following parts: a control register (CR3) 102, a translation lookaside buffer (TLB) 104, and translation circuitry, i.e., TLB fill hardware 106. The paging structures: (a) are 4 KB in size; (b) reside in main memory, generally separate from the MMU; and (c) are designated L4, L3, L2 and L1 page tables 108, 110, 112, 114, respectively. In some implementations, each of the page tables contains 512 8-byte entries comprising information required to perform the translation, as will be described below in more detail.
As shown in FIG. 2, the hardware register CR3 102 contains the base address of the root page table of the currently executing process. In four-level paging (long mode), each level points to the next lower level to, ultimately, determine the location of the backing memory page.
As is well known, an MMU 100 is typically equipped with one or more TLBs 104, where the TLB 104 is a cache of recent LPN (Linear Page Number) to PPN (Physical Page Number) translations. To translate an LA, the MMU 100 computes the LPN, and then looks in the TLB 104 for a translation of the LPN. If the translation is present in the TLB 104, referred to as a “TLB hit,” the PPN is immediately available from the TLB 104. On the other hand, if the translation is not present in the TLB 104, referred to as a “TLB miss,” a page table walk is done, and the translation is stored in the TLB, possibly evicting another entry from the TLB 104.
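By way of illustration only, the TLB hit and miss behavior described above can be sketched in software. The sketch below is not a description of any actual hardware: the class name ToyTLB, the least-recently-used eviction policy, the 4 KB page size, and the page_walk callable that stands in for the page table walk are all assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KB pages, so the low 12 bits are the offset


class ToyTLB:
    """A small LPN -> PPN cache. LRU eviction is an assumed policy."""

    def __init__(self, capacity, page_walk):
        self.capacity = capacity
        self.page_walk = page_walk  # hypothetical callable: LPN -> PPN
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, la):
        lpn = la >> PAGE_SHIFT
        offset = la & ((1 << PAGE_SHIFT) - 1)
        if lpn in self.entries:
            # TLB hit: the PPN is immediately available.
            self.hits += 1
            self.entries.move_to_end(lpn)
        else:
            # TLB miss: do the (slow) page walk and cache the result,
            # evicting the least recently used entry if the TLB is full.
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[lpn] = self.page_walk(lpn)
        return (self.entries[lpn] << PAGE_SHIFT) | offset


# Hypothetical mapping standing in for the page tables: PPN = LPN + 0x100.
tlb = ToyTLB(capacity=2, page_walk=lambda lpn: lpn + 0x100)
pa = tlb.translate(0x1234)  # miss on LPN 1
tlb.translate(0x1ABC)       # hit on LPN 1
tlb.translate(0x2000)       # miss on LPN 2
tlb.translate(0x3000)       # miss on LPN 3, evicts LPN 1
tlb.translate(0x1000)       # miss again on LPN 1, evicts LPN 2
```

In this sketch, the second access to LPN 1 hits, while the final access misses because the translation was evicted in the meantime, which is the situation that a larger TLB reach is intended to avoid.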
As shown in FIG. 3, a page table entry (PTE), e.g., PTE 201-n, includes multiple fields, or groups of bits, that represent: a physical page number (PPN) 302, a page accessed bit (A) 304, a user permission bit (U) 306, an execute permission bit (X) 308, a write permission (W) bit 310, a read permission (R) bit 312, a page dirty bit (D) 314, a page present bit (P) 316, and a stop bit 318. When a linear address is used to access memory, the processor sets the A-bit 304 to one (1) in all page table level entries used to translate the linear address. It should be understood that FIG. 3 illustrates one possible configuration of bits in a page table entry 201-n and that the number and arrangement of the elements in a page table entry 201-n can be varied from what is shown.
The PPN 302 indicates the next page in the page table hierarchy. If a particular PTE 201-n is at the lowest level of the page table hierarchy, then the PPN 302 points to a data page. If a particular PTE 201-n is not at the lowest level of the page table hierarchy, then the PPN 302 points to a lower-level page table.
The stop bit 318 is set to one (1) to indicate that the corresponding PTE 201-n is at the lowest level of the page table hierarchy. As the size of the data pages may vary within a physical memory, the stop bit 318 may be set to one in PTEs 201-n at various levels in the page table hierarchy. In this fashion, the page walk may be stopped so that one or more levels in the page table hierarchy are not traversed when mapping a large data page that is not in the TLB 104. At level L1 in the page table hierarchy, i.e., the lowest level that the page table hierarchy supports, the stop bit 318 is ignored.
As shown in FIG. 4, the virtual address 402 is subdivided into five fields: a level four index 404, a level three index 406, a level two index 408, a level one index 410, and an offset 412. The virtual address 402 may include additional bits or fields that are not used during the mapping of virtual memory addresses to physical memory addresses. Each of the index fields 404-410 and the offset field 412 may include any number of bits as may be appropriate for a given computer system. Typically, the subdivision of the virtual address 402 reflects the number of levels supported by the page walker or TLB fill hardware 106, the size of the smallest available physical pages, the size of the virtual memory address space, and the size of the physical memory address space.
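By way of illustration only, one possible subdivision of the virtual address into the five fields can be sketched as follows. The field widths used here, four 9-bit indices and a 12-bit offset, are an assumption that matches 4 KB pages holding 512 8-byte entries; as noted above, other systems may use other widths. The function name split_virtual_address is likewise hypothetical.

```python
INDEX_BITS = 9    # assumption: 512 entries per page table (2**9)
OFFSET_BITS = 12  # assumption: 4 KB pages (2**12 bytes)


def split_virtual_address(va):
    """Return (l4_index, l3_index, l2_index, l1_index, offset) for va."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    indices = []
    for level in (4, 3, 2, 1):
        # Each level's index sits above the offset and all lower indices.
        shift = OFFSET_BITS + INDEX_BITS * (level - 1)
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    l4, l3, l2, l1 = indices
    return l4, l3, l2, l1, offset


# Build an address whose fields are 1, 2, 3, 4 and offset 5, then split it.
example_va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
fields = split_virtual_address(example_va)
```

Under these assumed widths, 512 entries of 8 bytes each fill exactly one 4 KB page table, and the five fields together account for 48 bits of the virtual address.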
In walking the page tables, the page table root CR3 102 is used to determine that the L4 page table is the page table 108. The L4 index 404 is used to index into the page table 108, thereby obtaining an L4 PTE. The PPN 302 stored in this L4 PTE is used to determine the L3 page table 110. The L3 index 406 is used to index into the page table 110, thereby obtaining an L3 PTE. The PPN 302 stored in this L3 PTE is used to determine the L2 page table 112. The L2 index 408 is used to index into the L2 page table 112, thereby obtaining an L2 PTE. The PPN 302 stored in this L2 PTE is used to determine the L1 page table 114. The L1 index 410 is used to index into the L1 page table 114, thereby obtaining an L1 PTE. The PPN 302 stored in this L1 PTE is used to access the data page 204-n. Subsequently, the offset 412 is used to index into the data page 204-n, thereby accessing the data corresponding to the virtual address 402. In addition, the pair consisting of the virtual page number corresponding to the index fields 404-410 and the physical page number corresponding to the data page 204-n is entered into the TLB 104.
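By way of illustration only, the four-level walk described above can be sketched in software. The sketch makes several simplifying assumptions that are not part of the description: physical memory is modeled as a dictionary keyed by PPN, each page table is a dictionary from index to the PPN held in the PTE, permission and present bits are omitted, and the field widths are 9 bits per index and 12 bits for the offset. The names walk, memory and root_ppn are hypothetical.

```python
INDEX_BITS = 9    # assumption: 512 entries per page table
OFFSET_BITS = 12  # assumption: 4 KB data pages
INDEX_MASK = (1 << INDEX_BITS) - 1


def walk(memory, root_ppn, va):
    """Walk L4 -> L1 and return the physical address for va.

    memory maps a page table's PPN to a dict of {index: next PPN}.
    root_ppn plays the role of the value held in CR3.
    """
    table_ppn = root_ppn
    for level in (4, 3, 2, 1):
        shift = OFFSET_BITS + INDEX_BITS * (level - 1)
        index = (va >> shift) & INDEX_MASK
        next_ppn = memory[table_ppn].get(index)
        if next_ppn is None:
            raise LookupError(f"no mapping at level L{level}")
        # Above L1 this is the next page table; at L1 it is the data page.
        table_ppn = next_ppn
    offset = va & ((1 << OFFSET_BITS) - 1)
    return (table_ppn << OFFSET_BITS) | offset


# Hypothetical tables: the root lives at PPN 0x10, the data page at PPN 0x99.
memory = {
    0x10: {1: 0x11},  # L4 table
    0x11: {2: 0x12},  # L3 table
    0x12: {3: 0x13},  # L2 table
    0x13: {4: 0x99},  # L1 table
}
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
pa = walk(memory, 0x10, va)
```

Each iteration of the loop corresponds to one memory access in the walk, which is why a TLB miss is considerably more expensive than a TLB hit.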
If the stop bit 318 is set in a PTE 201-n that is accessed at a higher level in the page table hierarchy, then the PPN 302 in the corresponding PTE 201-n is used to access a large data page. A system would then index into the large data page using a composition of the remaining index bits of the virtual address 402 and the offset bits 412, thereby accessing the data corresponding to the virtual address 402. In addition, the large page mapping is entered into the TLB 104.
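By way of illustration only, the effect of the stop bit on the walk can be sketched as follows. The sketch assumes that each PTE is modeled as a (PPN, stop) pair, that the PPN of a large page is expressed in 4 KB units and is aligned to the large page size, and that the field widths are 9 bits per index and 12 bits for the offset. These choices, and the name walk_with_stop, are assumptions made for the example.

```python
INDEX_BITS = 9    # assumption: 512 entries per page table
OFFSET_BITS = 12  # assumption: 4 KB smallest page size
INDEX_MASK = (1 << INDEX_BITS) - 1


def walk_with_stop(memory, root_ppn, va):
    """Return (physical address, level at which the walk stopped)."""
    table_ppn = root_ppn
    for level in (4, 3, 2, 1):
        shift = OFFSET_BITS + INDEX_BITS * (level - 1)
        ppn, stop = memory[table_ppn][(va >> shift) & INDEX_MASK]
        # At L1 the stop bit is ignored: the PTE always maps a data page.
        if stop or level == 1:
            # The remaining index bits and the offset bits together
            # form the offset into the (possibly large) data page.
            page_offset = va & ((1 << shift) - 1)
            return (ppn << OFFSET_BITS) + page_offset, level
        table_ppn = ppn


# Hypothetical tables: the L2 entry at index 1 has its stop bit set and
# maps a 2 MB page starting at PPN 0x400 (physical address 0x400000).
memory = {
    0x10: {0: (0x11, 0)},   # L4 table
    0x11: {0: (0x12, 0)},   # L3 table
    0x12: {1: (0x400, 1)},  # L2 table, stop bit set
}
va = (1 << 21) | (7 << 12) | 0x34
pa, stopped_at = walk_with_stop(memory, 0x10, va)
```

In this example the walk stops at L2, so the L1 page table is never read and the low 21 bits of the virtual address are used as the offset into the 2 MB page.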
As is well known, virtualizing an MMU so that multiple virtual machines can run on a single hardware system typically entails another level of translation. The first translation is provided by a guest operating system (guest OS) running in a virtual machine. The guest OS translates a guest LPN (GLPN) into a corresponding guest PPN (GPPN) in the conventional manner. The second translation is provided by virtualization software, for example, a virtual machine monitor (VMM). In particular, the VMM maintains a GPPN to machine page number (MPN) mapping in its internal translation table (T), where the MPN, also referred to as the host PPN, is used to address the physical memory of the hardware system.
One of two methods is typically used for virtualizing an MMU, either a shadowing of guest paging structures (shadowing method), or a hardware assist method. As shown in FIG. 5, in the shadowing method for virtualizing an MMU, virtualization software, for example, a virtual machine monitor (VMM), maintains shadow page tables 502, with one shadow page table for each guest page table. While the guest page tables 504, maintained by the guest operating system, contain guest LPN to guest PPN mappings, the shadow page tables contain guest LPN to host PPN mappings. To insert a translation for a guest LPN into a shadow page table, the VMM walks the guest page table to determine the guest PPN. Then, it translates the guest PPN to a host PPN using its translation table T.
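By way of illustration only, the insertion of a translation into a shadow page table can be sketched as the composition of the two mappings. For brevity the sketch flattens each multi-level page table into a single dictionary; the names fill_shadow_entry, guest_page_table, translation_table and shadow_page_table, as well as the page numbers used, are hypothetical.

```python
def fill_shadow_entry(shadow_page_table, guest_page_table,
                      translation_table, guest_lpn):
    """Insert a guest LPN -> host PPN mapping into the shadow page table."""
    # Step 1: walk the guest page table (flattened here) to find the guest PPN.
    guest_ppn = guest_page_table[guest_lpn]
    # Step 2: translate the guest PPN to a host PPN using the table T.
    host_ppn = translation_table[guest_ppn]
    # Step 3: record the combined mapping in the shadow page table.
    shadow_page_table[guest_lpn] = host_ppn
    return host_ppn


guest_page_table = {0x1: 0x20, 0x2: 0x21}       # guest LPN -> guest PPN
translation_table = {0x20: 0x900, 0x21: 0x7F3}  # guest PPN -> host PPN (T)
shadow_page_table = {}

host_ppn = fill_shadow_entry(shadow_page_table, guest_page_table,
                             translation_table, 0x1)
```

Because the shadow page table holds the combined guest LPN to host PPN mapping, the hardware can translate a guest address with a single walk rather than consulting both tables on every TLB miss.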
The architectural extensions introduced by AMD, with its Nested Page Tables (NPT), and Intel, with its Extended Page Tables (EPT), are leveraged in the hardware assist method of virtualizing an MMU. A general overview of hardware assist virtualization of an MMU can be found in the article “Accelerating Two-Dimensional Page Walks for Virtualized Systems,” by Bhargava, et al., ASPLOS'08, Mar. 1-5, 2008, Seattle, Wash., the entire contents of which are hereby incorporated by reference for all purposes.
Memory pages of a larger size, typically 2 MB in an x86 system, are called large pages or super pages. Large pages are supported by many general purpose processors and allow each entry in the TLB to map a large physical memory region into a virtual address space. This increases the TLB reach, i.e., the amount of memory that can be accessed without causing a TLB miss, thereby decreasing the number of TLB misses, which translates into performance increases for many applications.
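By way of illustration only, the increase in TLB reach can be worked through numerically. The TLB size of 64 entries used below is a hypothetical figure chosen for the example, not a property of any particular processor, and the function name tlb_reach is likewise an assumption.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(num_entries, page_size):
    """Amount of memory covered when every TLB entry maps one page."""
    return num_entries * page_size


ENTRIES = 64  # hypothetical TLB size

small_reach = tlb_reach(ENTRIES, 4 * KB)  # 256 KB with 4 KB pages
large_reach = tlb_reach(ENTRIES, 2 * MB)  # 128 MB with 2 MB pages
ratio = large_reach // small_reach        # 512 times the reach
```

The factor of 512 follows directly from the ratio of the page sizes, 2 MB to 4 KB, and does not depend on the assumed number of TLB entries.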
One issue presents itself with the use of large pages—the problem of determining which pages to map large, as large pages are a scarce resource. For a system with a VMM using shadow page tables or software MMU, the cost of figuring out which pages to map large should be relatively small in order to gain the maximum benefit. Currently, in a shadow MMU, i.e., a software MMU, large pages are assigned based on the order in which the pages fault. The order of the faulting pages, however, does not necessarily indicate that a page is a good candidate to be backed by a large page.
Using large pages reduces the number of TLB misses and generally improves performance of virtual memory systems. The use of large pages, however, also generally reduces the ability of an operating system to efficiently utilize the physical memory. As large pages pose this inherent tradeoff between fast memory access, and the accompanying increase in performance, and efficient utilization of physical memory, large pages are not typically used universally. Therefore, it is important to optimize their use and deploy them in a manner that will deliver the biggest performance improvement. The optimization of the use of large memory pages is beneficial in both the shadowing of guest paging structures (shadowing method) and the hardware assist method.