Because the instructions and data for an application running in a computer system are loaded into system memory before being executed, performance is generally improved if more memory is available to support the active applications. Indeed, an application requiring real-time processing of complex calculations, such as voice-recognition software or interactive graphics, will not run properly at all unless a certain amount of RAM (Random Access Memory) is reserved for its use.
An application may be defined broadly as any body of code that is loaded and that executes substantially as a unit. Applications include word processing programs, spreadsheets and games; Internet browsers and e-mail programs; software drivers; web servers; and software implementations of a whole computer, commonly known as a “virtual machine” (VM).
High-speed system memory (RAM) is a limited resource and, as with most limited resources, there is often competition for it. This has become an even greater problem in modern multi-tasked systems, e.g., virtualized systems, in which several applications may be running, or at least resident in memory, at the same time. More efficient management of RAM can reduce the cost, energy, or physical space required to support a given workload. Alternatively, more efficient management of RAM can allow a system to support a larger number of applications with better performance, given a fixed monetary, energy, or physical space budget.
In almost all computer systems, and as known to one of ordinary skill in the art, an operating system provides an application with a virtual memory that appears, to the application, as contiguous working memory, while in fact it may be physically fragmented and may even overflow onto disk storage. The operating system keeps mappings of virtual page numbers to physical page numbers stored in page tables as page table entries (PTEs). Almost all implementations use page tables to translate the virtual addresses seen by the application into the physical or machine addresses used by the hardware to process instructions. This technique makes programming of large applications easier and lets a system use real physical memory (RAM) more efficiently than a system without virtual memory. All modern x86 CPUs include a memory management unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory performance.
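The translation described above can be illustrated with a minimal sketch. It assumes a 4 KiB page size and a single-level page table held in a dictionary; the names `AddressSpace`, `PageFault`, `map_page` and `translate` are hypothetical and chosen only for illustration. A real MMU walks multi-level tables in hardware, and a real TLB has a bounded size and a replacement policy, neither of which is modeled here.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB); not specified in the text


class PageFault(Exception):
    """Raised when a virtual page has no page table entry."""


class AddressSpace:
    """Toy model: one page table plus an unbounded TLB cache."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical page number (the PTEs)
        self.tlb = {}         # cache of recently used translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def map_page(self, vpn, ppn):
        """Install or change a PTE and drop any stale TLB entry for it."""
        self.page_table[vpn] = ppn
        self.tlb.pop(vpn, None)

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1
            ppn = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            if vpn not in self.page_table:
                raise PageFault(f"no mapping for virtual page {vpn}")
            ppn = self.page_table[vpn]
            self.tlb[vpn] = ppn  # fill the TLB for later accesses
        return ppn * PAGE_SIZE + offset


space = AddressSpace()
space.map_page(0, 7)  # contiguous virtual pages 0 and 1 ...
space.map_page(1, 3)  # ... backed by non-contiguous physical pages 7 and 3
first = space.translate(0x10)
second = space.translate(PAGE_SIZE + 5)
again = space.translate(0x10)  # served from the TLB
```

In this sketch the application sees virtual pages 0 and 1 as contiguous even though the backing physical pages are not, and the repeated access is resolved from the TLB without consulting the page table.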
There may be one page table per system or a separate page table may be provided for each application. If there is only one page table, running applications share a single virtual address space, i.e., they use different parts of a single range of virtual addresses. Systems that use multiple page tables provide multiple virtual address spaces—concurrent applications think they are using the same range of virtual addresses, but their separate page tables redirect to different real, i.e., machine, addresses.
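The difference between the two arrangements can be sketched as follows. The page numbers and the names `translate`, `app_a_table`, `app_b_table` and `shared_table` are illustrative assumptions, as is the 4 KiB page size.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)


def translate(page_table, vaddr):
    """Look up a virtual address in the given page table (VPN -> machine page)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


# Multiple page tables: each application has its own virtual address space.
# Both use virtual pages 0 and 1, but the tables redirect to different machine pages.
app_a_table = {0: 10, 1: 11}
app_b_table = {0: 20, 1: 21}

# Single page table: the applications share one virtual address space and
# must occupy different parts of it (A uses pages 0-1, B uses pages 2-3).
shared_table = {0: 10, 1: 11, 2: 20, 3: 21}

vaddr = 0x100
addr_a = translate(app_a_table, vaddr)
addr_b = translate(app_b_table, vaddr)
shared_b = translate(shared_table, 2 * PAGE_SIZE + 0x100)
```

With separate tables, the same virtual address `0x100` resolves to different machine addresses for the two applications. With the shared table, the second application reaches the same machine page only through a different virtual address.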
In a virtualized system, such as those available from VMware, Inc. of Palo Alto, Calif., in order to run multiple virtual machines on a single system, another level of memory virtualization is required. In other words, one has to virtualize the MMU to support the guest OS. The guest OS continues to control the mapping of virtual addresses to guest physical memory addresses by creating and maintaining guest page tables, but the guest OS typically does not have any control or knowledge of mappings to the actual machine memory. A Virtual Machine Monitor (VMM), or other virtualization software or logic, is responsible for mapping guest physical memory to the actual machine memory. The VMM creates and maintains shadow page tables containing mappings from virtual addresses to machine addresses. The mappings in the shadow page tables are typically derived from the guest OS mappings in the guest page tables and the VMM's mappings from guest physical memory to the actual machine memory. As known, the guest page tables may be “protected” by memory traces, and the VMM intercepts any attempt to access the guest page tables by operation of the traces. Any access to the protected guest page tables results in a “tracing fault” that is handled by the VMM. A hardware MMU typically uses the shadow page tables and a TLB to map virtual addresses directly to machine addresses, so that the two levels of translation are not necessary on every access. When the guest OS changes the virtual memory to physical memory mapping in the guest page tables, the VMM updates the shadow page tables to enable a direct lookup by the MMU.
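The relationship between the guest page tables, the VMM's own mappings and the shadow page tables can be sketched as below. This is a simplified model under stated assumptions, not the implementation of any particular VMM: the class `ShadowMMU` and its methods are hypothetical, tables are single-level dictionaries, the page size is assumed to be 4 KiB, and only guest writes to the traced page table are modeled as tracing faults.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)


class ShadowMMU:
    """Toy model of shadow-based MMU virtualization (hypothetical names)."""

    def __init__(self, pmap):
        self.pmap = dict(pmap)  # guest physical page -> machine page (VMM-owned)
        self.guest_pt = {}      # virtual page -> guest physical page (guest-owned)
        self.shadow_pt = {}     # virtual page -> machine page (VMM-owned)
        self.trace_faults = 0

    def guest_write_pte(self, vpn, gppn):
        """Guest OS writes a PTE; the trace on the page table faults into the VMM."""
        self.trace_faults += 1        # tracing fault handled by the VMM
        self.guest_pt[vpn] = gppn     # the VMM applies the guest's write
        self._sync_shadow(vpn)        # and keeps the shadow entry consistent

    def _sync_shadow(self, vpn):
        """Derive the shadow entry by composing the guest mapping with the pmap."""
        gppn = self.guest_pt[vpn]
        if gppn in self.pmap:
            self.shadow_pt[vpn] = self.pmap[gppn]
        else:
            self.shadow_pt.pop(vpn, None)  # no machine page backs it yet

    def translate(self, vaddr):
        """What the hardware MMU does: one lookup, virtual -> machine."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.shadow_pt[vpn] * PAGE_SIZE + offset


mmu = ShadowMMU({5: 42, 6: 43})
mmu.guest_write_pte(0, 5)          # guest maps virtual page 0 to guest physical page 5
before = mmu.translate(0x20)
mmu.guest_write_pte(0, 6)          # guest remaps the page; the shadow entry follows
after = mmu.translate(0x20)
```

The hardware lookup in `translate` consults only the shadow table, which is why the two levels of translation need not be walked on every access. The cost is paid instead in the tracing faults and in the memory held by the shadow tables.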
MMU virtualization creates some overhead in all virtualization approaches, e.g., additional processing of virtual addressing, page table management, paging, etc. The VMM is not precisely aware of when the guest finishes using a page as a page table page. The VMM, however, can benefit from having this information: it can remove the traces on that page to eliminate their cost and, in shadow-based MMU virtualization, free the shadow page table page corresponding to the guest page table page. Thus, any methods or systems that can free up the memory related to the shadow page table pages and/or traces are desirable, as this can reduce latency, increase speed and increase efficiency.