A virtual machine (VM) is an abstraction, i.e., a “virtualization,” of a physical computer system and provides an environment in which an operating system may execute with apparent control of a dedicated physical machine. Multiple virtual machines may execute on a common hardware machine and each virtual machine may operate with protection and isolation from other virtual machines executing on the same common hardware machine. Each virtual machine typically encapsulates a complete executing state for a corresponding operating system, including both user-level applications and kernel-mode operating system services.
In many computing environments, each hardware machine is typically underutilized when executing a corresponding server application. As a result of hardware underutilization averaged over many hardware machines, computing environments configured to dedicate a hardware machine to each server application are typically characterized as being very inefficient with respect to cost, power consumption, management, and, potentially, reliability.
Virtual machines are advantageously deployed to consolidate multiple software servers in a computing environment onto one or more shared hardware machines for execution. A hypervisor is a software layer that virtualizes hardware resources and presents a virtual hardware interface to one or more virtual machine instances that may reflect an underlying hardware machine architecture or an abstraction of an arbitrary machine architecture. The hypervisor may perform certain management functions with respect to an executing virtual machine.
Each virtual machine executing on a hardware machine includes a memory image of apparent physical memory. Because virtual machines tend to have working sets that are smaller than the memory configured for the virtual machine, hardware machine memory may be efficiently overcommitted for many applications. For example, a hardware machine with four gigabytes of total machine memory may host a set of virtual machines that has a total of sixteen gigabytes of apparent configured physical memory. While approximately four gigabytes of machine memory are actually available at any one time, this four gigabytes of machine memory can be used by the virtual machines in the set in a multiplexed manner by demand-paging to a file residing in an attached mass storage system. The mass storage system conventionally comprises one or more magnetic hard disk drives; however, any form of mass storage system may be used. For example, in modern computer systems the mass storage system may comprise a solid-state drive (SSD) or an array of SSDs. Page sharing and ballooning, among various techniques, may be employed to reduce demand paging and enhance overall efficiency.
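The overcommitment scheme described above can be illustrated with a toy model: a small pool of machine pages is multiplexed among a larger set of guest pages, with the least-recently-used page demand-paged out to a backing store under pressure. All names here (OvercommittedMemory, touch, and so on) are illustrative only, not any hypervisor's actual interface.

```python
# Toy model of machine-memory overcommitment via demand paging (illustrative).
from collections import OrderedDict

class OvercommittedMemory:
    """Multiplexes a small pool of machine pages among many guest pages,
    evicting the least-recently-used page to a swap store on pressure."""
    def __init__(self, machine_pages):
        self.machine_pages = machine_pages  # physical frames actually available
        self.resident = OrderedDict()       # guest page -> contents, in LRU order
        self.swapped = {}                   # guest page -> contents on "disk"

    def touch(self, guest_page, contents=None):
        if guest_page in self.resident:
            self.resident.move_to_end(guest_page)        # LRU hit
        else:
            if contents is None:
                contents = self.swapped.pop(guest_page)  # demand page-in
            if len(self.resident) >= self.machine_pages:
                victim, data = self.resident.popitem(last=False)
                self.swapped[victim] = data              # evict to backing file
            self.resident[guest_page] = contents
        return self.resident[guest_page]

# Four machine pages standing in for 4 GB of machine memory, sixteen guest
# pages standing in for 16 GB of apparent configured physical memory.
mem = OvercommittedMemory(machine_pages=4)
for p in range(16):
    mem.touch(p, contents=bytes([p]))
assert len(mem.resident) == 4 and len(mem.swapped) == 12
```

At any moment only four pages are resident; the rest live in the backing store and are paged in on demand, mirroring the multiplexing described above.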
Page sharing is a memory reclamation technique widely used in virtual execution environments. This technique saves memory by eliminating duplicate pages: once duplicates of a given page's content are identified, the corresponding guest pages are mapped copy-on-write (COW) to the same shared machine page, and the old backing pages are released to the platform.
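The COW sharing mechanism can be sketched as follows. This is a minimal model under simplifying assumptions; the class and method names (SharedMemory, back, write) are illustrative, and a real hypervisor would enforce COW through read-only MMU mappings rather than explicit calls.

```python
# Minimal sketch of page sharing with copy-on-write (COW) semantics.
class SharedMemory:
    def __init__(self):
        self.machine_pages = {}  # machine page number -> contents
        self.refcount = {}       # machine page number -> number of guest mappings
        self.by_content = {}     # contents -> machine page number
        self.mapping = {}        # (vm, guest page number) -> machine page number
        self.next_mpn = 0

    def back(self, vm, gpn, contents):
        """Map a guest page; reuse an existing machine page on duplicate content."""
        mpn = self.by_content.get(contents)
        if mpn is None:
            mpn = self.next_mpn; self.next_mpn += 1
            self.machine_pages[mpn] = contents
            self.by_content[contents] = mpn
            self.refcount[mpn] = 0
        self.refcount[mpn] += 1
        self.mapping[(vm, gpn)] = mpn  # mapped read-only (COW) in a real MMU

    def write(self, vm, gpn, contents):
        """COW fault: break sharing by giving the writer a private copy."""
        old = self.mapping[(vm, gpn)]
        self.refcount[old] -= 1
        if self.refcount[old] == 0:
            del self.by_content[self.machine_pages.pop(old)]
            del self.refcount[old]     # old backing page released to the platform
        self.back(vm, gpn, contents)

mem = SharedMemory()
mem.back("vm1", 0, b"\x00" * 8)    # two guest pages with identical content...
mem.back("vm2", 5, b"\x00" * 8)    # ...are backed by one shared machine page
assert mem.mapping[("vm1", 0)] == mem.mapping[("vm2", 5)]
mem.write("vm2", 5, b"\x01" * 8)   # a write breaks the sharing
assert mem.mapping[("vm1", 0)] != mem.mapping[("vm2", 5)]
```

The reference count tracks how many guest pages map each shared machine page, so the backing page can be released once its last mapping is withdrawn.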
Ideally, duplicate contents should be identified, and the corresponding page shared, at the moment a page's content is created, or is about to be created, in the guest's memory. In one known implementation of this principle, for example, the hypervisor identifies guest code responsible for creating zero pages and skips its execution, instead backing the guest physical memory page with the shared page of that content. Another known example arises when an I/O operation reads page contents from a disk block that was already read into a different page. If this situation can be recognized, then this read operation, and all subsequent reads from the same disk block, should be skipped and the destination guest physical memory pages immediately backed with a shared machine memory page. These and other similar methods, however, cannot detect all sharing opportunities, so periodic searches for duplicates, known as transparent page sharing, typically take place and cover both code and data pages' domains.
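The duplicate-disk-read case above can be sketched as a small bookkeeping layer: once a block has been read into a machine page, later reads of the same block skip the I/O entirely and back the destination guest page with the already-present shared page. The names (DiskReadSharer, read_block) are illustrative assumptions, not any product's API.

```python
# Sketch of recognizing duplicate disk reads (illustrative names and model).
class DiskReadSharer:
    def __init__(self, disk):
        self.disk = disk           # block number -> contents
        self.block_to_page = {}    # block number -> shared machine page
        self.io_reads = 0          # counts actual I/O operations performed

    def read_block(self, block, guest_mapping, gpn):
        page = self.block_to_page.get(block)
        if page is None:
            self.io_reads += 1                 # real I/O happens once per block
            page = ("mpn", self.disk[block])   # stand-in for a filled machine page
            self.block_to_page[block] = page
        guest_mapping[gpn] = page  # back the guest page with the shared page

disk = {7: b"data"}
sharer = DiskReadSharer(disk)
vm1, vm2 = {}, {}
sharer.read_block(7, vm1, gpn=3)
sharer.read_block(7, vm2, gpn=9)   # second read of block 7 is skipped
assert sharer.io_reads == 1
assert vm1[3] is vm2[9]            # both guest pages share one machine page
```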
Memory content evolves over time and, therefore, so do sharing opportunities. An exhaustive periodic search for duplicates is not an option due to time constraints, so the scope of any search for duplicates is typically limited to a subset of memory pages. This subset might be obtained, for example, through random page sampling or sequential page scanning. Other selection criteria driven by page locality information, I/O, or execution activity may also be applied. Another optimization typically used to speed up matching of identical pages applies a hash function to a page's content so that the hash values, and not the content, are compared most of the time.
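The hash-based matching optimization can be sketched as follows: a compact hash of each page's content is computed, candidates are compared by hash first, and the full byte-by-byte comparison runs only on a hash hit. The choice of hash here (SHA-1 truncated to 64 bits) is an arbitrary stand-in; a real implementation would likely use a faster non-cryptographic hash.

```python
# Sketch of hash-first page matching with byte-by-byte confirmation.
import hashlib

def page_hash(contents: bytes) -> int:
    # Any fast hash works; truncated SHA-1 is only a stand-in here.
    return int.from_bytes(hashlib.sha1(contents).digest()[:8], "big")

pages = [b"\x00" * 4096, b"ab" * 2048, b"\x00" * 4096]
index = {}          # hash value -> page number of first occurrence
duplicates = []
for n, contents in enumerate(pages):
    h = page_hash(contents)
    # Compare full contents only on a hash hit, since hashes can collide.
    if h in index and pages[index[h]] == contents:
        duplicates.append((index[h], n))
    else:
        index[h] = n
assert duplicates == [(0, 2)]   # pages 0 and 2 hold identical content
```

Most examined pages are rejected after a cheap hash comparison; the expensive full-content comparison is reserved for likely matches.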
In one known memory sharing approach, for each memory page to be examined, the following steps are typically taken. First, a hash of the page's contents is computed. Then, the resulting hash is looked up in a table that tracks all currently shared pages. If a page with an identical hash is found, a byte-by-byte comparison of the two pages' contents is performed, to ensure that the contents do match, so that sharing can be initiated. If no match among already shared pages is found, the previously examined pages (hints), tracked by the same or a different hash table, are tried next. As before, a table lookup is followed by a byte-by-byte comparison if a page with an identical hash is found.
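The two-stage lookup described above can be sketched as follows: first the table of currently shared pages is consulted, then the table of previously examined pages (hints). Structure and names are illustrative only, not any particular product's implementation.

```python
# Sketch of the two-table examination sequence: shared pages, then hints.
import hashlib

def page_hash(contents: bytes) -> bytes:
    return hashlib.sha1(contents).digest()[:8]

shared = {}   # hash -> contents of an already shared machine page
hints = {}    # hash -> contents of a previously examined, not-yet-shared page

def examine(contents: bytes) -> str:
    key = page_hash(contents)
    # 1) Look among currently shared pages; confirm byte-by-byte on a hash hit.
    if key in shared and shared[key] == contents:
        return "shared"
    # 2) Otherwise try the hints; a confirmed match initiates sharing.
    if key in hints and hints[key] == contents:
        del hints[key]
        shared[key] = contents
        return "newly-shared"
    # 3) No match anywhere: record this page as a hint for future candidates.
    hints[key] = contents
    return "hint-recorded"

assert examine(b"A" * 4096) == "hint-recorded"  # first sighting becomes a hint
assert examine(b"A" * 4096) == "newly-shared"   # second sighting starts sharing
assert examine(b"A" * 4096) == "shared"         # later sightings reuse the page
```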
Of these, the most expensive operations are hash computation and content comparison, although hash table lookup does not come for free and can produce noticeable overhead if any of the hash tables is improperly balanced.
Hash tables are typically maintained in a context visible to all VMs on the host. When virtualization software controls all the resources of the host itself, i.e., a “bare-metal model,” the shared context is provided by the kernel. When virtualization software is running on a commodity operating system, i.e., a “hosted model,” such a shared context is implemented in a special kernel module (a vmmon driver in products of VMware, Inc.) loaded into the host OS, and candidates for sharing are supplied there by a user-level process. Because they reside in a shared context, the hash tables are typically protected by one or more global locks.
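The global-lock arrangement can be sketched with a single lock serializing every candidate submission from every VM. This is purely illustrative; the names and the use of threads as stand-ins for VM contexts are assumptions.

```python
# Sketch of a shared-context hash table guarded by one global lock.
import threading

table_lock = threading.Lock()   # the global lock protecting the shared context
shared_table = {}               # hash value -> page contents

def submit_candidate(page_hash, contents):
    """Called on behalf of any VM; all callers contend on the one lock."""
    with table_lock:
        existing = shared_table.get(page_hash)
        if existing is not None and existing == contents:
            return True         # duplicate found; page can be shared
        shared_table[page_hash] = contents
        return False

# Several "VMs" submitting the same candidate serialize on the global lock.
threads = [threading.Thread(target=submit_candidate, args=(1, b"x"))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert shared_table == {1: b"x"}
```

Every submission, from any VM, must acquire the same lock, which is the source of the contention overhead discussed next.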
Global lock contention and context switches, for example from the user-level process to the driver, impose additional overhead on page sharing. Further, as the amount of time dedicated to a page sharing service is limited, the maximum rate at which pages may be examined is dictated by the per-page processing overhead. Thus, it is desirable to keep the cost of each operation as low as possible.
What is needed, therefore, is a mechanism for efficiently sharing memory pages in a virtualized system.