In the field of computer virtualization, a hypervisor is said to be “memory overcommitted” if the total configured guest memory size of all virtual machines (VMs) running on the hypervisor exceeds the physical memory available to the hypervisor from the underlying physical machine (i.e., host system). Most hypervisors in use today support memory overcommitment, as this allows users to achieve higher VM-to-host consolidation ratios and thereby reduce costs and/or improve operational efficiency.
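By way of illustration only, the overcommitment condition described above can be expressed as a simple comparison. The following is a minimal sketch; the function names and the use of megabytes as the unit are assumptions made for this example and do not correspond to any particular hypervisor's interface.

```python
def is_memory_overcommitted(guest_memory_mb, host_physical_mb):
    """Return True if the total configured guest memory of all VMs
    exceeds the physical memory available to the hypervisor."""
    return sum(guest_memory_mb) > host_physical_mb


def overcommit_ratio(guest_memory_mb, host_physical_mb):
    """Ratio of total configured guest memory to host physical memory.
    A value greater than 1.0 indicates overcommitment."""
    return sum(guest_memory_mb) / host_physical_mb
```

For instance, under these assumptions, three VMs configured with 4096 MB, 4096 MB, and 8192 MB on a host with 8192 MB of physical memory would yield an overcommit ratio of 2.0.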
When a virtualized host runs out of free memory, the hypervisor will attempt to reclaim physical memory from running VMs (in units referred to as “pages” or “memory pages”) in order to satisfy new memory allocation requests. The hypervisor can implement a number of different techniques for this purpose, such as page sharing, ballooning, and host swapping. Page sharing is an opportunistic technique that attempts to collapse different memory pages with identical content into a single memory page. This can significantly reduce the memory footprint of VMs that operate on identical code/data. Ballooning relies on a guest balloon driver within the guest operating system of each VM. The guest balloon driver pins guest memory that is not actively used, thereby allowing the hypervisor to free memory pages backing the pinned guest memory and re-allocate the freed pages to other VMs.
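As a rough illustration of the page sharing technique described above, the following sketch collapses pages with identical content into a single backing copy. It assumes a content-hash lookup followed by a full byte comparison, which is one plausible way to detect identical pages and is not necessarily how any given hypervisor does so; the function name and data structures are hypothetical.

```python
import hashlib


def collapse_identical_pages(pages):
    """pages: dict mapping a guest page identifier to its content (bytes).

    Returns (mapping, frames), where frames holds one copy of each distinct
    page content and mapping gives the frame index backing each guest page.
    """
    frame_by_digest = {}  # content digest -> index into frames
    frames = []           # distinct backing copies
    mapping = {}          # guest page identifier -> frame index
    for page_id, content in pages.items():
        digest = hashlib.sha256(content).digest()
        frame = frame_by_digest.get(digest)
        # Confirm the match with a full comparison to guard against collisions.
        if frame is not None and frames[frame] == content:
            mapping[page_id] = frame  # share the existing copy
        else:
            frames.append(content)
            frame_by_digest.setdefault(digest, len(frames) - 1)
            mapping[page_id] = len(frames) - 1
    return mapping, frames
```

In this sketch, two guest pages with the same content end up mapped to the same frame index, so only one copy needs to remain resident.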
Host swapping is a memory reclamation technique that involves swapping out memory pages to a file on disk (known as a “host swap file”). Unlike guest swapping, this process is performed entirely at the hypervisor level, and thus is transparent to VMs. When a VM subsequently attempts to access a swapped-out page, the hypervisor swaps the page back into memory from the host swap file. Unfortunately, this swap-in operation incurs a disk access latency that can be as high as several milliseconds, which is orders of magnitude slower than the typical latency for accessing a shared or ballooned page. As a result, host swapping is generally used as a last resort when the host system's free physical memory falls below a critically low level.
To mitigate the high performance penalty of host swapping, some hypervisors optimize their host swapping process by pruning the list of “swap candidate pages” (i.e., memory pages that have been selected for host swapping) immediately before swapping the pages out to disk. For example, according to one implementation, a hypervisor can first attempt to perform page sharing with respect to a swap candidate page. If the sharing operation is successful, the hypervisor can free the swap candidate page so that it can be re-allocated, without performing any host swapping. Since the performance penalty associated with accessing a shared page is significantly smaller than the performance penalty associated with swapping in a page from disk, this optimization can provide a substantial performance boost over pure host swapping.
If the sharing operation is not successful, the hypervisor can further attempt to compress the swap candidate page via a technique known as “memory compression.” If the hypervisor determines that the page is compressible via memory compression, the hypervisor can save the compressed page content in a fixed-size block within an in-memory compression cache. The hypervisor can then free the swap candidate page as in the page sharing scenario above, without performing any host swapping. On the other hand, if the page is uncompressible, the hypervisor can swap the page out to disk per its normal host swapping process.
The amount of physical memory that the hypervisor can reclaim with memory compression depends on the difference in size between memory pages and compression cache blocks. For instance, if the memory page size is 4 KB and the compression cache block size is 2 KB, the hypervisor will reclaim 4 KB − 2 KB = 2 KB of space for each compressed page. Thus, memory compression is not as effective at reclaiming physical memory as host swapping, which can reclaim the full 4 KB in this scenario for each swapped-out page. However, the next time a VM attempts to access a compressed page, the hypervisor only needs to decompress the page and fault it into main memory. The latency for this operation is usually around fifty microseconds, which is almost a hundred times faster than a disk swap-in latency of several milliseconds. Thus, like page sharing, memory compression can significantly improve performance in low memory scenarios where the only alternative is host swapping.
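The tradeoff described above can be made concrete with a short calculation. The sketch below uses the 4 KB page size, 2 KB block size, and fifty-microsecond decompression latency from the example; the 5000-microsecond swap-in latency is an assumed stand-in for "several milliseconds," and the function names are hypothetical.

```python
def reclaimed_kb(compressed_pages, swapped_pages, page_kb=4, block_kb=2):
    """Physical memory reclaimed: each compressed page frees the page size
    minus one cache block; each swapped-out page frees a full page."""
    return compressed_pages * (page_kb - block_kb) + swapped_pages * page_kb


def total_fault_latency_us(compressed_faults, swapped_faults,
                           decompress_us=50, swap_in_us=5000):
    """Total latency to fault pages back in. swap_in_us is an assumed value."""
    return compressed_faults * decompress_us + swapped_faults * swap_in_us
```

Under these assumptions, compressing a page reclaims half as much memory as swapping it out (2 KB versus 4 KB), but a later access to that page costs roughly one hundredth of the latency (50 microseconds versus 5000 microseconds).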
One limitation with the optimizations described above is that the hypervisor generally does not know whether a given memory page is sharable or compressible until the hypervisor actually attempts to share or compress the page before swapping it out. This prevents the hypervisor from selecting swap candidate pages in a manner that maximizes successful page sharing/compression and minimizes host swapping. For example, according to one common approach, the hypervisor may randomly select swap candidate pages when host swapping is initiated, without any regard to the content of those pages. This may cause the hypervisor to inadvertently select unsharable and/or uncompressible pages as swap candidates, even if there are plenty of sharable or compressible pages to choose from.