In the field of computer virtualization, a hypervisor is said to be “memory overcommitted” if the total configured guest memory size of all virtual machines (VMs) running on the hypervisor exceeds the physical memory available to the hypervisor from the underlying physical machine (i.e., host system). Most hypervisors in use today support memory overcommitment, as this allows users to achieve higher VM-to-host consolidation ratios and thereby reduce costs and/or improve operational efficiency.
When a virtualized host runs out of free memory, the hypervisor will attempt to reclaim physical memory from running VMs (in units referred to as “pages” or “memory pages”) in order to satisfy new memory allocation requests. The hypervisor can implement a number of different techniques for this purpose, such as page sharing, ballooning, and host swapping. Page sharing is an opportunistic technique that attempts to collapse different memory pages with identical content into a single memory page. This can significantly reduce the memory footprint of VMs that operate on identical code/data. Ballooning relies on a guest balloon driver within the guest operating system of each VM. The guest balloon driver pins guest memory that is not actively used, thereby allowing the hypervisor to free memory pages backing the pinned guest memory and re-allocate the freed pages to other VMs.
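As an illustration of the page sharing technique described above, the following is a minimal Python sketch, not any particular hypervisor's implementation. It assumes 4 KB pages represented as byte strings, and the names `share_pages`, `store`, and `mapping` are hypothetical. Pages are indexed by a content hash, and a full byte comparison confirms a match before two pages are collapsed into one backing copy.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def share_pages(pages):
    """Collapse pages with identical content into a single backing copy.

    Returns (store, mapping): `store` holds one copy of each distinct page,
    and mapping[i] is the index in `store` that backs input page i.
    """
    store = []
    index_by_digest = {}  # content digest -> indices into `store`
    mapping = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        candidates = index_by_digest.setdefault(digest, [])
        for idx in candidates:
            # A full comparison guards against hash collisions.
            if store[idx] == page:
                mapping.append(idx)
                break
        else:
            # No identical page seen so far: keep this one as a new copy.
            store.append(page)
            candidates.append(len(store) - 1)
            mapping.append(len(store) - 1)
    return store, mapping
```

In this sketch, three guest pages of which two are identical would be backed by only two stored pages, which is the footprint reduction the technique aims for.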
Host swapping is a memory reclamation technique that involves swapping out memory pages to a file on disk (known as a “host swap file”). Unlike guest swapping, this process is performed entirely at the hypervisor level, and thus is transparent to VMs. When a VM subsequently attempts to access a swapped-out page, the hypervisor swaps the page back into memory from the host swap file. Unfortunately, this swap-in operation incurs a disk access latency that can be as high as several milliseconds, which is orders of magnitude slower than the typical latency for accessing a shared or ballooned page. As a result, host swapping is generally used as a last resort when page sharing, ballooning, and/or other techniques are ineffective in bringing the host system's free physical memory above a critically low level.
To mitigate the high performance penalty of host swapping, some hypervisors leverage a compression technique within their host swapping process (known as “memory compression”) that attempts to compress a swap candidate page (i.e., a memory page that has been selected for swapping) before the page is swapped out to disk. If the hypervisor determines that the page is “compressible” (i.e., can be compressed to an output size that satisfies a target compression ratio), the hypervisor saves the compressed page data in a fixed-size block within an in-memory compression cache. The hypervisor then frees the swap candidate page so that it can be re-allocated, without performing any host swapping. On the other hand, if the page is “uncompressible” (i.e., cannot be compressed to an output size that satisfies the target compression ratio), the hypervisor swaps out the page to disk per its normal host swapping process. For the purposes of this disclosure, “compression ratio” is defined as data output (i.e., compressed) size divided by data input (i.e., uncompressed) size, and therefore will generally be a value between zero and one (with a lower compression ratio indicating more efficient compression).
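The compress-or-swap decision described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the 4 KB page size and 2 KB block size are example values, the compression cache and host swap file are modeled as plain lists, and `reclaim_page` is a hypothetical name. Python's standard `zlib` module stands in for whatever compressor a hypervisor actually uses.

```python
import os
import zlib

PAGE_SIZE = 4096                       # assumed memory page size in bytes
BLOCK_SIZE = 2048                      # assumed compression cache block size
TARGET_RATIO = BLOCK_SIZE / PAGE_SIZE  # output size / input size, here 0.5


def reclaim_page(page, compression_cache, swap_file):
    """Try to compress a swap candidate page; swap it out if that fails."""
    compressed = zlib.compress(page)
    ratio = len(compressed) / len(page)  # compression ratio as defined above
    if ratio <= TARGET_RATIO:
        # Compressible: keep the compressed data in memory, no disk I/O.
        compression_cache.append(compressed)
        return "compressed"
    # Uncompressible: fall back to the normal host swapping path.
    swap_file.append(page)
    return "swapped"


cache, swap = [], []
zero_page = bytes(PAGE_SIZE)         # highly compressible content
random_page = os.urandom(PAGE_SIZE)  # effectively uncompressible content
print(reclaim_page(zero_page, cache, swap))
print(reclaim_page(random_page, cache, swap))
```

The zero-filled page fits easily within a 2 KB block, whereas random bytes do not compress to half their size, so the second page takes the swap path.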
The amount of physical memory that the hypervisor can reclaim with memory compression depends on the difference in size between memory pages and compression cache blocks. For instance, if the memory page size is 4 KB and the compression cache block size is 2 KB (which means that the target compression ratio is 0.5 or less), the hypervisor will reclaim 4 KB − 2 KB = 2 KB of space for each compressed page. Thus, memory compression is not as effective at reclaiming physical memory as host swapping, which frees the entire 4 KB page. However, the next time a VM attempts to access a compressed page, the hypervisor only needs to decompress the page and fault it into main memory. The latency for this operation is usually around fifty microseconds, which is almost a hundred times faster than disk swap-in latency. Therefore, memory compression can significantly improve VM performance in low memory scenarios where the only alternative is host swapping.
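The trade-off in the preceding paragraph can be worked through numerically. The short Python sketch below uses the 4 KB page, 2 KB block, and fifty-microsecond decompression figures from the text. The 5,000-microsecond swap-in latency is an assumption chosen to be consistent with "several milliseconds", and all function and constant names are hypothetical.

```python
PAGE_SIZE_KB = 4          # memory page size from the example above
BLOCK_SIZE_KB = 2         # compression cache block size from the example above
DECOMPRESS_LATENCY_US = 50
SWAP_IN_LATENCY_US = 5000  # assumption: "several milliseconds" taken as 5 ms


def reclaimed_by_compression_kb(num_pages):
    """Each compressed page still occupies one cache block in memory."""
    return num_pages * (PAGE_SIZE_KB - BLOCK_SIZE_KB)


def reclaimed_by_swapping_kb(num_pages):
    """A swapped-out page frees its entire page of physical memory."""
    return num_pages * PAGE_SIZE_KB


def access_speedup():
    """How many times faster decompression is than a disk swap-in."""
    return SWAP_IN_LATENCY_US / DECOMPRESS_LATENCY_US


print(reclaimed_by_compression_kb(1000))  # 2000 KB reclaimed
print(reclaimed_by_swapping_kb(1000))     # 4000 KB reclaimed
print(access_speedup())                   # 100.0 under the assumed latencies
```

Under these assumptions, compression reclaims half as much memory per page as swapping but serves a later access roughly a hundred times faster.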
One limitation of memory compression as described above is that the hypervisor does not know whether a given memory page is compressible until it actually attempts to compress the page before swapping it out. This prevents the hypervisor from selecting swap candidate pages in a manner that maximizes successful page compressions and minimizes host swapping. For example, according to a common approach, the hypervisor may select swap candidate pages at random when host swapping is initiated. This may cause the hypervisor to inadvertently select uncompressible pages as swap candidates (resulting in page compression failures and host swapping), even if there are plenty of compressible pages to choose from.
In theory, the hypervisor can overcome this problem by applying the compression algorithm it uses for memory compression as a “checker” to check the compressibility of a memory page before adding the page to its swap candidate list. However, this solution requires that the compression algorithm be relatively fast, since the “checking” operation (which involves compressing the page and comparing the output compression ratio with the target compression ratio) must be performed on a large proportion of memory pages. Some hypervisors currently use the zlib compression algorithm for memory compression, which provides a lower compression ratio than most other compression algorithms (and thus advantageously reduces the likelihood of page compression failures), but is also slower than most other compression algorithms. This means that, in practice, these hypervisors cannot rely on zlib for both memory compression and compressibility checking, because zlib is too slow and/or resource intensive for the latter function.
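To make the "checker" idea concrete, the following Python sketch pre-screens pages by compressing each one and comparing the resulting ratio against the target before building the swap candidate list. It is only an illustration of the theoretical approach discussed above. The names `is_compressible` and `select_swap_candidates` are hypothetical, the 0.5 target ratio is taken from the earlier example, and the use of zlib's fastest level (1) is an assumption made to keep the check cheap.

```python
import os
import zlib

PAGE_SIZE = 4096    # assumed memory page size in bytes
TARGET_RATIO = 0.5  # target compression ratio from the earlier example


def is_compressible(page, level=1):
    """Check compressibility by actually compressing the page.

    This is the costly step: it runs the compressor on every page checked.
    """
    return len(zlib.compress(page, level)) / len(page) <= TARGET_RATIO


def select_swap_candidates(pages, count):
    """Pick `count` page indices, preferring pages that pass the check."""
    compressible, uncompressible = [], []
    for i, page in enumerate(pages):
        if is_compressible(page):
            compressible.append(i)
        else:
            uncompressible.append(i)
        if len(compressible) >= count:
            break  # enough compressible candidates found
    # Fall back to uncompressible pages only if there are too few others.
    return (compressible + uncompressible)[:count]


pages = [os.urandom(PAGE_SIZE), bytes(PAGE_SIZE),
         os.urandom(PAGE_SIZE), b"ab" * (PAGE_SIZE // 2)]
print(select_swap_candidates(pages, 2))  # indices of the two compressible pages
```

Because every examined page is fully compressed just to be classified, the cost of this screening grows with the number of pages checked, which is why the speed of the checking algorithm matters.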