In computer science, a virtual machine (VM) is a portion of software that, when executed on appropriate hardware, creates an environment allowing the virtualization of an actual physical computer system. Each VM may function as a self-contained platform, running its own operating system (OS) and software applications (processes). Typically, a virtual machine monitor (VMM) manages allocation and virtualization of computer resources and performs context switching, as may be necessary, to cycle between various VMs.
A host machine (e.g., computer or server) is typically enabled to simultaneously run multiple VMs, where each VM may be used by a remote client. The host machine allocates a certain amount of the host's resources to each of the VMs. Each VM is then able to use the allocated resources to execute applications, including operating systems known as guest operating systems. The VMM virtualizes the underlying hardware of the host machine or emulates hardware devices, making the use of the VM transparent to the guest operating system or the remote client that uses the VM.
In some virtualization systems, the host is a centralized server that is partitioned into multiple VMs to provide virtual desktops to users within an enterprise. A problem with centralized hosting of VMs is that the VMs share the host's memory. Typically, each VM is allocated some minimum amount of memory out of the shared pool, so the size of the pool limits the number of VMs the host can run. As such, conserving memory becomes an important consideration in virtualization systems.
One way to conserve memory is to use a memory duplication mechanism, a technique often called memory deduplication. Memory duplication mechanisms allow for memory aggregation in virtualization systems. Specifically, identical memory blocks across VMs are detected and aggregated, which allows a much higher density of VMs on a given host when the VMs are similar. A memory duplication mechanism compares a new memory page with the memory pages already stored on the host and determines whether the new page is identical to any of them. If so, the mechanism uses a single shared version of the memory page instead of storing multiple copies of the same page on the host machine.
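The compare-and-share step described above can be illustrated with a minimal Python sketch. It is not taken from any particular hypervisor: the `PageStore` class, its `add_page` method, the reference counts, and the 4096-byte page size are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageStore:
    """Hypothetical host-side store that keeps one copy of each distinct page."""

    def __init__(self):
        self.pages = []      # distinct page contents held by the host
        self.refcounts = []  # how many VM mappings share each stored page

    def add_page(self, data: bytes) -> int:
        """Return the index of the stored page that backs `data`."""
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        for idx, stored in enumerate(self.pages):
            if stored == data:  # full-page comparison against a stored page
                self.refcounts[idx] += 1
                return idx      # share the existing copy
        # No identical page was found, so store a new copy.
        self.pages.append(data)
        self.refcounts.append(1)
        return len(self.pages) - 1


store = PageStore()
a = store.add_page(bytes(PAGE_SIZE))                    # page from a first VM
b = store.add_page(bytes(PAGE_SIZE))                    # identical page from a second VM
c = store.add_page(b"\x01" + bytes(PAGE_SIZE - 1))      # different page from a third VM
```

In this sketch the first two pages map to the same stored copy, so the host holds two pages rather than three. The linear scan is only for clarity; it is the part that an index structure would replace.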
Part of the memory duplication mechanism is a standard data structure, such as a table or a tree, that helps identify identical memory pages. If a table is used, it typically stores a hash of the contents of each memory page together with the location of that page. The hash is computed over the entire contents of the memory page. If a tree is used, it keeps the entire page of memory and uses those contents for comparison.
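The two index structures can be sketched side by side as follows. This is an illustrative Python sketch under several assumptions: the class names and the `lookup_or_insert` method are hypothetical, SHA-256 stands in for whatever hash function a real system would use, and a sorted list searched with `bisect` stands in for a balanced tree keyed on page contents.

```python
import bisect
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


class HashTableIndex:
    """Table variant: hash of the entire page contents -> page location."""

    def __init__(self):
        self.table = {}

    def lookup_or_insert(self, data: bytes, location: int) -> int:
        # The hash covers every byte of the page.
        digest = hashlib.sha256(data).digest()
        # Assumption: a digest match is treated as identity; a real system
        # would likely confirm with a byte-for-byte comparison.
        return self.table.setdefault(digest, location)


class SortedPageIndex:
    """Tree variant (approximated by a sorted list): keeps whole pages."""

    def __init__(self):
        self.contents = []   # full page contents, kept in sorted order
        self.locations = []  # location of each page, parallel to contents

    def lookup_or_insert(self, data: bytes, location: int) -> int:
        i = bisect.bisect_left(self.contents, data)  # compares page contents
        if i < len(self.contents) and self.contents[i] == data:
            return self.locations[i]                 # identical page found
        self.contents.insert(i, data)
        self.locations.insert(i, location)
        return location


page_a = bytes(PAGE_SIZE)
page_b = b"\xff" * PAGE_SIZE
table_index = HashTableIndex()
tree_index = SortedPageIndex()
for index in (table_index, tree_index):
    index.lookup_or_insert(page_a, 0x1000)
    index.lookup_or_insert(page_b, 0x2000)
```

Both variants return the location of the first page stored with given contents. The table holds only a digest per page, while the tree-like index holds the full contents of every page it tracks.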
In most cases, the memory duplication mechanism will not find an identical match for a new page of memory introduced at a host machine. However, the comparison function used by the memory duplication mechanism can be resource- and time-consuming because it performs full memory page comparisons in order to locate an identical match for the new memory page. Such a full page comparison consumes space in the CPU cache of the host machine, as well as processing resources of the CPU itself.
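A rough sense of this cost can be had from a small Python sketch that counts how many bytes are read when every incoming page is hashed in full. It models only the hash-table variant; the function name, the SHA-256 hash, the 4096-byte page size, and the synthetic workload of 1000 non-matching pages are all assumptions, and bytes read are used as a crude proxy for cache and CPU usage.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def bytes_hashed_for_lookups(new_pages, known_digests):
    """Return (matches, bytes_read) when each page is hashed in full."""
    matches = 0
    bytes_read = 0
    for page in new_pages:
        digest = hashlib.sha256(page).digest()  # reads every byte of the page
        bytes_read += len(page)
        if digest in known_digests:
            matches += 1
    return matches, bytes_read


# Hypothetical workload: the host already stores an all-zero page, and
# 1000 new pages arrive, each differing from it in the first four bytes.
known = {hashlib.sha256(bytes(PAGE_SIZE)).digest()}
new_pages = [i.to_bytes(4, "little") + bytes(PAGE_SIZE - 4) for i in range(1, 1001)]
matches, bytes_read = bytes_hashed_for_lookups(new_pages, known)
print(matches, bytes_read)
```

Under these assumptions no page matches, yet every byte of every page is still read, which is the overhead the paragraph above describes.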