Modern computer systems use various memory management techniques to improve memory utilization. For example, a modern computer usually contains hardware and software components for mapping the working address space (also referred to as virtual memory space) of application programs into physical memory space. An application program can be provided with a contiguous virtual memory space, which may be physically fragmented and may even overflow onto disk storage. The virtual memory space of an application program is divided into pages, with each page being a block of contiguous virtual memory addresses. A page table can be used to translate the virtual addresses of an application program into physical addresses used by the hardware to process instructions.
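The page-table translation described above can be illustrated with a minimal sketch. It assumes a single-level page table modeled as a dictionary and a 4096-byte page size; the names PAGE_SIZE, translate, and page_table are hypothetical and chosen only for illustration. Real hardware typically uses multi-level tables and a translation lookaside buffer, which are omitted here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def translate(page_table, virtual_address):
    """Translate a virtual address into a physical address.

    page_table maps a virtual page number to a physical frame number.
    """
    # Split the address into a page number and an offset within the page.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame_number = page_table.get(page_number)
    if frame_number is None:
        # The page is not resident in physical memory (for example,
        # it may have overflowed onto disk storage).
        raise LookupError(f"page fault at virtual page {page_number}")
    # The offset is unchanged; only the page number is remapped.
    return frame_number * PAGE_SIZE + offset


# Contiguous virtual pages 0..2 map to scattered physical frames,
# so the program sees contiguous memory that is physically fragmented.
page_table = {0: 7, 1: 2, 2: 9}
physical = translate(page_table, 1 * PAGE_SIZE + 0x10)
```

In this sketch, an address in virtual page 1 resolves to physical frame 2 with the same offset, and an access to an unmapped page raises an error that stands in for a page fault.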
Efficient memory management is critical to the performance of a virtual machine system. A virtual machine is a software implementation of a machine (computer) that includes its own operating system (referred to as a guest operating system) and executes application programs. A host computer allocates a certain amount of its resources to each of the virtual machines, and multiplexes its underlying hardware platform among the virtual machines. Each virtual machine is then able to use the allocated resources to execute its guest operating system and applications. The software layer providing the virtualization is commonly referred to as a hypervisor, and is also known as a virtual machine monitor (VMM). The hypervisor may be a kernel-based hypervisor or may be part of a host operating system. The hypervisor virtualizes the underlying hardware of the host computer, making the use of the virtual machine transparent to the guest operating system and the user of the computer.
On a computer system that stores a large number of files and executes a large number of application programs, it is likely that some of the files and programs contain duplicated contents. A host computer that supports multiple virtual machines can also have duplicated contents in its virtual machines, guest operating systems, and/or application programs. Storing duplicated contents in memory leads to poor utilization of memory resources and degradation of system performance. For example, thrashing may occur when duplicated contents leave insufficient physical memory to hold the working set of the running programs. Thus, there is a need to develop a technique for detecting and managing programs that have duplicated contents.
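To make the cost of duplicated contents concrete, the following sketch counts how much memory is consumed by redundant copies of identical pages. It is only an illustration of the problem, not a description of any particular detection technique; the names PAGE_SIZE and wasted_bytes, the 4096-byte page size, and the sample page contents are all assumptions made for the example.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def wasted_bytes(pages):
    """Return the bytes occupied by redundant copies of identical pages.

    pages is a list of bytes objects, each PAGE_SIZE bytes long.
    """
    # Group pages by their exact contents (bytes objects are hashable).
    counts = Counter(pages)
    # For each distinct content, every copy beyond the first is redundant.
    redundant = sum(n - 1 for n in counts.values())
    return redundant * PAGE_SIZE


# Hypothetical memory image: two pages appear twice, one page is unique.
zero_page = bytes(PAGE_SIZE)
code_page = b"\x90" * PAGE_SIZE
data_page = b"\x01" * PAGE_SIZE
pages = [zero_page, code_page, zero_page, code_page, data_page]
wasted = wasted_bytes(pages)
```

Here two of the five pages are redundant copies, so two pages' worth of physical memory holds nothing that is not already stored elsewhere.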