One aspect of the present invention relates to efficiently determining identical pieces of memory within a computer memory area.
Virtualization is becoming increasingly important in IT architectures and allows central servers to perform different tasks as required by a user. Such a server comprises a computer as known in the art on which a virtual machine manager, also called a hypervisor, is running. The virtual machine manager can run directly on the hardware of the computer, that is, without an underlying operating system, or as an application within a standard operating system such as Linux, Windows, or others. Virtual machine managers running on an intermediate abstraction layer are also known in the art.
The hypervisor is virtualization software that provides an environment for running virtual machines, which are also called guests. These guests are virtual instances of operating systems, which are encapsulated inside the virtual machine manager and can be executed as if they were running directly on computer hardware.
Accordingly, each virtual machine running within the hypervisor requires the same amount of memory as if it were running directly on the computer. With multiple guests running within the hypervisor, the overall amount of memory required increases rapidly. Virtualization therefore requires a large amount of computer memory, e.g. RAM, which is cost-intensive and thereby limits the number of virtual machines that can be run on a single piece of hardware.
To increase the efficiency of virtualization, it is desirable to reduce the amount of memory used by the guests. One idea for achieving this goal is to identify memory regions that store identical information and to replace repeated occurrences of the same memory contents with a link to a single occurrence of those contents. Such an idea was realized with “Kernel Samepage Merging” (KSM), which is known from the Linux operating system. The memory used by the different virtual machines is analyzed in units of memory pages of a certain size, e.g. 4 kB, by first calculating a hash value over each memory page, ordering the hash values of each virtual machine, and comparing the hash values of different virtual machines. Occurrences of identical hash values indicate identical memory pages, so-called samepages, so that one of these memory pages can be replaced by a reference to the other. Since virtual machines and the applications/processes running on a virtual machine behave dynamically and can be started and stopped at any time, this process of identifying samepages has to be performed continuously. Accordingly, the calculation of the hash values and their comparison require a large computational effort and reduce the available capacity of the resources of the computer. Hashing is a method known to a person skilled in the art and is therefore not further explained.
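For illustration only, the hash-based identification of samepages described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used by KSM: the names `split_pages` and `find_samepages`, the use of SHA-256, the in-memory byte strings standing in for guest memory, and the final byte-wise comparison are all hypothetical choices made for the example. A dictionary lookup takes the place of ordering and comparing the hash values, and duplicates within a single guest are treated the same way as duplicates across guests.

```python
import hashlib

PAGE_SIZE = 4096  # 4 kB pages, matching the example page size in the text


def split_pages(memory, page_size=PAGE_SIZE):
    """Yield (page_index, page_bytes) for every full page of a memory image."""
    for offset in range(0, len(memory) - page_size + 1, page_size):
        yield offset // page_size, memory[offset:offset + page_size]


def find_samepages(guests, page_size=PAGE_SIZE):
    """Return a mapping from duplicate pages to the page they can reference.

    `guests` maps a guest name to its memory image (bytes). Keys and values
    of the result are (guest_name, page_index) tuples.
    """
    first_seen = {}   # hash value -> (guest_name, page_index) of first page
    references = {}   # duplicate page -> page holding the shared contents
    for guest, memory in guests.items():
        for index, page in split_pages(memory, page_size):
            digest = hashlib.sha256(page).digest()
            if digest not in first_seen:
                first_seen[digest] = (guest, index)
                continue
            other_guest, other_index = first_seen[digest]
            start = other_index * page_size
            original = guests[other_guest][start:start + page_size]
            # Assumption of this sketch: confirm equality byte by byte so
            # that a hash collision can never merge two different pages.
            if original == page:
                references[(guest, index)] = (other_guest, other_index)
    return references
```

Every page of every guest is hashed on each pass, which makes the computational cost noted above visible: the work grows with the total amount of guest memory and has to be repeated whenever the memory contents change.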
Improvements have been realized by solutions in which an administrator identifies memory ranges with an increased likelihood of containing samepages. This has the drawback that the permanent attention of the administrator is required due to the dynamic behavior of the guests.
Another approach for improvement is described in the paper “Increasing Memory Density by using KSM” by Andrea Arcangeli et al., presented at the Linux Symposium on Jul. 13 to 17, 2009 in Montreal, Quebec, Canada, and hereby incorporated herein by reference in its entirety. To reduce the computational effort required for identifying samepages, processes have the option to register which areas of the memory occupied by them should be scanned. This is not transparent at all and requires that the application running on a guest, and furthermore the guest itself, have such a feature implemented, which prevents legacy applications/guests from benefiting from efficient samepage merging. Some operating systems that are most frequently run directly on computer hardware, e.g. operating systems designed to be used by home users, will probably not provide such a feature.
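By way of example, the registration of a memory area by a process may be sketched as follows. On Linux, such a registration is made with the `madvise` system call and the `MADV_MERGEABLE` advice, which Python exposes through `mmap.mmap.madvise` where the platform supports it. The helper name `register_for_merging` and the size of the region are assumptions made for this sketch, and the sketch presumes a Linux host; on a kernel without KSM support the call is expected to fail, in which case the helper simply reports that no registration took place.

```python
import mmap


def register_for_merging(region):
    """Ask the kernel to consider `region` for samepage scanning.

    Returns True if the advice was accepted and False if the platform or
    the kernel does not support it. The helper name is hypothetical.
    """
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None or not hasattr(region, "madvise"):
        return False  # constant or method not available on this platform
    try:
        region.madvise(advice)  # applies to the whole mapping
    except OSError:
        return False  # e.g. the kernel was built without KSM
    return True


# A private anonymous mapping of 16 pages; Python adds MAP_ANONYMOUS itself
# when the file descriptor is -1.
region = mmap.mmap(-1, 16 * mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
registered = register_for_merging(region)
```

The sketch shows why this approach is not transparent: the call has to be written into the application itself, so software that was never modified to make it gains nothing.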