In recent years, virtualization has become an important enabling technology and has placed significant demands on the proper utilization of the limited resources of a system hosting multiple virtual machine (VM) clients. To increase resource utilization in a VM environment, an overprovisioning technique is often used. For example, a hosting platform with a total of 4 GB of RAM may provide resources for two guest VMs whose memory spaces are 3 GB and 2 GB, respectively. Thus, the total configured RAM (5 GB) may exceed the maximum available physical resource (4 GB).
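The overprovisioning arithmetic in the example above can be expressed as an overcommitment ratio, that is, total configured guest RAM divided by physical host RAM. The following is a minimal sketch; the VM names and the helper `overcommit_ratio` are hypothetical and used only for illustration.

```python
# Overcommitment check for the example in the text:
# a 4 GB host serving two guests configured with 3 GB and 2 GB.
HOST_RAM_GB = 4
GUEST_RAM_GB = {"vm1": 3, "vm2": 2}  # hypothetical VM names


def overcommit_ratio(host_gb, guests):
    """Ratio of total configured guest RAM to physical host RAM."""
    return sum(guests.values()) / host_gb


configured = sum(GUEST_RAM_GB.values())
ratio = overcommit_ratio(HOST_RAM_GB, GUEST_RAM_GB)
print(f"configured={configured} GB, physical={HOST_RAM_GB} GB, ratio={ratio:.2f}")
# prints: configured=5 GB, physical=4 GB, ratio=1.25
```

A ratio greater than 1 indicates that the configuration is overprovisioned.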
Several techniques have been proposed to address potential resource conflicts resulting from such memory over-provisioning. One known technique, implemented in the VMware ESX platform, is memory page swapping, which is performed by a hypervisor based on page allocation information available at the hypervisor level. In computing systems, a hypervisor is also known as a virtual machine manager (VMM). A hypervisor is computer software, firmware and/or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is defined as a host machine, and each virtual machine is known as a guest machine. The hypervisor presents the guest operating systems with a virtual operating platform and manages their execution. Multiple instances of a variety of operating systems may share the virtualized hardware resources managed by the hypervisor. It has been reported that while this page allocation technique provides some level of optimization for an overcommitted memory configuration, it can impose significant performance penalties on the hosted VM as well as on the whole system. Therefore, low levels of available memory must be effectively monitored and addressed, and memory must be reallocated, to alleviate the situation resulting from over-provisioning of resources.
According to conventional approaches, virtualization is an abstraction layer that decouples the physical hardware from the operating system to improve resource utilization and flexibility. Virtualization allows multiple virtual machines, with heterogeneous operating systems (e.g., Windows XP, Linux, FreeBSD, etc.) and applications, to operate in isolation, side by side, on the same physical machine. A virtual machine is a software representation of a physical machine. It has its own set of virtual hardware (e.g., RAM, CPU, NIC, hard disks, etc.) upon which an operating system and applications are loaded. The operating system sees a consistent, normalized set of hardware regardless of the actual physical hardware components.
At least two architectures 100 for realizing VMs are illustrated in FIG. 1. One example is the hosted architecture 110, in which a virtualization software layer 114 is installed as an application on a pre-existing host OS 112. In this case, the virtualization layer 114 relies on the host operating system 112 for device support and physical resource management. VMware Server, VMware Workstation, and Microsoft Virtual PC are examples of a hosted architecture.
Another example is the native hypervisor architecture 120, also illustrated in FIG. 1. In this example, there is no pre-existing OS. Instead, a hypervisor, or virtual machine manager (VMM), operates directly on the host's hardware, controls the hardware, and manages guest operating systems. It is named a “hypervisor” because it is conceptually one level higher than a supervisory program. The hypervisor presents to the guest operating systems 122 a virtual operating platform 124 and manages the execution of the guest operating systems 122. The virtualization software 124 is installed on a clean system, and it provides kernel and driver support for the raw physical hardware. The VMware ESX server is an example of virtualization utilizing a hypervisor-type architecture.
Each of the conventional architectures has pros and cons. For example, the hosted architecture 110 relies on the underlying OS for hardware support and therefore can support more hardware at a lower cost. However, the resources consumed by the hosting OS add significant overhead. The native hypervisor architecture 120 requires significantly fewer resources and is therefore preferred when high performance is a key requirement for a VM system.
Within an operating system, each application operates as though it has access to all of the physical memory the operating system offers. Since multiple programs run at the same time, no single process can own or occupy all of the available memory. Instead, processes use virtual memory. In a virtual memory system, all addresses a program uses are virtual addresses, not physical addresses. For example, a program may access data at memory address 629, but the virtual memory system does not necessarily store that data in RAM location 629. In fact, the data may not be in RAM at all, since it could have been moved to disk; the program only ever sees virtual addresses. The processor converts these virtual addresses into physical addresses based on information held in a set of tables maintained by the operating system.
The operating system maintains a table of virtual-to-physical address translations so that the computer hardware can respond properly to address requests. If the requested address is on disk instead of in RAM, the operating system swaps memory, i.e., it temporarily halts the process, unloads other memory to disk, loads the requested memory from disk, and restarts the process. In this way, each process gets its own address space to operate within and can access more memory than is physically installed.
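The swapping sequence described above (halt, unload other memory to disk, load the requested memory, restart) can be modeled with a small toy pager. This is a sketch under stated assumptions: a fixed number of RAM frames and a FIFO eviction order, neither of which the text specifies. The class name `DemandPager` is hypothetical.

```python
from collections import OrderedDict


class DemandPager:
    """Toy model of OS swapping: a fixed number of RAM frames backed by a disk.

    Eviction order is FIFO, an illustrative assumption; the text does not
    specify a replacement policy.
    """

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.ram = OrderedDict()  # virtual page -> data, in load order
        self.disk = {}            # virtual page -> data swapped out
        self.faults = 0

    def access(self, page):
        if page in self.ram:
            return self.ram[page]  # page is resident: no swapping needed
        # Page not in RAM: the process is (conceptually) halted here.
        self.faults += 1
        if len(self.ram) >= self.num_frames:
            victim, data = self.ram.popitem(last=False)  # unload oldest page
            self.disk[victim] = data
        # Load the requested page from disk (or create it on first touch).
        data = self.disk.pop(page, f"data-{page}")
        self.ram[page] = data
        return data  # the process restarts with the page present


pager = DemandPager(num_frames=2)
for p in [0, 1, 2, 0]:
    pager.access(p)
print(pager.faults, list(pager.ram), list(pager.disk))  # prints: 4 [2, 0] [1]
```

With only two frames, touching a third page forces page 0 out to disk, and touching page 0 again forces page 1 out in turn.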
To make address translation easier, virtual and physical memory are divided into conveniently sized chunks called pages. These pages are all the same size; they need not be, but if they were not, the system would be very hard to administer. Linux uses 8 Kbyte pages on Alpha AXP systems and 4 Kbyte pages on Intel x86 systems. Each of these pages is given a unique number: the page frame number (PFN). The pages of virtual memory do not have to be present in physical memory in any particular order.
In the paged model, a virtual address is composed of two parts: an offset and a virtual page frame number. If the page size is 4 Kbytes, bits 11:0 of the virtual address contain the offset, and bits 12 and above contain the virtual page frame number. Each time the processor encounters a virtual address, it must extract the offset and the virtual page frame number. The processor must then translate the virtual page frame number into a physical one and access the location at the correct offset into that physical page. To do this, the processor uses page tables.
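For a 4 Kbyte page, splitting a virtual address into its two parts is a shift and a mask. The sketch below assumes that page size; the function name `split_virtual_address` and the sample address are illustrative.

```python
PAGE_SIZE = 4096                          # 4 Kbyte pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12, i.e. the offset is bits 11:0
OFFSET_MASK = PAGE_SIZE - 1               # 0xFFF


def split_virtual_address(vaddr):
    """Return (virtual page frame number, offset) for a 4 KB page size."""
    vpfn = vaddr >> OFFSET_BITS   # bits 12 and above
    offset = vaddr & OFFSET_MASK  # bits 11:0
    return vpfn, offset


vpfn, offset = split_virtual_address(0x3A7C)
print(hex(vpfn), hex(offset))  # prints: 0x3 0xa7c
```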
FIG. 2 illustrates the virtual address spaces 200 of two processes, process X 210 and process Y 250, each with its own page table, 220 and 240, respectively. These page tables map virtual pages to physical pages in memory. In this example, process X's virtual page frame number 0 is mapped to physical page frame number 1, and process Y's virtual page frame number 1 is mapped to physical page frame number 4 of the physical memory 230. Each entry in this theoretical page table contains the following information: (a) a valid flag, which indicates whether the page table entry is valid, (b) the physical page frame number that the entry describes, and (c) access control information, which describes how the page may be used.
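The three fields of the theoretical page table entry, and the two mappings named for FIG. 2, can be written down as a small data structure. This is a sketch only: the access-control strings (`"rw"`) are hypothetical placeholders, and each page table is modeled as a Python dict keyed by virtual page frame number rather than as a hardware-defined array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTableEntry:
    valid: bool  # (a) whether this entry holds a valid translation
    pfn: int     # (b) physical page frame number the entry describes
    access: str  # (c) access control; "rw" etc. are illustrative values


# Per-process page tables indexed by virtual page frame number.
# Only the two mappings named for FIG. 2 are filled in.
page_table_x = {0: PageTableEntry(valid=True, pfn=1, access="rw")}  # process X
page_table_y = {1: PageTableEntry(valid=True, pfn=4, access="rw")}  # process Y

print(page_table_x[0].pfn, page_table_y[1].pfn)  # prints: 1 4
```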
To translate a virtual address into a physical one, the processor must first work out the virtual address's page frame number and the offset within that virtual page. Making the page size a power of 2 allows this to be done easily by masking and shifting. The processor uses the virtual page frame number as an index into the process's page table to retrieve its page table entry. If the page table entry at that index is valid, the processor takes the physical page frame number from this entry and combines it with the offset to form the physical address. If the entry is invalid, the process has accessed a non-existent area of its virtual memory. In this case, the processor cannot resolve the address and must pass control to the operating system so that other resolutions may be explored.
How the processor notifies the operating system that the current process has attempted to access a virtual address with no valid translation depends on the processor type. However it is delivered, this notification is known as a page fault, and the operating system is informed of the faulting virtual address and the reason for the page fault.
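The translation steps and the page-fault hand-off described in the two paragraphs above can be combined into one function. This is a minimal sketch assuming 4 KB pages and a dict-based page table; the `PageFault` exception stands in for the processor-specific fault delivery and is hypothetical, as are the names `PTE` and `translate`.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                    # 4 KB pages: a power of 2
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # 0xFFF


@dataclass
class PTE:
    valid: bool
    pfn: int


class PageFault(Exception):
    """Hands control to the OS with the faulting address and the reason."""

    def __init__(self, vaddr, reason):
        super().__init__(f"page fault at {vaddr:#x}: {reason}")
        self.vaddr = vaddr
        self.reason = reason


def translate(page_table, vaddr):
    vpfn = vaddr >> PAGE_SHIFT   # shift out the offset
    offset = vaddr & PAGE_MASK   # mask off the page number
    pte = page_table.get(vpfn)   # VPFN indexes the process's page table
    if pte is None or not pte.valid:
        raise PageFault(vaddr, "no valid translation")
    return (pte.pfn << PAGE_SHIFT) | offset  # physical frame + same offset


table = {0: PTE(valid=True, pfn=1), 2: PTE(valid=False, pfn=0)}
print(hex(translate(table, 0x0123)))  # prints: 0x1123
try:
    translate(table, 0x2010)
except PageFault as fault:
    print(fault.reason, hex(fault.vaddr))  # prints: no valid translation 0x2010
```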
Swapping out memory pages occurs when memory resources become limited. Traditionally, this task is performed by a memory resource manager residing inside the kernel of each OS. When multiple VMs share a single pool of hardware resources, it may be possible to use those resources more effectively by adding functionality to a sub-system of the VM environment, such as the hypervisor, that can monitor data across all VM clients.
When an application starts, it uses the interfaces provided by the operating system to explicitly allocate or deallocate virtual memory during execution. In a non-virtual environment, the operating system assumes it owns all physical memory in the system. The hardware does not provide interfaces for the operating system to explicitly “allocate” or “free” physical memory. Different operating systems implement this abstraction differently. For example, the operating system may maintain an “allocated” list and a “free” list, so whether a physical page is free depends on which list it currently resides in.
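The “allocated” list and “free” list example can be sketched as pure OS-side bookkeeping: the hardware is never told anything, and a page's state is defined only by list membership. The class and method names are hypothetical, and a set is used for the allocated list purely for convenience.

```python
class PhysicalPageAllocator:
    """OS-side bookkeeping: a page is free or allocated purely by list membership."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # all physical pages start free
        self.allocated = set()

    def alloc(self):
        if not self.free:
            raise MemoryError("no free physical pages")
        pfn = self.free.pop()
        self.allocated.add(pfn)
        return pfn

    def release(self, pfn):
        self.allocated.remove(pfn)  # KeyError if pfn was not allocated
        self.free.append(pfn)       # page contents are left untouched

    def is_free(self, pfn):
        return pfn in self.free


allocator = PhysicalPageAllocator(num_pages=4)
page = allocator.alloc()
print(page, allocator.is_free(page))  # prints: 3 False
allocator.release(page)
print(allocator.is_free(page))        # prints: True
```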
Because a virtual machine runs an operating system and several applications, virtual machine memory management combines properties of both application and operating system memory management. Like an application, a virtual machine has no pre-allocated physical memory when it first starts. The virtual machine cannot explicitly allocate host physical memory through any standard interface. Instead, the hypervisor defines “allocated” and “free” host memory in its own data structures. The hypervisor intercepts the virtual machine's memory accesses and allocates host physical memory for the virtual machine on its first access to that memory. To avoid information leaking between virtual machines, the hypervisor always writes zeroes to host physical memory before assigning it to a virtual machine.
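First-touch allocation with zeroing can be sketched as follows. This toy model assumes 4 KB pages, a hypervisor-owned free list of host frames, and a dict from (VM, guest page) to host frame; the names `Hypervisor`, `guest_to_host`, and `guest_access` are hypothetical and do not correspond to any ESX interface.

```python
PAGE_SIZE = 4096


class Hypervisor:
    """Toy model: host memory is backed on a VM's first touch and zeroed first."""

    def __init__(self, host_pages):
        # Host frames may still hold stale data from a previous owner.
        self.host_mem = [bytearray(b"\xAA" * PAGE_SIZE) for _ in range(host_pages)]
        self.free_host = list(range(host_pages))  # hypervisor's own "free" list
        self.guest_to_host = {}  # (vm_id, guest_pfn) -> host_pfn ("allocated")

    def guest_access(self, vm_id, guest_pfn):
        key = (vm_id, guest_pfn)
        if key not in self.guest_to_host:  # first access: intercepted
            # IndexError when host memory is exhausted; a real hypervisor
            # would reclaim memory here instead.
            host_pfn = self.free_host.pop()
            self.host_mem[host_pfn][:] = bytes(PAGE_SIZE)  # zero before assigning
            self.guest_to_host[key] = host_pfn
        return self.host_mem[self.guest_to_host[key]]


hv = Hypervisor(host_pages=2)
page = hv.guest_access("vm1", 0)
print(any(page), len(hv.guest_to_host))  # prints: False 1
```

Until a VM touches a guest page, no host frame is consumed for it, which is what makes overcommitment possible in the first place.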
Virtual machine memory deallocation works as it does in an operating system: the guest operating system frees a piece of guest physical memory by adding its page numbers to the guest free list, but the data in the “freed” memory may not be modified at all. As a result, when a portion of guest physical memory is freed, the mapped host physical memory usually does not change state; only the guest free list changes.
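The asymmetry described above, where a guest “free” changes only the guest free list while the host-side backing keeps its contents, can be illustrated with a few lines. This is a hypothetical toy model; `GuestVM`, `host_backing`, and `guest_free_page` are illustrative names.

```python
class GuestVM:
    """Toy model of guest-side deallocation under a hypervisor."""

    def __init__(self, num_guest_pages):
        self.guest_free = []  # guest OS free list
        # guest PFN -> contents held in the backing host memory (pages in use)
        self.host_backing = {gpfn: bytearray(b"data") for gpfn in range(num_guest_pages)}

    def guest_free_page(self, gpfn):
        # The guest OS only records the page number on its free list;
        # it neither clears the data nor informs the hypervisor.
        self.guest_free.append(gpfn)


vm = GuestVM(num_guest_pages=2)
vm.guest_free_page(1)
print(vm.guest_free, bytes(vm.host_backing[1]))  # prints: [1] b'data'
```

From the hypervisor's point of view, page 1 is still in use, so its host memory cannot be reclaimed without help from the guest.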
To increase memory utilization, ESX supports memory over-commitment, a configuration in which the total amount of guest physical memory of the running virtual machines is larger than the amount of actual host memory. To support memory over-commitment effectively, the hypervisor provides three host memory reclamation techniques: transparent page sharing, ballooning, and host swapping. Page sharing is a well-known technique in which the OS identifies identical memory pages and provides mechanisms for applications to share them, making duplicate copies of those pages unnecessary.
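A common way to find identical pages is to hash each page's contents and confirm candidate matches with a full comparison. The sketch below is that generic technique, not ESX's actual implementation; it uses SHA-256 as an assumed hash, omits the copy-on-write protection a real system needs, and the function name is hypothetical.

```python
import hashlib


def share_identical_pages(pages):
    """Map each page id onto one shared copy per distinct content.

    pages: dict of page id -> bytes. Returns (mapping, buckets), where
    mapping[page_id] = (digest, index of the shared copy in buckets[digest]).
    """
    buckets = {}  # digest -> list of distinct shared copies with that digest
    mapping = {}
    for page_id, content in pages.items():
        digest = hashlib.sha256(content).digest()
        bucket = buckets.setdefault(digest, [])
        for i, shared in enumerate(bucket):
            if shared == content:  # a hash match is only a hint; compare fully
                mapping[page_id] = (digest, i)
                break
        else:
            bucket.append(content)  # first copy of this content
            mapping[page_id] = (digest, len(bucket) - 1)
    return mapping, buckets


pages = {"a": bytes(4096), "b": bytes(4096), "c": b"x" * 4096}
mapping, buckets = share_identical_pages(pages)
print(sum(len(b) for b in buckets.values()))  # prints: 2 (three pages, two copies)
```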
Ballooning makes the guest operating system aware of the host's low-memory status. The VMware white paper “Understanding Memory Resource Management in VMware® ESX™ Server” describes how the balloon inflates. In ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It has no external interfaces to the guest operating system and communicates with the hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets an appropriate target balloon size for the balloon driver, making it “inflate” by allocating guest physical pages within the virtual machine.
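The polling and inflation steps can be sketched as a toy model. Everything here is hypothetical: `HypervisorChannel` stands in for the private channel, whose real interface is not described in the text, and the driver simply stops when the guest free list runs out, whereas a real guest would start paging to its own swap space.

```python
class HypervisorChannel:
    """Hypothetical stand-in for the private guest/hypervisor channel."""

    def __init__(self):
        self.target_pages = 0  # set by the hypervisor when it needs memory back

    def get_target(self):
        return self.target_pages


class BalloonDriver:
    """Toy balloon driver: inflates toward the hypervisor's target size."""

    def __init__(self, channel, guest_free_pages):
        self.channel = channel
        self.guest_free = guest_free_pages  # guest OS free list (page numbers)
        self.balloon = []                   # guest pages held by the balloon

    def poll(self):
        target = self.channel.get_target()
        # Inflate: allocate guest physical pages inside the VM. The hypervisor
        # can then reclaim the host memory backing these pages.
        while len(self.balloon) < target and self.guest_free:
            self.balloon.append(self.guest_free.pop())
        return len(self.balloon)


channel = HypervisorChannel()
driver = BalloonDriver(channel, guest_free_pages=list(range(8)))
channel.target_pages = 3                     # hypervisor asks for 3 pages back
print(driver.poll(), len(driver.guest_free))  # prints: 3 5
```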
For ballooning to work as intended, the guest operating system must install and enable the balloon driver. The guest operating system must have sufficient virtual swap space configured for guest paging to be possible. Ballooning might not reclaim memory quickly enough to satisfy host memory demands. In addition, the upper bound of the target balloon size may be imposed by various guest operating system limitations.
Another known issue is the double paging problem. Suppose the hypervisor swaps out a guest physical page; if the guest is also under memory pressure, the guest operating system may then page out the same physical page. This causes the page to be swapped in from the hypervisor swap device and immediately paged out to the virtual machine's virtual swap device.
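The double paging sequence can be traced by logging the disk operations it causes. This toy model tracks a single guest page `"P"` and three locations (host RAM, the hypervisor swap device, and the VM's virtual swap device); the class and operation names are hypothetical.

```python
class DoublePagingDemo:
    """Logs the disk I/O caused by the double-paging sequence."""

    def __init__(self):
        self.resident = {"P"}   # guest physical pages backed by host RAM
        self.hv_swap = set()    # pages on the hypervisor swap device
        self.guest_swap = set() # pages on the VM's virtual swap device
        self.io = []            # log of disk operations

    def hypervisor_swap_out(self, page):
        self.resident.discard(page)
        self.hv_swap.add(page)
        self.io.append(("hv_swap_write", page))

    def guest_page_out(self, page):
        # To write the page to its own swap device the guest must read it,
        # so a page the hypervisor swapped out is first swapped back in.
        if page not in self.resident:
            self.hv_swap.discard(page)
            self.resident.add(page)
            self.io.append(("hv_swap_read", page))
        self.guest_swap.add(page)
        self.io.append(("guest_swap_write", page))  # immediately paged out again


demo = DoublePagingDemo()
demo.hypervisor_swap_out("P")
demo.guest_page_out("P")
print(demo.io)
```

One page thus costs three disk operations, two of which would have been avoided had only one layer paged it out.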
Each of the methods described above has drawbacks. Page sharing and ballooning are known to be slow to address the problem, and the existing hypervisor swapping method used by VMware often causes the performance problems explained above. Therefore, an alternative, more efficient method is needed to mitigate the low-memory problem resulting from over-provisioning in a VM environment.