Virtualization allows multiplexing of an underlying host machine between different virtual machines. The host computer allocates a certain amount of its resources to each of the virtual machines. Each virtual machine is then able to use the allocated resources to execute applications, including operating systems (referred to as guest operating systems). The software layer providing the virtualization is commonly referred to as a hypervisor and is also known as a virtual machine monitor (VMM), a kernel-based hypervisor, or a host operating system. The hypervisor emulates the underlying hardware of the host computer, making the use of the virtual machine transparent to the guest operating system and the user of the computer.
In virtual machine systems, memory management is one of the most fundamental issues. Typically, a computer system includes a hierarchy of memory that ranges from a small, fast cache of main memory to a larger, but slower, auxiliary memory placed behind it. The cache is generally implemented using a physical memory, such as RAM, while the auxiliary memory is implemented using a storage device, such as a disk drive or hard disk. Both memories are usually managed in uniformly sized units known as pages. Because of their impact on performance, caching algorithms that manage the contents of the main memory are of tremendous importance to a significant number of computer systems, servers, storage systems, and operating systems.
In addition, many computers and operating systems today implement a virtual memory. Virtual memory is a technique in which the computer system presents more memory to applications than the computer system physically possesses.
In order to provide such a virtual memory, the computer system runs the application or process in a memory address space that is virtual, i.e., not tied to the physical memory. The computer system then swaps pages (i.e., units of memory) in and out of a cache in its physical memory in order to emulate the virtual memory. Data structures such as page tables and translation lookaside buffers (TLB) are typically utilized to manage the pages. During operation, an application or process continually requests pages using virtual memory addresses. In response, the computer system translates the virtual memory address into a physical memory address and determines whether the page is present in the cache (i.e., the page is resident). When a requested page is not present in the cache, it is called a cache “miss” (or page fault), and the requested page must be retrieved from storage.
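The translation path described above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions: the class name TinyMMU, the 4096-byte page size, and the least-recently-used eviction policy are all hypothetical choices made for the example, and real page tables and TLBs are maintained by hardware and the operating system rather than by code of this kind.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed for illustration)


class TinyMMU:
    """Toy model: a page table, a small TLB, and a fixed pool of resident frames."""

    def __init__(self, num_frames, tlb_entries):
        self.tlb_entries = tlb_entries
        self.page_table = OrderedDict()  # virtual page number -> frame, kept in LRU order
        self.tlb = OrderedDict()         # small cached subset of page_table
        self.free_frames = list(range(num_frames))
        self.page_faults = 0
        self.tlb_hits = 0

    def translate(self, vaddr):
        """Translate a virtual address into a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            # Fast path: the translation is cached in the TLB.
            self.tlb_hits += 1
            self.tlb.move_to_end(vpn)
            self.page_table.move_to_end(vpn)
            frame = self.tlb[vpn]
        else:
            if vpn not in self.page_table:
                # Page is not resident: page fault, bring it in from storage.
                self.page_faults += 1
                self._load_page(vpn)
            self.page_table.move_to_end(vpn)
            frame = self.page_table[vpn]
            self._tlb_insert(vpn, frame)
        return frame * PAGE_SIZE + offset

    def _load_page(self, vpn):
        if self.free_frames:
            frame = self.free_frames.pop(0)
        else:
            # No free frame: swap out the least recently used resident page.
            victim, frame = self.page_table.popitem(last=False)
            self.tlb.pop(victim, None)  # drop any stale TLB entry for the victim
        self.page_table[vpn] = frame

    def _tlb_insert(self, vpn, frame):
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)
        self.tlb[vpn] = frame
```

In this model, a request for a page that is neither in the TLB nor in the page table increments page_faults, while a repeated request for a recently translated page is served from the TLB.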
The physical memory may be arranged to include multiple memory nodes, each with a local processor, a memory controller, and local memory. For example, under a non-uniform memory access (NUMA) architecture, the memory access time depends on the memory location relative to a processor: a processor accesses the local memory in its associated memory node (i.e., a NUMA node) faster than the memory of a remote memory node.
Memory management techniques, executed by VMMs or by operating system kernels (OSK), are utilized in such computer systems. However, these techniques incur significant inefficiencies if an incorrect or non-optimal memory management decision is made, and/or require significant monitoring and involvement by a system administrator. According to such techniques, memory decisions relating to page size, page placement (i.e., assignment of the page to a memory node), and page replication (i.e., copying a page to one or more additional memory nodes) are made in a manual, static manner to optimize memory usage and allocation.
According to page size management methodologies, a control monitor (i.e., the VMM or OSK) selects a page size for the pages comprised within a region (i.e., a collection of pages). However, if the control monitor selects a small page size for a heavily accessed region, it will incur heavy TLB pressure, which reduces performance. Likewise, if the control monitor chooses a large page size for an infrequently accessed region (to reduce the number of pages in the region), it loses tracking granularity for that region, and so may later make incorrect swap choices.
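The page size trade-off can be made concrete with simple arithmetic on TLB coverage. The following Python sketch is illustrative only: the function names are hypothetical, and the 64-entry TLB and the 4 KiB and 2 MiB page sizes are assumed example values rather than parameters taken from any particular system.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(tlb_entries, page_size):
    """Bytes of memory that the TLB can map at once for a given page size."""
    return tlb_entries * page_size


def pages_needed(region_bytes, page_size):
    """Number of pages (and thus tracking units) needed to cover a region."""
    return -(-region_bytes // page_size)  # ceiling division


def fits_in_tlb(region_bytes, tlb_entries, page_size):
    """True if every page of the region can hold a TLB entry simultaneously."""
    return pages_needed(region_bytes, page_size) <= tlb_entries
```

Under these assumed values, a 64-entry TLB covers 256 KiB with 4 KiB pages but 128 MiB with 2 MiB pages. A 64 MiB region therefore requires 16384 small pages, far more than the TLB can hold, or only 32 large pages, each of which is tracked and swapped as a single 2 MiB unit.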
Page placement management techniques also suffer from inefficiencies. For example, in a NUMA architecture, a process can be bound to a specific node through a manual and static decision by a system administrator. If the control monitor (VMM/OSK) places a heavily accessed page in a different memory node (e.g., a NUMA node) than the one where the processing device performing most of the accesses resides, then it incurs a performance penalty for cross-node access. Similarly, if the VMM/OSK places an infrequently accessed page in the same memory node as the processing device that accesses it, then it increases memory pressure on that memory node, and can cause heavily accessed pages to be swapped out or migrated away.
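The cross-node penalty can be expressed as a simple cost model. The following Python sketch is a hypothetical illustration: the function names and the relative costs of 1.0 for a local access and 2.0 for a remote access are assumptions chosen for the example, not measured values.

```python
def access_cost(page_node, accesses_by_node, local_cost=1.0, remote_cost=2.0):
    """Total cost of all accesses to a page that resides on page_node.

    accesses_by_node maps the node of the accessing processor to its access count.
    """
    total = 0.0
    for node, count in accesses_by_node.items():
        total += count * (local_cost if node == page_node else remote_cost)
    return total


def best_node(accesses_by_node, local_cost=1.0, remote_cost=2.0):
    """Node on which placing the page minimizes the total access cost."""
    return min(
        sorted(accesses_by_node),
        key=lambda node: access_cost(node, accesses_by_node, local_cost, remote_cost),
    )
```

For a page accessed 900 times from node 0 and 100 times from node 1, this model yields a cost of 1100 if the page resides on node 0 and 1900 if it resides on node 1, so a static placement on node 1 would pay the cross-node penalty on most accesses.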
Furthermore, page replication techniques also present problems. If a heavily accessed read-only page (typically a library or executable page) is instantiated only once, then it may incur a performance penalty when accessed from a remote memory node. Conversely, if multiple copies of a read-only page are instantiated, and that page is not heavily accessed, then memory is wasted for little gain.
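The replication trade-off can likewise be sketched as a threshold test. The following Python sketch is a hypothetical illustration: the function name and the min_remote_reads threshold are assumptions introduced for the example, and choosing such a threshold well is precisely the difficulty described above.

```python
def nodes_to_replicate(reads_by_node, home_node, min_remote_reads):
    """Remote nodes that read a read-only page often enough to justify a local copy.

    reads_by_node maps each node to the number of reads it issued for the page.
    Each node returned costs one extra page of memory on that node.
    """
    return sorted(
        node
        for node, reads in reads_by_node.items()
        if node != home_node and reads >= min_remote_reads
    )
```

With a threshold of 1000 reads, a page whose home is node 0 and that is read 4000 times from node 1 and 10 times from node 2 would be replicated only to node 1, whereas a page read a handful of times from any node would not be replicated at all.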
Although current processing devices offer a performance monitoring unit (PMU) that allows for tracking memory access information and identifying memory nodes having heavy activity, the typical means of using PMUs requires a user (e.g., the system administrator) to run a profiler, identify problem areas, and make manual decisions (e.g., pinning memory and/or processes to specific memory nodes or processors). This manual and static approach to memory management consumes considerable amounts of time and resources, and does not adapt when the workload, application, and/or hardware changes.