For some virtual machines (VMs), virtual central processing units (vCPUs) execute, and their associated memory resides, on non-uniform memory access (NUMA) systems. NUMA systems have multiple memory proximity domains, referred to as NUMA nodes, each of which is a group of CPU cores and memory. In some examples, a CPU package sits in a “socket” and maps to a plurality of NUMA nodes. The entire configuration is, in that example, referred to as a processor. Unlike uniform memory access (UMA) systems, NUMA systems exhibit memory access latencies that vary with the NUMA node on which the accessed memory resides. Examples of NUMA architectures include OPTERON by Advanced Micro Devices, Inc. and NEHALEM by Intel Corp. Access by a processor to memory within the same NUMA node is considered local access, and is usually much faster than access to memory belonging to other NUMA nodes, which is considered remote access.
Placing a vCPU on a NUMA node remote from the associated memory of the vCPU increases memory access latency and degrades overall application performance. Consequently, the CPU schedulers of some existing systems, operating under a “hard” NUMA policy, assign both the vCPU and the associated memory of the vCPU to a single NUMA node, referred to as the NUMA “home” node. This approach keeps memory accesses local, and therefore keeps memory access latency low, but it frequently suffers from high CPU contention on some NUMA nodes while failing to fully utilize the CPUs on other nodes. For example, if CPU contention is high on the home node while contention is lower on remote nodes, the hard NUMA policy fails to utilize otherwise unused CPUs in the remote NUMA nodes.
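The contention problem described above can be illustrated with a minimal sketch. The function name `hard_policy_snapshot`, the per-node counts, and the six-core node size are hypothetical values chosen for illustration and do not describe any particular scheduler. The sketch assumes that each ready vCPU may run only on a core of its home node, and counts the vCPUs left waiting alongside the cores left idle.

```python
from typing import List, Tuple


def hard_policy_snapshot(ready_vcpus_by_home: List[int],
                         cores_per_node: int) -> Tuple[int, int]:
    """Return (waiting vCPUs, idle cores) when vCPUs may run only on their home node.

    ready_vcpus_by_home[i] is the number of ready vCPUs whose home node is i.
    Illustrative model only: one vCPU per core, no time slicing.
    """
    waiting = 0
    idle = 0
    for ready in ready_vcpus_by_home:
        running = min(ready, cores_per_node)   # at most one vCPU per core
        waiting += ready - running             # vCPUs stuck behind a busy home node
        idle += cores_per_node - running       # cores no remote vCPU is allowed to use
    return waiting, idle


# Hypothetical load: nine ready vCPUs homed on node 0, two homed on node 1.
waiting, idle = hard_policy_snapshot([9, 2], cores_per_node=6)
print(waiting, idle)
```

In this hypothetical snapshot, three vCPUs wait on node 0 while four cores on node 1 sit idle, which is the imbalance a hard NUMA policy cannot correct.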
Under existing NUMA migration policies, the NUMA scheduler assigns a new home node, one where CPU contention is lower, to a process or a group of processes. While this addresses long-term CPU imbalance, home nodes cannot be reassigned frequently enough to address short-term CPU imbalances, at least because the NUMA scheduler must also consider memory load balancing. Therefore, a hard NUMA policy combined with NUMA migration still suffers from suboptimal CPU utilization.
For optimal memory locality, a single home node is associated with the multiple vCPUs belonging to a VM. Under a hard NUMA policy, assigning home nodes to VMs becomes a bin-packing problem in which VMs of various sizes must be placed in fixed-size bins (e.g., NUMA nodes). As VMs grow larger, some bins may be left with holes that cannot be filled by the existing VMs. For example, on a system with two NUMA nodes, each of which has six cores, up to 12 vCPUs should be able to run without CPU contention. If there are three 4-vCPU VMs, however, the hard NUMA policy places two of the 4-vCPU VMs on the same node, while the other node has only one 4-vCPU VM. This placement results in one node being over-utilized (eight vCPUs on six cores) while the other node is underutilized (four vCPUs on six cores).
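The two-node example above can be worked through with a short sketch. The greedy rule used here (place each VM, largest first, on the node currently holding the fewest vCPUs) is an assumption made for illustration, and the name `place_hard_numa` is hypothetical. Any rule that keeps each VM whole on one node yields the same 8/4 split for this input.

```python
from typing import List

CORES_PER_NODE = 6  # from the example: two NUMA nodes with six cores each


def place_hard_numa(vm_sizes: List[int], num_nodes: int) -> List[List[int]]:
    """Place each VM entirely on one home node (hypothetical greedy rule)."""
    nodes: List[List[int]] = [[] for _ in range(num_nodes)]
    for size in sorted(vm_sizes, reverse=True):
        # Pick the node with the fewest vCPUs so far; ties go to the lowest index.
        target = min(range(num_nodes), key=lambda n: sum(nodes[n]))
        nodes[target].append(size)
    return nodes


placement = place_hard_numa([4, 4, 4], num_nodes=2)
loads = [sum(vms) for vms in placement]
overcommit = [max(0, load - CORES_PER_NODE) for load in loads]
idle = [max(0, CORES_PER_NODE - load) for load in loads]
print(placement, loads, overcommit, idle)
```

Here one node carries eight vCPUs on six cores (two vCPUs over capacity) while the other carries four (two cores idle), even though the 12 vCPUs in total match the 12 available cores.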