Historically, memory on multi-processor computer systems was equally accessible by all central processing units (CPUs). This is known as uniform memory access. In uniform memory access systems, access times between CPUs and memory are the same no matter which CPU performs the operation. In a non-uniform memory access (NUMA) system, system memory is divided across NUMA nodes, which correspond to sockets or to a particular set of CPUs that have identical access latency to the local subset of system memory. In a NUMA system, regions of memory that are connected indirectly (e.g., a processor accessing memory outside of its allocated NUMA node) may take longer to access than directly-connected regions. As such, a given region of memory can be accessed faster by some processors than by others.
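The difference between local and remote access can be illustrated with a minimal sketch. The two-node topology, the CPU numbering, and the relative cost values below are assumptions chosen for illustration only; they are not measurements from any particular system.

```python
# Hypothetical two-node topology: which CPUs belong to which NUMA node.
NODE_CPUS = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}

# Illustrative relative access costs (assumed values, not measurements).
# DISTANCE[a][b] is the cost for a CPU on node a to reach memory on node b.
DISTANCE = [
    [10, 20],
    [20, 10],
]


def node_of_cpu(cpu):
    """Return the NUMA node that owns the given CPU."""
    for node, cpus in NODE_CPUS.items():
        if cpu in cpus:
            return node
    raise ValueError(f"unknown CPU {cpu}")


def access_cost(cpu, memory_node):
    """Relative cost for `cpu` to access memory that resides on `memory_node`."""
    return DISTANCE[node_of_cpu(cpu)][memory_node]
```

Under these assumed values, a CPU on node 0 reaching memory on node 0 incurs a cost of 10, while the same CPU reaching memory on node 1 incurs a cost of 20. In a uniform memory access system, every entry of the matrix would be equal.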
An application executing in a NUMA system generally performs best when the threads of its processes access memory on the same NUMA node on which the threads are scheduled. Operating systems (OSes) optimize performance of such applications in NUMA systems by implementing automatic NUMA balancing. Automatic NUMA balancing moves tasks (which can be threads or processes) closer to the memory they are accessing. It can also move application data to memory closer to the tasks that reference it. This is done automatically by the OS kernel when automatic NUMA balancing is enabled on the system.
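The two balancing actions described above (moving a task toward its memory, or moving data toward the task) can be sketched as a simple decision function. This is a toy heuristic under stated assumptions, not the algorithm of any actual OS kernel: the function name `balance_step`, the per-node access counts, and the `task_move_threshold` parameter are all hypothetical.

```python
def balance_step(task_node, accesses_by_node, task_move_threshold=0.75):
    """Choose a balancing action for one task (toy heuristic, hypothetical).

    task_node:         node on which the task is currently scheduled
    accesses_by_node:  dict mapping node id -> number of sampled memory
                       accesses the task made to memory on that node
    Returns a tuple (action, node), where action is "none", "move_task",
    or "move_pages", and node is where the task should run afterwards.
    """
    total = sum(accesses_by_node.values())
    if total == 0:
        return ("none", task_node)

    # All sampled accesses are already local: nothing to do.
    if accesses_by_node.get(task_node, 0) == total:
        return ("none", task_node)

    busiest = max(accesses_by_node, key=accesses_by_node.get)

    # Most of the task's memory lives on one remote node: move the task there.
    if busiest != task_node and accesses_by_node[busiest] / total >= task_move_threshold:
        return ("move_task", busiest)

    # Otherwise keep the task in place and pull its remote data to the local node.
    return ("move_pages", task_node)
```

For example, a task on node 0 with 10 sampled local accesses and 90 accesses to node 1 would be moved to node 1, whereas a task with 60 local and 40 remote accesses would stay on node 0 and have its remote data migrated to it. A real implementation would also weigh factors this sketch ignores, such as CPU load on the target node and the cost of the migration itself.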
When a computing system is implemented as a virtualized computing system, automatic NUMA balancing can also be applied. A virtualized computing system can include one or more host machines and run one or more hypervisors on the host machines. Each hypervisor can support one or more virtual machines, with each of the virtual machines running one or more applications under a guest operating system.