A Non-Uniform Memory Access (NUMA) hardware system (host) contains multiple NUMA nodes interconnected by a high-speed link such as Intel QuickPath Interconnect (QPI). Each NUMA node comprises a group of CPUs/cores that share the same latency when accessing their local memory. These CPUs typically experience longer latencies when they access, via the interconnect, memory that is local to other NUMA nodes. NUMA hardware systems have been deployed in increasing numbers in recent years because the number of CPUs in such a system can be scaled much more easily than in a conventional hardware system.
A virtual machine known as a “wide VM” may comprise a large number of virtual CPUs running on a NUMA system. If the number of virtual CPUs exceeds the number of CPUs in each NUMA node of the system, the virtual CPUs of the wide VM can be grouped into a plurality of NUMA clients, wherein the virtual CPUs in each NUMA client can be scheduled to run on the same physical NUMA node of the NUMA system. A guest operating system (OS) of the wide VM sees the topology of the NUMA clients in the same way that a native OS sees the NUMA hardware topology of a physical NUMA system. Under such a configuration, the virtual CPUs in each NUMA client have similar memory access latencies. The guest OS can therefore optimize memory allocation based on the NUMA client topology in the same way that a native OS optimizes memory allocation based on NUMA hardware information.
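The grouping step can be illustrated with a minimal sketch. The source does not specify how virtual CPUs are divided among NUMA clients, so the policy below is an assumption: use the fewest clients such that no client has more virtual CPUs than a physical NUMA node has CPUs, and spread the virtual CPUs as evenly as possible. The function name `split_into_numa_clients` is hypothetical.

```python
import math


def split_into_numa_clients(num_vcpus: int, cpus_per_node: int) -> list[list[int]]:
    """Group vCPU ids into NUMA clients no larger than one physical NUMA node.

    Hypothetical policy (an assumption, not taken from any hypervisor):
    use the fewest clients that fit, and balance vCPU counts across them.
    """
    if num_vcpus <= 0 or cpus_per_node <= 0:
        raise ValueError("counts must be positive")
    # A non-wide VM (num_vcpus <= cpus_per_node) yields a single client.
    num_clients = math.ceil(num_vcpus / cpus_per_node)
    base, extra = divmod(num_vcpus, num_clients)
    clients, next_id = [], 0
    for i in range(num_clients):
        # The first `extra` clients receive one additional vCPU.
        size = base + (1 if i < extra else 0)
        clients.append(list(range(next_id, next_id + size)))
        next_id += size
    return clients


# A 12-vCPU wide VM on a host with 8 CPUs per NUMA node:
print(split_into_numa_clients(12, 8))
# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

Each resulting client would then be scheduled onto a single physical NUMA node, so that the vCPUs within a client share similar memory access latencies.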
An application running on the wide VM can be NUMA aware, meaning that it runs on one of the NUMA clients of the VM. (In the case of a non-wide VM, the application necessarily runs on the VM's only NUMA client.) The NUMA client of the application is scheduled to run on one of the NUMA nodes of the NUMA system. Since the application may conduct extensive I/O operations, such as network I/O transactions, it is desirable to align the following on the same NUMA node in order to achieve the best I/O performance for the VM: the NUMA client of the VM; the hypervisor threads; the I/O processing threads of a virtual I/O device (e.g., the threads handling virtual interrupts of a virtual Network Interface Card, or virtual NIC, as a non-limiting example); and the I/O device used by the VM.
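The alignment goal can be expressed as a simple check. The following is only a sketch under stated assumptions. The source describes the desired end state but not how placement is recorded or verified. The record type `IoPathPlacement` and the function `misaligned_components` are hypothetical. The NUMA node of the VM's NUMA client is treated as the target node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoPathPlacement:
    """Hypothetical record of the physical NUMA node each component runs on."""
    numa_client_node: int        # node the VM's NUMA client is scheduled on
    hypervisor_thread_node: int  # node of the hypervisor threads
    io_thread_node: int          # e.g., threads handling virtual NIC interrupts
    io_device_node: int          # node local to the I/O device used by the VM


def misaligned_components(p: IoPathPlacement) -> list[str]:
    """Return the components that are not on the NUMA client's node."""
    target = p.numa_client_node
    others = {
        "hypervisor_threads": p.hypervisor_thread_node,
        "io_threads": p.io_thread_node,
        "io_device": p.io_device_node,
    }
    # Dict insertion order keeps the result order deterministic.
    return [name for name, node in others.items() if node != target]


# I/O threads placed on node 1 while everything else is on node 0:
print(misaligned_components(IoPathPlacement(0, 0, 1, 0)))
# ['io_threads']
```

An empty result corresponds to the fully aligned configuration described above. Any listed component is a candidate for migration to the target node.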