Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessor systems, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.
NUMA attempts to address the problem of processors stalling while they wait for memory accesses to complete. It provides separate memory for each processor (or group of processors) in a multiprocessor system, thereby avoiding the performance hit that occurs when several processors attempt to access the same memory. Each grouping of processors and their associated memory is known as a NUMA node.
Not all data is confined to a single task, however, so more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware and/or software to move data between memory banks. The performance of a multiprocessor system with NUMA nodes therefore depends on the exact nature of the tasks running on each NUMA node at any given time. For instance, a memory access is slower when a processor in one NUMA node has to access memory in another NUMA node, because the request must traverse the interconnect between the nodes. If such cross-node memory accesses occur frequently, the multiprocessor system incurs a significant performance penalty.
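The cost of cross-node accesses can be illustrated with a simple weighted average. The following is a minimal sketch, not a model of any particular machine: the function name average_access_ns and the latency figures (100 ns local, 200 ns remote) are assumptions chosen purely for illustration.

```python
def average_access_ns(remote_fraction, local_ns, remote_ns):
    """Expected latency of one memory access, given the share of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Weighted average of the local and remote latencies.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies, for illustration only; real values vary by system.
LOCAL_NS = 100.0
REMOTE_NS = 200.0

for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, average_access_ns(fraction, LOCAL_NS, REMOTE_NS))
```

Under these assumed figures, the average latency rises linearly with the share of remote accesses, which is why frequent cross-node traffic degrades overall performance.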
It is therefore advantageous for a system implementing NUMA to minimize inter-node communication. A mechanism that optimizes the placement of processor tasks on NUMA nodes so as to minimize inter-node communication would be beneficial.
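One way to picture such a placement mechanism is as a search for the task-to-node assignment with the least traffic between nodes. The sketch below is an illustrative assumption, not a mechanism described in the text: the task names, the traffic volumes, the per-node capacity, and the functions cross_node_traffic and best_placement are all hypothetical, and the exhaustive search is practical only for a handful of tasks.

```python
from itertools import product


def cross_node_traffic(placement, traffic):
    """Sum the traffic between task pairs that sit on different nodes."""
    return sum(volume for (a, b), volume in traffic.items()
               if placement[a] != placement[b])


def best_placement(tasks, traffic, num_nodes, capacity):
    """Try every assignment of tasks to nodes and keep the cheapest valid one."""
    best, best_cost = None, None
    for assignment in product(range(num_nodes), repeat=len(tasks)):
        # Skip assignments that put more tasks on a node than it can hold.
        if any(assignment.count(n) > capacity for n in range(num_nodes)):
            continue
        placement = dict(zip(tasks, assignment))
        cost = cross_node_traffic(placement, traffic)
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


# Hypothetical workload: A and B share a lot of data, as do C and D.
tasks = ["A", "B", "C", "D"]
traffic = {("A", "B"): 10, ("C", "D"): 8, ("A", "C"): 1, ("B", "D"): 1}
placement, cost = best_placement(tasks, traffic, num_nodes=2, capacity=2)
print(placement, cost)
```

In this example the search co-locates the heavily communicating pairs, leaving only the light A-C and B-D traffic to cross between nodes. A real scheduler would need a heuristic rather than exhaustive search, since the number of assignments grows exponentially with the number of tasks.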