In a traditional multi-processor computing system, the processors (i.e., Central Processing Units (CPUs)) share a single memory controller. The memory controller controls all the available Dynamic Random Access Memory (DRAM) (e.g., Dual Inline Memory Modules (DIMMs)) of the computing system. All CPUs have equal access to the memory controller and, thus, to the DRAM.
Communication between the CPUs also goes through the memory controller, which can present a bottleneck if multiple CPUs attempt to access the DRAM simultaneously. The number of DIMMs that can be managed by a single controller is limited, thereby limiting the memory capacity supported by a computing system. In addition, the latency to access memory through the single memory controller is relatively high. This architecture therefore does not scale very well as the number of CPUs in a computing system increases.
Non-Uniform Memory Access (NUMA) describes an architecture in which the CPUs of a computing system are able to access some memory locations faster than other memory locations, and in which the faster memory locations are not the same for each CPU. FIG. 1 illustrates hardware of exemplary NUMA system 110.
System 110 includes nodes 112, 114, 116 and 118. Each node includes four CPU cores, a memory controller and memory (e.g., DRAM). Nodes 112, 114, 116 and 118 are interconnected by interconnects 122, 124, 126, 128 and 130, which carry node-to-node traffic (e.g., remote memory accesses).
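The arrangement of nodes and interconnects in system 110 can be sketched as a small graph. The following is a minimal illustration only. FIG. 1 is not reproduced here, so the assignment of interconnects 122, 124, 126, 128 and 130 to particular node pairs is an assumption, and the names `INTERCONNECTS`, `neighbors` and `hop_count` are hypothetical.

```python
from collections import deque

# Node reference numerals from the description of system 110.
NODES = (112, 114, 116, 118)

# ASSUMPTION: the node pair joined by each interconnect is hypothetical;
# the text lists five interconnects but does not state their endpoints.
INTERCONNECTS = {
    122: (112, 114),
    124: (114, 118),
    126: (116, 118),
    128: (112, 116),
    130: (112, 118),
}


def neighbors(node):
    """Return the nodes directly reachable from `node` over one interconnect."""
    result = []
    for a, b in INTERCONNECTS.values():
        if a == node:
            result.append(b)
        elif b == node:
            result.append(a)
    return result


def hop_count(src, dst):
    """Breadth-first search for the fewest interconnects between two nodes.

    Returns 0 for a local access (src == dst) and None if dst is unreachable.
    """
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

Under this assumed wiring, an access from node 114 to the memory of node 116 would traverse two interconnects, whereas an access from node 112 to node 118 would traverse one. A different wiring would give different hop counts.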
A node may include one or more CPUs. Each CPU of a particular node has symmetric or equivalent access to the memory of the particular node (e.g., one or more levels of local hardware caches), and is able to access this memory faster than it can access the memory of another node. Specifically, remote memory latency and interconnect contention both slow accesses to remotely-located memory. It is therefore desirable to avoid remote memory accesses and resource contention.
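The preference for local memory can be illustrated with a simple cost model. This is a sketch under stated assumptions, not a description of any particular system: the relative costs `LOCAL_COST` and `REMOTE_COST` are illustrative values rather than measurements, every remote node is treated as equally distant, contention is ignored, and the function names are hypothetical.

```python
# ASSUMPTION: illustrative relative access costs, not measured latencies.
LOCAL_COST = 10
REMOTE_COST = 20


def access_cost(cpu_node, memory_node):
    """Relative cost of one access from a CPU on cpu_node to memory_node."""
    return LOCAL_COST if cpu_node == memory_node else REMOTE_COST


def total_cost(cpu_node, accesses):
    """Total relative cost for a workload running on cpu_node.

    `accesses` maps a memory node to the number of accesses made to it.
    """
    return sum(count * access_cost(cpu_node, mem)
               for mem, count in accesses.items())


def best_node(nodes, accesses):
    """Pick the node on which the workload incurs the lowest total cost."""
    return min(nodes, key=lambda node: total_cost(node, accesses))


if __name__ == "__main__":
    nodes = (112, 114, 116, 118)
    # Hypothetical workload: most of its data resides in the memory of node 116.
    accesses = {112: 100, 116: 900}
    for node in nodes:
        print(node, total_cost(node, accesses))
    print("best:", best_node(nodes, accesses))
```

With the hypothetical workload shown, running on node 116 makes 900 of the 1000 accesses local, so it yields the lowest total cost under this model. Running on node 114 or 118 makes every access remote.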