One aspect of a machine's architecture is the way that the machine's processor(s) are connected to memory. The motherboard of a machine typically contains one or more processor sockets and a chipset that, among other things, includes a northbridge connecting the processor(s) to memory. In what is referred to as a uniform memory architecture (UMA), each processor socket is equidistant from memory: the sockets are typically connected to a single northbridge, which connects all of them to one or more memory modules, so the latency of a memory access does not depend on which processor makes the request. In a non-uniform memory architecture (NUMA), each socket has its own locally-attached memory. Any processor on a NUMA motherboard can access the memory attached to any socket. However, the access latency is lower when a processor accesses the memory attached to its own socket than when it accesses memory attached to another socket.
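The difference between the two architectures can be pictured as a latency table whose rows are the requesting sockets and whose columns are the memory being accessed. The Python sketch below builds such a table for each case. The nanosecond figures and function names are hypothetical, chosen only for illustration; real machines differ in their values and may have more than two distance levels. In the UMA table every column is constant, so latency does not depend on the requester, while in the NUMA table the diagonal (local access) is cheaper than the off-diagonal entries (remote access).

```python
# Hypothetical latencies in nanoseconds; real values depend on the hardware.
LOCAL_NS = 80    # NUMA: processor reads memory attached to its own socket
REMOTE_NS = 130  # NUMA: processor reads memory attached to another socket
UMA_NS = 100     # UMA: any processor reads memory through the shared northbridge


def uma_latency(sockets: int, modules: int) -> list[list[int]]:
    """Rows are requesting sockets, columns are memory modules behind one northbridge."""
    return [[UMA_NS] * modules for _ in range(sockets)]


def numa_latency(sockets: int) -> list[list[int]]:
    """Rows are requesting sockets, columns are the socket whose memory is accessed."""
    return [
        [LOCAL_NS if cpu == mem else REMOTE_NS for mem in range(sockets)]
        for cpu in range(sockets)
    ]


def depends_on_requester(matrix: list[list[int]]) -> bool:
    """True if some memory location is reached with different latency from different sockets."""
    return any(len(set(column)) > 1 for column in zip(*matrix))


if __name__ == "__main__":
    for name, table in (("UMA", uma_latency(2, 4)), ("NUMA", numa_latency(2))):
        print(name, table, "requester-dependent:", depends_on_requester(table))
```

Run as a script, this reports that the UMA table is not requester-dependent and that the two-socket NUMA table, `[[80, 130], [130, 80]]`, is.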
On a NUMA machine, platform firmware normally implements an interleaving memory policy, which is designed to distribute data evenly across the different memories. For a given memory access, the latency is lower or higher depending on whether the accessed data resides in the requesting processor's locally-attached memory or in memory attached to another socket. Because the threads that access the data could be scheduled on any processor, and because the interleaving policy could place the data in any memory, whether a given access has a low or a high latency is largely a matter of chance. Over a large number of accesses, the average latency falls somewhere between the low latency of accessing a processor's locally-attached memory and the high latency of accessing memory attached to some other processor.
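Under a simplified model, "somewhere between" can be made concrete. Assume N sockets, pages interleaved round-robin (page i resides on socket i mod N), and each access issued from a uniformly random socket to a uniformly random page. A request then finds its data local with probability 1/N, so the expected latency is (1/N)·L_local + ((N-1)/N)·L_remote. The Python sketch below computes this closed form and checks it with a seeded simulation. The latency values, the round-robin rule, and the uniform-access assumption are illustrative only and do not describe any particular firmware.

```python
import random

# Hypothetical latencies in nanoseconds, for illustration only.
LOCAL_NS = 80
REMOTE_NS = 130


def home_socket(page: int, sockets: int) -> int:
    """Round-robin interleaving: page i resides in the memory of socket i mod N."""
    return page % sockets


def expected_latency(sockets: int) -> float:
    """Closed form: a uniformly random requester finds the page local with probability 1/N."""
    p_local = 1 / sockets
    return p_local * LOCAL_NS + (1 - p_local) * REMOTE_NS


def simulated_latency(sockets: int, pages: int, accesses: int, seed: int = 0) -> float:
    """Average latency when threads run on random sockets and touch random pages."""
    rng = random.Random(seed)
    total = 0
    for _ in range(accesses):
        cpu = rng.randrange(sockets)   # the thread may be scheduled on any socket
        page = rng.randrange(pages)    # the thread may touch any page
        local = home_socket(page, sockets) == cpu
        total += LOCAL_NS if local else REMOTE_NS
    return total / accesses


if __name__ == "__main__":
    for n in (2, 4):
        print(n, expected_latency(n), round(simulated_latency(n, pages=4096, accesses=100_000), 1))
```

With the illustrative 80 ns and 130 ns figures, two sockets give an expected 105 ns and four sockets 117.5 ns: as the socket count grows, interleaving pushes the average toward the remote latency.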
Leaving access latency to random chance makes sense when nothing is known about the data or the programs that will access it. However, where something is known about the data, there are opportunities to leverage the architecture of a NUMA machine to reduce the average latency. If a processor accesses only (or mainly) its local memory, the average latency of requests from that processor will tend to be lower than the average produced by random placement. However, many applications, such as search, have not been structured to take advantage of this aspect of NUMA machines.
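To illustrate what knowledge of the data can buy, the sketch below assumes a hypothetical workload in which the data is split into one shard per socket (for example, a partition of a search index) and each worker thread is pinned to a socket and reads only its own shard. It compares placing each shard's pages round-robin across all sockets with placing them entirely in the owning worker's local memory. The shard layout, thread pinning, function names, and latency figures are assumptions for illustration; on Linux, real placement would be done through mechanisms such as libnuma or `numactl`, which this sketch does not use.

```python
import random

# Hypothetical latencies in nanoseconds, for illustration only.
LOCAL_NS = 80
REMOTE_NS = 130


def place_shards(sockets: int, pages_per_shard: int, numa_aware: bool) -> list[list[int]]:
    """Return home[shard][page] = socket whose memory holds that page.

    Shard s is (hypothetically) read only by a worker pinned to socket s.
    """
    home = []
    for shard in range(sockets):
        if numa_aware:
            # Allocate every page of the shard in the owning worker's local memory.
            home.append([shard] * pages_per_shard)
        else:
            # Interleave: continue the round-robin across shards, ignoring ownership.
            first = shard * pages_per_shard
            home.append([(first + i) % sockets for i in range(pages_per_shard)])
    return home


def average_latency(home: list[list[int]], accesses: int, seed: int = 0) -> float:
    """Each access comes from a random worker reading a random page of its own shard."""
    rng = random.Random(seed)
    sockets = len(home)
    total = 0
    for _ in range(accesses):
        worker = rng.randrange(sockets)          # worker pinned to socket `worker`
        page = rng.randrange(len(home[worker]))  # it reads only its own shard
        total += LOCAL_NS if home[worker][page] == worker else REMOTE_NS
    return total / accesses


if __name__ == "__main__":
    for aware in (False, True):
        layout = place_shards(sockets=2, pages_per_shard=1024, numa_aware=aware)
        print("numa_aware" if aware else "interleaved", round(average_latency(layout, 100_000), 1))
```

With two sockets, the interleaved layout averages close to 105 ns while the NUMA-aware layout stays at the 80 ns local figure. The gap widens with more sockets, because interleaving makes a growing share of each worker's accesses remote.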