The present disclosure relates to memory access devices and, in particular, to data shuffling in non-uniform memory access devices.
Non-uniform memory access (NUMA) architectures have begun to emerge as a way to improve processor performance, for example in multi-core processors. In a NUMA architecture, each socket or processing node has its own local memory, such as dynamic random access memory (DRAM), and the sockets are interconnected so that each socket can access the memory of every other socket. Consequently, in NUMA architectures, access latency and bandwidth depend on whether a socket is accessing its own local memory or the remote memory of another socket or processing node.
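By way of illustration only, the local versus remote distinction described above can be sketched as a toy cost model. The latency constants and the function names `access_latency_ns` and `latency_matrix` are hypothetical and chosen for this sketch. They are not taken from the disclosure or from any measured device, and bandwidth is not modeled.

```python
# Toy cost model of a NUMA system. All numbers are illustrative assumptions,
# not measurements of any real device.
LOCAL_LATENCY_NS = 100   # hypothetical latency when a socket reads its own DRAM
REMOTE_LATENCY_NS = 160  # hypothetical latency when it reads another socket's DRAM


def access_latency_ns(requesting_socket: int, home_socket: int) -> int:
    """Return the modeled latency of one access issued by requesting_socket
    to memory that is local to home_socket."""
    if requesting_socket == home_socket:
        return LOCAL_LATENCY_NS
    return REMOTE_LATENCY_NS


def latency_matrix(num_sockets: int) -> list[list[int]]:
    """Build a matrix whose [i][j] entry is the modeled latency of
    socket i accessing the memory attached to socket j."""
    return [
        [access_latency_ns(i, j) for j in range(num_sockets)]
        for i in range(num_sockets)
    ]


if __name__ == "__main__":
    # Diagonal entries are local accesses; off-diagonal entries are remote.
    for row in latency_matrix(4):
        print(row)
```

In this sketch every remote access costs the same; a real interconnect may add further variation depending on how many hops separate two sockets.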