One conventional computer includes central processing units (CPUs) coupled to CPU sockets. Each CPU includes respective processor cores. The conventional computer operates in accordance with a non-uniform memory architecture in which memory accessible by a respective core is classified as either local memory or remote memory with respect to that core, depending upon whether the memory is local to or remote from that core's CPU socket. A local memory access may involve less latency than a remote memory access.
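By way of illustration only, the local/remote classification described above may be sketched as follows. The names `Core`, `MemoryRegion`, and `classify_access` are hypothetical and are introduced solely for this sketch; it assumes that each core and each memory region is associated with exactly one socket identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    socket_id: int  # socket of the CPU that contains this core (assumed model)


@dataclass(frozen=True)
class MemoryRegion:
    name: str
    socket_id: int  # socket to which this memory is attached (assumed model)


def classify_access(core: Core, region: MemoryRegion) -> str:
    """Classify memory as local or remote with respect to the given core."""
    return "local" if core.socket_id == region.socket_id else "remote"


# Hypothetical two-socket example.
core_a = Core(core_id=0, socket_id=0)
mem_socket0 = MemoryRegion(name="mem0", socket_id=0)
mem_socket1 = MemoryRegion(name="mem1", socket_id=1)

print(classify_access(core_a, mem_socket0))
print(classify_access(core_a, mem_socket1))
```

In this sketch, the same memory region may be local with respect to one core and remote with respect to another, since the classification depends only upon the socket of the accessing core.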
Typically, an operating system in the computer assigns processes to processor cores for execution in accordance with a default scheme. Under this scheme, the operating system bases the assignments upon current core loading (e.g., a new process is assigned to the core having the least loading). In making these assignments, the operating system has no visibility into, and therefore does not take into account, whether respective processes (1) derive from wholly independent jobs, or (2) are related to each other (e.g., are part of collaborative tasks, utilize common datasets, and/or satisfy parent, child, or sibling relationships). As a result, related processes may be assigned to cores of different sockets, which may substantially increase inter-socket communication traffic and remote memory accesses and, in turn, may substantially reduce processing and memory access efficiency and performance.
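By way of illustration only, the following minimal sketch shows how a load-based default scheme of the kind described above may place two related processes on cores of different sockets. It is a simplified model, not the scheduler of any particular operating system. The topology `CORE_TO_SOCKET`, the function names, the initial loads, and the tie-breaking rule (lowest core identifier) are all assumptions made for this sketch.

```python
# Hypothetical topology: core id -> socket id (two sockets, two cores each).
CORE_TO_SOCKET = {0: 0, 1: 0, 2: 1, 3: 1}


def assign_least_loaded(process_names, initial_loads):
    """Assign each process, in order, to the currently least-loaded core.

    Ties are broken by the lowest core id (an assumption of this sketch).
    Relationships between processes are deliberately ignored.
    """
    loads = dict(initial_loads)  # copy so the caller's loads are not modified
    placement = {}
    for name in process_names:
        core = min(loads, key=lambda c: (loads[c], c))
        placement[name] = core
        loads[core] += 1  # the assigned process adds one unit of load
    return placement


def sockets_used(placement):
    """Return the set of sockets on which the placed processes run."""
    return {CORE_TO_SOCKET[core] for core in placement.values()}


# A parent process and its child arrive while cores 0 and 3 are already busy.
placement = assign_least_loaded(["parent", "child"], {0: 2, 1: 0, 2: 0, 3: 2})
print(placement)
print(sockets_used(placement))
```

Under these assumed loads, the parent is placed on core 1 (socket 0) and the child on core 2 (socket 1), so any data shared between them would be remote with respect to at least one of the two cores.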
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art. Accordingly, it is intended that the claimed subject matter be viewed broadly.