Many modern computer systems have a Non-Uniform Memory Access (NUMA) memory design in which memory access times (latency values) depend on the location of a memory relative to the processor that accesses that memory. Such systems often include one or more nodes on which processors operate to execute one or more program threads. An operating system scheduler of a NUMA-based system assigns (or schedules) each of the program threads to execute on a corresponding one of the processors. The node to which a thread is assigned is the home node for that thread, and a thread executing on the processor associated with its home node may access memory both local to and remote from the home node. A memory that is local to the home node (a “local memory”) is associated with the home node, whereas a memory that is remote from the home node (a “remote memory”) is associated with a node other than the home node (a “remote node”).
In NUMA-based systems, a processor operating on the home node is typically able to access the local memory faster than it is able to access the remote memory. Thus, remote memory accesses result in higher memory access latency values, which negatively affect system performance. As a result, a system scheduler may attempt to schedule each thread to execute on a node that minimizes the remote memory accesses to be performed by that thread. For example, a thread that accesses only one memory may be assigned/scheduled to execute on the processor associated with the node on which that memory resides. Some operating system schedulers perform affinity-based scheduling, in which a thread executed on a node is thereafter determined to have an affinity to that node and continues to be executed on that home node during future executions of the thread, even though the thread may experience poor performance due to high latency values.
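The node-selection goal described above, placing a thread on the node that minimizes its remote memory accesses, can be illustrated with a minimal sketch. The function names `pick_home_node` and `remote_accesses`, and the per-node access counts supplied as input, are hypothetical and introduced only for illustration; the sketch assumes such counts are somehow available and does not represent any particular operating system scheduler.

```python
from typing import Dict


def pick_home_node(access_counts: Dict[int, int]) -> int:
    """Return the node whose local memory the thread accesses most often.

    access_counts maps a node id to the number of accesses the thread makes
    to memory associated with that node (hypothetical input). Choosing the
    node with the most accesses minimizes the accesses that remain remote.
    Ties are broken in favor of the lowest node id.
    """
    if not access_counts:
        raise ValueError("access_counts must not be empty")
    return min(access_counts, key=lambda node: (-access_counts[node], node))


def remote_accesses(access_counts: Dict[int, int], home_node: int) -> int:
    """Count the accesses that would be remote if home_node were chosen."""
    return sum(count for node, count in access_counts.items() if node != home_node)


if __name__ == "__main__":
    counts = {0: 10, 1: 90, 2: 5}  # illustrative numbers only
    home = pick_home_node(counts)
    print(home, remote_accesses(counts, home))
```

Under this sketch, a thread that accesses only one memory is always placed on the node on which that memory resides, matching the example in the text. An affinity-based scheduler, by contrast, would keep the thread on its earlier node regardless of these counts.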
Other operating system schedulers are designed to perform thread-dependent co-scheduling, in which two threads that operate in a co-dependent manner and that share a same memory are scheduled to operate on the same home node on which the shared memory is located. However, in many cases, the operating system scheduler is unable to determine which of numerous threads are co-dependent and share memory. Thus, existing thread-scheduling methods used by operating system schedulers associated with NUMA-based systems are often inefficient and negatively impact the performance of the operating system.
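Thread-dependent co-scheduling as described above can be sketched as follows, under the strong assumption that the scheduler already knows which memory each thread uses, which is the very information the passage notes is often unavailable. The function `co_schedule` and its two input mappings are hypothetical names introduced for illustration only.

```python
from collections import defaultdict
from typing import Dict, List


def co_schedule(
    thread_memory: Dict[str, str],
    memory_node: Dict[str, int],
) -> Dict[int, List[str]]:
    """Place each thread on the node that holds the memory it uses.

    thread_memory maps a thread name to the memory it shares (assumed known).
    memory_node maps a memory name to the node on which it is located.
    Threads that share a memory therefore end up on the same home node.
    """
    placement: Dict[int, List[str]] = defaultdict(list)
    for thread in sorted(thread_memory):  # sorted for deterministic output
        node = memory_node[thread_memory[thread]]
        placement[node].append(thread)
    return dict(placement)


if __name__ == "__main__":
    threads = {"t1": "A", "t2": "A", "t3": "B"}  # illustrative data only
    memories = {"A": 0, "B": 1}
    print(co_schedule(threads, memories))
```

The sketch makes the limitation concrete: without a reliable `thread_memory` mapping, the scheduler has no basis for grouping co-dependent threads on a common home node.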