Computer systems are widely used to store and manipulate data. Data is stored in computer system memory and manipulated by computer system programs executing on the computer system's processor. As is well known, a processor is often thought of as the “brains” of the computer system because it is the component within the computer system that executes the computer system's programs, allowing the computer system to do real work. Memory is used to hold computer programs while they are being executed, and to hold data while it is being accessed by the processor executing the computer programs.
To be competitive, the designers of computer systems are continually striving to make computer systems more powerful, while maintaining or reducing computer system size. A common approach is to increase a computer system's overall processing power by increasing the number of processors used. For manufacturing efficiency, processors and memory are often packaged together to form what are called nodes, and computer systems comprise one or more such nodes. Within these multi-nodal computer systems, any processor can access memory on any node, but a processor can generally access memory on its own node (a local access) more efficiently than it can access memory on any other node (a remote access).
Computer programs contain a series of instructions that are carried out by the computer system's one or more processors. By carrying out these instructions, processors are said to execute the computer programs. An operating system (the programs that are primarily responsible for operating the computer system for the benefit of other programs) controls the execution of these programs through the use of a job (sometimes called a task or a process). Most processors can only execute one instruction stream at a time, but because they operate so fast, they appear to run many jobs and serve many users simultaneously. The computer operating system gives each job a “turn” at running, and then requires the job to wait while another job gets a turn. In situations where a job needs to wait for something to happen before proceeding (e.g., accessing secondary storage), or where multiple processors are available, a job can create a thread (sometimes called a sub-process or sub-task) to continue or expedite processing asynchronously. A job which has not created any threads can itself be regarded as having a single thread. Thus, jobs can be said to be made up of one or more threads.
From a nodal perspective, the operating system can assign threads to execute in any number of ways. For example, the threads of one job may be selected for execution on a given node while the threads of another job may be selected for execution on a different node. Similarly, threads from the same job may execute on different nodes, and threads that are selected to execute on a given node may also be selected to execute on one or more other nodes before terminating. While this flexibility is beneficial in some respects, it is problematic from a data access perspective. As described above, nodes comprise processors and memory, and a processor can access memory on its own node more efficiently than memory on another node. Thus, for threads to execute efficiently, the operating system must ensure that each thread accesses its data in memory on the same node on which it is executing.
One way in which operating systems have solved this problem is by associating each thread with a node for which it has a preference both to execute and to access data. Then, when it is time to execute a given thread, the operating system selects a processor on its preferred node whenever possible. Similarly, when data needs to be brought into memory on behalf of the thread, memory on its preferred node is selected whenever possible. This approach is generally helpful in minimizing remote memory accesses, provided that the work done by the executing threads is balanced across the computer system's nodes.
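The preferred-node approach described above can be illustrated with a minimal sketch. It is not taken from any particular operating system; the names Node, Thread, pick_node, dispatch and allocate_page are hypothetical, and the model assumes that each node simply tracks a count of idle processors and free memory pages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    idle_processors: int  # processors currently free on this node (assumed model)
    free_pages: int       # memory pages currently free on this node (assumed model)


@dataclass
class Thread:
    name: str
    preferred_node: int   # index of the node preferred for execution and data


def pick_node(thread: Thread, nodes: List[Node], resource: str) -> Optional[Node]:
    """Return the preferred node if it has the resource free, else any node that does."""
    preferred = nodes[thread.preferred_node]
    if getattr(preferred, resource) > 0:
        return preferred          # local choice: "whenever possible"
    for node in nodes:
        if getattr(node, resource) > 0:
            return node           # fallback: a remote node
    return None                   # nothing available anywhere


def dispatch(thread: Thread, nodes: List[Node]) -> Optional[Node]:
    """Select a processor for the thread, preferring its preferred node."""
    node = pick_node(thread, nodes, "idle_processors")
    if node is not None:
        node.idle_processors -= 1
    return node


def allocate_page(thread: Thread, nodes: List[Node]) -> Optional[Node]:
    """Select memory for the thread's data, preferring its preferred node."""
    node = pick_node(thread, nodes, "free_pages")
    if node is not None:
        node.free_pages -= 1
    return node
```

In this sketch, if two threads both prefer node 0 and node 0 has only one idle processor, the second thread is dispatched to another node, and its accesses to data on node 0 become remote accesses. This is the situation that arises when work is not balanced across the nodes.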
Computer systems with one or more nodes can also be partitioned into two or more logically separate systems. A logical partition may be assigned processors and memory without regard to the node(s) to which they belong. Furthermore, processors and/or memory may be dynamically added to or removed from the partition and/or the computer system due to configuration changes or capacity upgrades or downgrades. The efficiency issues pertaining to local versus remote memory accesses within the computer system also apply within each logical partition. Throughout this description, the term system is used to refer either to an entire non-partitioned computer system, or to a logical partition of a computer system.
While the prior art of associating threads with a preferred node is helpful in minimizing remote memory accesses on an individual thread basis, it does not take into account the long term nodal workload balance across the system. In particular, while the prior art selects the preferred node (or home node) for newly created threads based on the conditions existing at the time of creation, it cannot predict future workload changes, and must rely on being able to create new threads in the future in order to counteract these ongoing workload changes. If new threads are not being created, or are not being created frequently enough, or if the new threads themselves do not do sufficient work, the prior art cannot sufficiently respond to changes in workload to keep the system balanced from a nodal perspective.
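The limitation described above can be illustrated with a small worked example. The sketch below is hypothetical: it assumes a two-node system, a creation-time policy that picks the least-loaded node, and arbitrary work units. The names choose_home_node, nodal_load and demo are illustrative only.

```python
from typing import List, Tuple


def choose_home_node(node_load: List[int]) -> int:
    """Pick the least-loaded node at creation time (ties go to the lowest index)."""
    return min(range(len(node_load)), key=lambda n: node_load[n])


def nodal_load(threads: List[List[int]], num_nodes: int) -> List[int]:
    """Sum the current work of all threads, grouped by their home node."""
    load = [0] * num_nodes
    for home, work in threads:
        load[home] += work
    return load


def demo() -> Tuple[List[int], List[int], List[int]]:
    threads: List[List[int]] = []  # each entry is [home_node, current_work]

    # Four threads are created, each doing 10 units of work.
    for _ in range(4):
        threads.append([choose_home_node(nodal_load(threads, 2)), 10])
    balanced = nodal_load(threads, 2)

    # The workload later shifts: the threads homed on node 1 go idle,
    # but their home nodes were fixed at creation time.
    for t in threads:
        if t[0] == 1:
            t[1] = 0
    shifted = nodal_load(threads, 2)

    # One new, lightweight thread is created and placed on the idle node.
    threads.append([choose_home_node(shifted), 1])
    after_new = nodal_load(threads, 2)

    return balanced, shifted, after_new
```

Under these assumptions, demo() returns ([20, 20], [20, 0], [20, 1]). The system is balanced when the threads are created, becomes unbalanced when the workload shifts, and remains unbalanced after the new thread is created because that thread does too little work to offset the change.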
Without a mechanism for observing the nodal workload balance of the system on an ongoing basis and for dynamically changing the preferred nodes of existing threads in order to improve nodal balance, the performance benefits of local memory accesses promised by multi-nodal systems will not be realized. When local resources are unavailable, either threads will be delayed while they wait for their preferred resources, or local memory accesses will be replaced by remote memory accesses as remote resources are used instead.