Computer systems are widely used to store and manipulate data. Data is stored in computer system memory and manipulated by computer system programs executing on the computer system's processor. As is well known, a processor is often thought of as the “brains” of the computer system because it is the component within the computer system that executes the computer system's programs, allowing the computer system to do real work. Memory is used to hold computer programs while they are being executed, and to hold data while it is being accessed by the processor executing the computer programs.
To be competitive, the designers of computer systems are continually striving to make computer systems more powerful, while maintaining or reducing computer system size. A common approach is to increase a computer system's overall processing power by increasing the number of processors used. For manufacturing efficiency, processors and memory are often packaged together to form what are called nodes, and computer systems comprise one or more such nodes. Within these multi-nodal computer systems, any processor can access memory on any node, but a processor can generally access memory on its own node (a local access) more efficiently than it can access memory on any other node (a remote access).
Computer programs contain a series of instructions that are carried out by the computer system's one or more processors. By carrying out these instructions, processors are said to execute the computer programs. An operating system (the programs that are primarily responsible for operating the computer system for the benefit of other programs) controls the execution of these programs through the use of a job (sometimes called a task or a process). Most processors can only execute one instruction stream at a time, but because they operate so fast, they appear to run many jobs and serve many users simultaneously. The computer operating system gives each job a “turn” at running, and then requires the job to wait while another job gets a turn. In situations where a job needs to wait for something to happen before proceeding (e.g., accessing secondary storage), or where multiple processors are available, a job can create a thread (sometimes called a sub-process or sub-task) to continue or expedite processing asynchronously. A job which has not created any threads can itself be regarded as having a single thread. Thus, jobs can be said to be made up of one or more threads.
From a nodal perspective, the operating system can assign threads to execute in any number of ways. For example, the threads of one job may be selected for execution on a given node while the threads of another job may be selected for execution on a different node. Similarly, threads from the same job may execute on different nodes, and threads that are selected to execute once on a given node may be selected to execute on one or more other nodes before terminating. While this flexibility is beneficial in some respects, it is problematic from a data access perspective. As described above, nodes comprise processors and memory, and a processor can access memory on its own node more efficiently than on another node. Thus, for threads to execute efficiently, the operating system must ensure that each thread accesses its data in memory on the same node on which it is executing.
One way in which operating systems have solved this problem is by associating each thread with a node for which it has a preference both to execute and to access data. Then, when it is time to execute a given thread, the operating system selects a processor on its preferred node whenever possible. Similarly, when data needs to be brought into memory on behalf of the thread, memory on its preferred node is selected whenever possible. This approach is generally helpful in minimizing remote memory accesses, provided that the work done by the executing threads is balanced across the computer system's nodes.
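The preferred-node policy described above can be sketched as follows. This is a minimal illustration only, not the mechanism of any particular operating system; the names `Node`, `Thread`, and `pick_node`, and the fallback rule of choosing the node with the most of the requested resource, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    idle_processors: int  # processors currently free to run a thread
    free_pages: int       # memory pages currently free on this node


@dataclass
class Thread:
    name: str
    preferred_node: int   # node on which the thread prefers to run and hold data


def pick_node(thread: Thread, nodes: List[Node], resource: str) -> Optional[Node]:
    """Choose a node for `resource` ('idle_processors' or 'free_pages').

    The thread's preferred node is used whenever it has the resource
    available; otherwise the choice falls back to another node, which
    implies remote accesses. The fallback rule is hypothetical.
    """
    by_id = {n.node_id: n for n in nodes}
    preferred = by_id.get(thread.preferred_node)
    if preferred is not None and getattr(preferred, resource) > 0:
        return preferred
    candidates = [n for n in nodes if getattr(n, resource) > 0]
    if not candidates:
        return None  # nothing available anywhere; the caller must wait
    return max(candidates, key=lambda n: getattr(n, resource))


nodes = [Node(0, idle_processors=0, free_pages=10),
         Node(1, idle_processors=2, free_pages=0)]
worker = Thread("worker", preferred_node=0)
cpu_node = pick_node(worker, nodes, "idle_processors")
mem_node = pick_node(worker, nodes, "free_pages")
```

In this example the thread's memory is placed on its preferred node 0, but because node 0 has no idle processor the thread is dispatched on node 1, so its data accesses become remote. This is the situation that a balanced workload is meant to make rare.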
Computer systems with one or more nodes can also be partitioned into two or more logically separate systems. A logical partition may be assigned processors and memory without regard to the node(s) to which they belong. Furthermore, processors and/or memory may be dynamically added to or removed from the partition and/or the computer system due to configuration changes or capacity upgrades or downgrades. The efficiency issues pertaining to local versus remote memory accesses within the computer system also apply within each logical partition. Throughout this description, the term system is used to refer either to an entire non-partitioned computer system, or to a logical partition of a computer system.
One approach to nodal balancing used by operating systems is to include mechanisms that assign work (threads) to preferred nodes in the same proportions as the processor and/or memory resources available to the system on each node.
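One way such proportional assignment might work is sketched below. It is an assumption-laden illustration rather than a description of any existing operating system: the function name `choose_preferred_node`, the use of a single numeric weight per node (for example, its processor count), and the "largest deficit" rule are all hypothetical.

```python
from typing import Dict


def choose_preferred_node(weights: Dict[int, float], assigned: Dict[int, int]) -> int:
    """Pick a preferred node for one new thread.

    weights:  node id -> resource weight available to the system on that node
    assigned: node id -> number of threads that already prefer that node

    The node chosen is the one whose thread count falls furthest below its
    proportional target once the new thread is included.
    """
    total_weight = sum(weights.values())
    total_after = sum(assigned.values()) + 1

    def deficit(node: int) -> float:
        target = total_after * weights[node] / total_weight
        return target - assigned.get(node, 0)

    # Iterating in sorted order makes ties resolve to the lowest node id.
    return max(sorted(weights), key=deficit)


# Hypothetical system: node 0 has three processors, node 1 has one.
weights = {0: 3, 1: 1}
assigned = {0: 0, 1: 0}
for _ in range(8):
    node = choose_preferred_node(weights, assigned)
    assigned[node] += 1
```

After eight threads have been assigned, six prefer node 0 and two prefer node 1, matching the 3:1 ratio of the weights.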
While these mechanisms are useful in balancing the workload and generally minimizing remote accesses, the prior art operating systems themselves are not optimized to minimize remote accesses associated with the services they provide. For example, consider an operating system which includes an integrated file system that provides various file services to client threads. Associated with that file system would be various resources needed to perform its functions. Among those resources may be a set of kernel threads (privately known to and managed by the operating system) which wait on a queue for messages instructing them to asynchronously process particular portions of a file on behalf of a requesting client. Another resource may be a pool of messages which can be sent to these threads. Unless the message pool and the queue happen to reside in memory on the client thread's preferred node, and unless the kernel thread which happens to service the requests prefers the same node, the interactions between the client thread and the kernel thread servicing it are very likely to involve remote memory accesses. Many other examples could serve to illustrate the same concept.
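The file system example can be made concrete with a small model that lists which steps of a single request cross node boundaries. The model is a deliberate simplification: the four steps, the `FileServiceLayout` structure, and the `remote_accesses` function are assumptions introduced for illustration and are not taken from any actual file system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileServiceLayout:
    message_pool_node: int   # node whose memory holds the message pool
    queue_node: int          # node whose memory holds the message queue
    kernel_thread_node: int  # preferred node of the servicing kernel thread


def remote_accesses(client_node: int, layout: FileServiceLayout) -> List[str]:
    """List the steps of one request that touch memory on another node."""
    remote = []
    if layout.message_pool_node != client_node:
        remote.append("client takes a message from the pool")
    if layout.queue_node != client_node:
        remote.append("client enqueues the message")
    if layout.queue_node != layout.kernel_thread_node:
        remote.append("kernel thread dequeues the message")
    if layout.message_pool_node != layout.kernel_thread_node:
        remote.append("kernel thread reads the message")
    return remote


# A single set of resources, all residing on node 0.
layout = FileServiceLayout(message_pool_node=0, queue_node=0, kernel_thread_node=0)
local_client = remote_accesses(0, layout)
remote_client = remote_accesses(1, layout)
```

With every resource on node 0, a client thread that prefers node 0 incurs no remote steps, while a client that prefers node 1 incurs two, once when it takes a message from the pool and once when it enqueues it.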
Without a means to enable operating system services to distribute their resources on a nodal basis such that the particular resources associated with a client thread's preferred node are used to service that thread's requests, use of operating system services will be inefficient because of remote memory accesses. Furthermore, unless the distribution of resources is balanced nodally in the same proportions as the workload on the system, the resources will not be used uniformly, and various other inefficiencies will result depending on the particular resources.