1. Technical Field
The present invention relates to data processing systems and, in particular, to a distributed shared-memory data processing system for determining a utilization of each of a plurality of processing nodes in the system by one of a plurality of threads. Still more particularly, the present invention relates to a method and system in a distributed shared-memory data processing system for determining a utilization of each of a plurality of processing nodes in the system by one of a plurality of threads by determining the number of times the thread accesses each shared memory in the system.
2. Description of the Related Art
One type of data processing system is a uniprocessor system, which has only one central processing unit (CPU) that executes an operating system. This type of system is typically utilized in older computer systems.
Another type of data processing system is a multiprocessor system which has more than one CPU. A particular type of multiprocessor system is a symmetric multiprocessor system (SMP). An SMP system includes a plurality of processors, each having equal access to shared-memory and input/output (I/O) devices shared by the processors. In an SMP system, a single operating system is executed simultaneously by the plurality of processors. The operating system can divide a software application into separate processes that may be executed simultaneously on all of the processors in the system. In this manner, because different processes of the application can be simultaneously executed, the application can be executed in an SMP system faster than it could be executed in a uniprocessor system.
A multiprocessor system must have a method and system for keeping track of the different processes being executed by the different processors. The multiprocessor system utilizes threads to represent the separately dispatchable units of these processes. Threads are utilized by the operating system to keep track of the location and status of each unit of work executing on the plurality of processors.
Multiple SMP systems can be clustered together to form a more powerful data processing system. A clustered SMP system includes multiple nodes which are coupled together via an interconnection network. Each node includes one or more processors and a shared-memory which may be accessed equally by the processors within the node.
One method and system for maintaining a cluster of multiple SMP systems is called a distributed shared-memory system. A distributed shared-memory system is also called a non-uniform memory access (NUMA) system. A NUMA system includes multiple nodes as described above. Each processor in a node in the NUMA system may access the shared-memory in any of the other nodes in the system. Because an access to the shared-memory of another node generally takes longer than an access to the shared-memory of the processor's own node, memory access time may be non-uniform across the nodes.
In a symmetric multiprocessor system, a single operating system is simultaneously executed by a plurality of interconnected processors. The operating system selects threads to dispatch to various processors within the SMP data processing system. A part of the operating system executing on a first processor may select a particular thread to process. The first processor may decide that the selected thread should be executed by any of the other processors in the data processing system. Typically, however, the first processor will decide that the selected thread will be executed by the first processor itself. In the event a processor other than the first processor is selected to execute the thread, the first processor notifies the other processor that it has been selected to execute the thread. The other processor then selects, dispatches, and executes the thread. In this manner, a processor in the system may select any of the processors in the system to execute a thread, and the processor selected to execute a thread then dispatches and executes that thread.
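The selection and notification sequence described above can be illustrated with a minimal sketch. It assumes each processor has its own run queue and that notification amounts to placing the thread on the target's queue; the names `Processor`, `notify`, `dispatch`, and `select_and_assign` are hypothetical and are not taken from any particular operating system.

```python
from collections import deque


class Processor:
    """Hypothetical model of one processor with its own run queue."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.run_queue = deque()
        self.executed = []

    def notify(self, thread):
        # Being notified means the thread is queued for this processor.
        self.run_queue.append(thread)

    def dispatch(self):
        # Dispatch and "execute" every queued thread in arrival order.
        while self.run_queue:
            self.executed.append(self.run_queue.popleft())


def select_and_assign(selector, processors, thread, target_id=None):
    """The selecting processor keeps the thread by default (the typical
    case) or notifies another processor chosen by target_id."""
    target = selector if target_id is None else processors[target_id]
    target.notify(thread)
    return target


processors = [Processor(i) for i in range(2)]
first = processors[0]
select_and_assign(first, processors, "thread A")               # kept locally
select_and_assign(first, processors, "thread B", target_id=1)  # handed off
for p in processors:
    p.dispatch()
```

In this sketch the first processor executes "thread A" itself, while "thread B" is executed by the processor that was notified.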
To optimize performance of a multiprocessor system, load balancing determinations may be made. These load balance determinations are typically made utilizing a measured activity of each processor in the system. Determinations then can be made regarding processors which might remain idle for extended periods of time compared to the other processors in the system.
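As a rough illustration of such a determination, the sketch below flags processors whose measured activity falls well below the system average. The utilization figures, the `find_underutilized` name, and the 50% threshold are assumptions chosen only for the example.

```python
from statistics import mean


def find_underutilized(utilization, threshold=0.5):
    """Return the processors whose utilization is below
    threshold * (average utilization of all processors)."""
    average = mean(utilization.values())
    return sorted(cpu for cpu, used in utilization.items()
                  if used < threshold * average)


# Hypothetical measured activity: fraction of time each processor was busy.
utilization = {"cpu0": 0.90, "cpu1": 0.85, "cpu2": 0.10, "cpu3": 0.80}
idle_candidates = find_underutilized(utilization)
```

With these figures only `cpu2` is flagged, so a balancer driven by processor activity alone would consider moving work to it.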
However, in NUMA systems, balancing the load on the processors by moving threads to different nodes must be done with great care. A thread's LOAD/STORE references, i.e., its memory references, might cross node boundaries and thus be remote memory references. A remote memory reference is a reference from one processor in a first node to a shared-memory location in a second node. A local memory reference is a reference from one processor in a first node to a shared-memory location in the first node. Remote memory references result in poor performance.
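The distinction between local and remote references can be sketched as a simple classification of a thread's memory references by the node that holds each address. The sketch assumes, purely for illustration, that each node owns one contiguous, equally sized address range; `home_node`, `count_references`, and the sample addresses are hypothetical.

```python
from collections import Counter


def home_node(address, node_memory_size):
    """Assumed mapping: node k owns addresses
    [k * node_memory_size, (k + 1) * node_memory_size)."""
    return address // node_memory_size


def count_references(thread_node, addresses, node_memory_size):
    """Count a thread's references per node and split them into
    local (same node as the thread) and remote (any other node)."""
    per_node = Counter(home_node(a, node_memory_size) for a in addresses)
    local = per_node[thread_node]
    remote = sum(per_node.values()) - local
    return per_node, local, remote


# Hypothetical LOAD/STORE addresses issued by a thread running on node 0.
addresses = [0x0100, 0x0200, 0x1100, 0x2100, 0x0300]
per_node, local, remote = count_references(0, addresses,
                                           node_memory_size=0x1000)
```

Here three of the five references stay within node 0 and two cross node boundaries, one each to nodes 1 and 2.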
Processor utilization data alone might call for redistributing threads by balancing the load on each processor as evenly as possible. However, balancing the load utilizing only processor utilization data could increase processor utilization while actually reducing throughput of the system if the wrong threads are redistributed.
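A small worked example shows why this can happen. It assumes a relative cost of 1 for a local reference and 3 for a remote reference; these costs, the access counts, and the `memory_cost` helper are illustrative assumptions, not measurements.

```python
LOCAL_COST = 1   # assumed relative cost of a local memory reference
REMOTE_COST = 3  # assumed relative cost of a remote memory reference


def memory_cost(access_counts, run_node):
    """Estimate total memory cost of a thread if it runs on run_node,
    given how many times it references the memory of each node."""
    return sum(count * (LOCAL_COST if node == run_node else REMOTE_COST)
               for node, count in access_counts.items())


# Hypothetical thread that references node 0's memory 900 times
# and node 1's memory 100 times.
access_counts = {0: 900, 1: 100}
cost_on_node0 = memory_cost(access_counts, 0)  # 900*1 + 100*3
cost_on_node1 = memory_cost(access_counts, 1)  # 900*3 + 100*1
```

Under these assumptions, moving the thread to an idle processor in node 1 more than doubles its estimated memory cost (1200 versus 2800 units), even though processor load becomes more even.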
Therefore, a need exists for a method and system in a data processing system for determining utilization of each of a plurality of nodes in the data processing system by each thread which is executed.