1. Technical Field
The present invention relates to data processing systems and, in particular, to a distributed shared-memory data processing system for determining utilization of shared-memory included within each of a plurality of coupled processing nodes. Still more particularly, the present invention relates to a method and system in a distributed shared-memory data processing system for determining utilization of shared-memory included within each of a plurality of coupled processing nodes by a designated application.
2. Description of the Related Art
One type of data processing system is a uniprocessor system which has only one central processing unit (CPU) which executes an operating system. This type of system is typically utilized in older computer systems.
Another type of data processing system is a multiprocessor system which has more than one CPU. A particular type of multiprocessor system is a symmetric multiprocessor (SMP) system. An SMP system includes a plurality of processors each having equal access to memory and input/output (I/O) devices shared by the processors. In an SMP system, a single operating system is executed simultaneously by the plurality of processors. The operating system can divide a software application into separate processes that can execute simultaneously on the processors in the system. Because different processes of the application can be executed simultaneously, the application can be executed faster in an SMP system than in a uniprocessor system.
A multiprocessor system must have a method and system for keeping track of the different processes being executed by the different processors. The multiprocessor system utilizes threads to represent the separately dispatchable units of these processes. Threads are utilized by the operating system to keep track of the location and status of each unit of work executing on the plurality of processors.
Multiple SMP systems can be clustered together to form a more powerful data processing system. A clustered SMP system includes multiple nodes which are coupled together via an interconnection network. Each node includes one or more processors and a shared-memory which can be accessed equally by the processors of the node.
One method and system for maintaining a cluster of multiple SMP systems is called a distributed shared-memory system. A distributed shared-memory system is also called a non-uniform memory access (NUMA) system. A NUMA system includes multiple nodes as described above. Each processor in a node in the NUMA system can access the shared-memory in any of the other nodes in the system. Therefore, the time required for a memory access may be non-uniform, depending on whether the referenced memory location resides in the processor's own node or in another node.
In a symmetric multiprocessor (SMP) system, a single operating system is simultaneously executed by a plurality of interconnected processors. The operating system selects threads to dispatch to various processors within the SMP data processing system. A part of the operating system executing on a first processor may select a particular thread to process. The first processor may decide that the selected thread should be executed by any of the other processors in the data processing system. Typically, however, the first processor will decide to execute the selected thread itself. In the event a processor other than the first processor is selected to execute the thread, the first processor notifies the other processor that it has been selected. The other processor then selects, dispatches, and executes the thread. In this manner, a processor in the system may select any of the processors in the system to execute a thread, and the selected processor then dispatches and executes that thread.
A user may desire to monitor and tune, or optimize, the performance of an application executing on a NUMA system. In order to tune the application, it would be helpful to be able to obtain runtime load balancing information regarding the accessing of shared-memory by each node within the NUMA system. An application's locality access ratio is also useful for determining the quality of the performance of the application within the particular system. The locality access ratio is the ratio of the memory references made by the application from a node to that node's local memory to the total memory references made by that node, including both local and remote memory accesses.
A local memory access is a reference from a processor in a first node to a memory location included within the shared-memory included within the first node. A remote memory reference is a reference from a processor in a first node to a memory location included within the shared-memory included within a second node. Numerous remote memory references result in poor performance for the particular application.
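The locality access ratio defined above can be illustrated with a short calculation. The following is a minimal sketch only, assuming that per-node counts of local and remote references are already available; the function name `locality_access_ratio` and the sample counts are hypothetical and are not taken from the disclosure.

```python
def locality_access_ratio(local_refs: int, remote_refs: int) -> float:
    """Return local references divided by total (local + remote) references."""
    total = local_refs + remote_refs
    if total == 0:
        # A node that made no references has no defined ratio.
        raise ValueError("node made no memory references")
    return local_refs / total


# Hypothetical counts per node: node id -> (local references, remote references).
refs_by_node = {0: (900, 100), 1: (250, 750)}

ratios = {
    node: locality_access_ratio(local, remote)
    for node, (local, remote) in refs_by_node.items()
}

for node, ratio in sorted(ratios.items()):
    print(f"node {node}: locality access ratio = {ratio:.2f}")
```

In this illustrative data, the second node directs most of its references to remote memory, which is the condition described above as resulting in poor performance.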
Therefore, a need exists for a method and system in a data processing system for determining utilization of shared-memory included within each of a plurality of coupled processing nodes.
It is therefore one object of the present invention to provide an improved data processing system.
It is another object of the present invention to provide a method and system in a distributed shared-memory data processing system for determining a utilization of shared-memory included within each of a plurality of coupled processing nodes.
It is yet another object of the present invention to provide a method and system in a distributed shared-memory data processing system for determining a utilization of shared-memory included within each of a plurality of coupled processing nodes by a designated application.
The foregoing objects are achieved as is now described. A method and system in a distributed shared-memory data processing system are disclosed having a single operating system being executed simultaneously by a plurality of processors included within a plurality of coupled processing nodes for determining a utilization of each memory location included within a shared-memory included within each of the plurality of nodes by each of the plurality of nodes. The operating system processes a designated application utilizing the plurality of nodes. During the processing, for each of the plurality of nodes, a determination is made of a quantity of times each memory location included within a shared-memory included within each of the plurality of nodes is accessed by each of the plurality of nodes.
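The bookkeeping summarized above, a count of how many times each memory location in each node's shared-memory is accessed by each node, can be pictured as a table of counters. The following is a software-only sketch under stated assumptions: the summary does not say how the counts are gathered, and the class `AccessCounter`, its methods, and the sample addresses are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class AccessCounter:
    """Per-location access counts, broken down by the accessing node."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        # counts[(home_node, address)][accessing_node] -> number of accesses
        self.counts = defaultdict(lambda: [0] * num_nodes)

    def record(self, accessing_node: int, home_node: int, address: int) -> None:
        """Record one access by accessing_node to a location owned by home_node."""
        self.counts[(home_node, address)][accessing_node] += 1

    def accesses(self, home_node: int, address: int) -> list:
        """Return the per-node access counts for one memory location."""
        return list(self.counts.get((home_node, address), [0] * self.num_nodes))

    def node_totals(self, accessing_node: int) -> tuple:
        """Return (local, remote) reference totals made by one node."""
        local = remote = 0
        for (home_node, _address), per_node in self.counts.items():
            if home_node == accessing_node:
                local += per_node[accessing_node]
            else:
                remote += per_node[accessing_node]
        return local, remote


# Hypothetical trace for a two-node system.
counter = AccessCounter(num_nodes=2)
counter.record(accessing_node=0, home_node=0, address=0x1000)  # local access
counter.record(accessing_node=0, home_node=1, address=0x2000)  # remote access
counter.record(accessing_node=1, home_node=1, address=0x2000)  # local access
counter.record(accessing_node=1, home_node=1, address=0x2000)  # local access
```

From such a table, the per-node totals returned by `node_totals` would supply the local and remote reference counts needed for a locality access ratio.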
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.