1. Field of the Invention
The present invention generally relates to the analysis of computer system performance. More specifically, the present invention relates to a performance analysis tool used to measure the performance of a multi-nodal computer system.
2. Description of the Related Art
Computer systems are widely used to manipulate and store data. Typically, data is stored in a computer system memory and manipulated by application programs executing on a central processing unit (CPU). Many operating systems support multi-tasking, i.e., they are capable of concurrently executing many different tasks or processes. For example, many operating systems support the use of “threads.” Generally, a thread provides a unit of execution represented by a sequence of instructions and associated data variables. Threads may be executed in parallel with one another, either through time slicing or multiprocessing.
As computer applications have grown in complexity, one approach to increasing system performance has been to design computer systems with multiple CPUs. In one approach, a computer system may be configured with multiple nodes, each node containing one or more CPUs and a local memory. Such computer systems may include many nodes and use a sophisticated bus and caching mechanism to transfer data among the different nodes. Typically, each node may access the local memory of any other node; however, doing so may take significantly longer than accessing the node's own local memory.
Configuring each node with its own processing and memory resources is generally referred to as a NUMA (non-uniform memory access) architecture. A distinguishing feature of a NUMA system is that the time required to access memory locations is not uniform, i.e., access times to different locations can differ depending on the node making the request and the location of the memory being accessed. In particular, a memory access by a CPU to memory on the same node as the CPU takes less time than a memory access by the CPU to memory on a different node. Access to memory on the same node is faster because an access to memory on a remote node must pass through more hardware components between the nodes, e.g., buses, bus drivers, memory controllers, etc., before the data reaches the requesting CPU.
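By way of illustration only, the non-uniform access times described above may be modeled with the following minimal sketch. The four-node topology, the hop-count table, and the latency constants are hypothetical values chosen for the example and are not taken from any particular system.

```python
# Hypothetical 4-node system. HOPS[i][j] is the assumed number of
# inter-node links crossed when a CPU on node i accesses memory on node j.
HOPS = [
    [0, 1, 1, 2],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [2, 1, 1, 0],
]

LOCAL_NS = 100    # assumed latency of a local access, in nanoseconds
PER_HOP_NS = 60   # assumed extra latency per inter-node link crossed


def access_time_ns(cpu_node, memory_node):
    """Modeled time for a CPU on cpu_node to access memory on memory_node."""
    return LOCAL_NS + PER_HOP_NS * HOPS[cpu_node][memory_node]


def average_access_time_ns(cpu_node, memory_nodes):
    """Average modeled access time over a list of accessed memory nodes."""
    total = sum(access_time_ns(cpu_node, m) for m in memory_nodes)
    return total / len(memory_nodes)
```

Under these assumed values, a CPU on node 0 sees a modeled access time of 100 ns to its local memory and 220 ns to memory on node 3, so a workload whose references are mostly local has a lower average access time than one whose references are mostly remote.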
For a computer system configured with a NUMA architecture, it is clearly advantageous to minimize the number of references made from a CPU to remote memory. Similarly, when a thread makes a dynamic request for memory, e.g., through programming language calls such as malloc() or new, or when data is read from disk, application performance is improved when the memory is allocated from the local memory of the node containing the CPU that executes the thread.
The closeness of the coupling between nodes is generally referred to as “memory affinity” or more simply “affinity.” A node has the greatest affinity with itself, because its CPU(s) can access the local memory region associated with the node faster than they can access memory on other nodes. The affinity between a local node and a remote node decreases as the degree of hardware separation between the local and remote node increases.
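By way of illustration only, the following minimal sketch orders nodes by affinity and selects a node from which to satisfy a memory request. The hop-count table, the free-page counts, and the function names are hypothetical and are introduced solely for the example; the sketch does not describe the allocation policy of any particular operating system.

```python
# Hypothetical hop counts: EXAMPLE_HOPS[i][j] is the assumed degree of
# hardware separation between node i and node j (0 means the same node).
EXAMPLE_HOPS = [
    [0, 1, 1, 2],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [2, 1, 1, 0],
]


def nodes_by_affinity(home, hops):
    """Return node ids ordered from greatest to least affinity with home.

    Fewer hops means greater affinity; ties are broken by node id.
    """
    return sorted(range(len(hops)), key=lambda n: (hops[home][n], n))


def pick_allocation_node(home, hops, free_pages, pages_needed):
    """Pick the highest-affinity node that can satisfy the request.

    Returns None if no node has enough free pages.
    """
    for node in nodes_by_affinity(home, hops):
        if free_pages[node] >= pages_needed:
            return node
    return None
```

In this sketch a node always ranks first in its own affinity ordering, so a request is satisfied from local memory whenever enough local memory is free and falls back to progressively more distant nodes otherwise.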
A number of mechanisms have been developed for maximizing the utilization of nodal affinity. For example, U.S. patent application Ser. No. 10/793,347, filed Mar. 4, 2004, titled “Mechanism for Assigning Home Nodes to Newly Created Threads” discloses a technique for initially assigning a home node to each thread (i.e., a node to preferentially execute the thread). In addition, U.S. patent application Ser. No. 10/793,470, filed Mar. 4, 2004, titled “Mechanism for Dynamic Workload Rebalancing in a Multi-Nodal Computer System” discloses methods for ensuring that the workload across the nodes remains balanced as the work being performed by the various threads and processes executing on the system changes.
However, monitoring and analyzing the performance characteristics of a multi-nodal system as work ebbs and flows over time remains very difficult because system administrators lack access to data characterizing system performance. Without a direct mechanism to monitor system performance, a system administrator may be left to guess at the underlying cause of certain aspects of system behavior and to measure the impact of changes to the system in an ad-hoc or unrefined manner. Because of the complexity of most NUMA systems, this approach fails to provide an adequate analysis of the performance characteristics of the system, or of the impact of changes to the computing resources or configuration of such a system. Accordingly, there remains a need for a performance analysis tool that measures the performance of a multi-nodal computer system.