This invention pertains in general to computer programming and more specifically to a code profiler for collecting performance metrics about a process executing on a symmetric multiprocessor computer system.
Modern computer systems often have multiple central processing units ("CPUs") that can execute different, or the same, parts of a process simultaneously. Such computers are called symmetric multiprocessor ("SMP") systems. A single process executing on such a computer can have multiple threads of control simultaneously executing on different CPUs.
When designing for or porting software to SMP systems, it is advantageous to optimize the code such that it can take full advantage of the system. For example, a programmer wants to optimize the code such that process computations are balanced among the threads. Similarly, a programmer seeks to maximize the number of CPUs that actively work during particular code regions.
To achieve these ends, a programmer uses a code "profiler" to analyze the behavior of a process and remove performance bottlenecks. Such profilers typically work by determining the number of CPUs and CPU time used by a process, along with other performance information, for particular code regions. The programmer then uses the profiler's results to revise the process's structure and operation.
For example, a programmer can use a profiler to compare the CPU time with the real-world, or wall-clock, time used to execute a process. Ideally, the concurrency ratio, the ratio of CPU time to wall-clock time, is equal to the number of CPUs available to the process. If the ratio is less than the number of CPUs, then system overhead or performance bottlenecks are slowing the process and its threads.
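The concurrency ratio described above can be illustrated with a minimal sketch. The function names `concurrency_ratio` and `measure` are hypothetical and are not part of the described profiler; the sketch assumes Python's standard `time.process_time` (CPU time summed over the whole process) and `time.perf_counter` (wall-clock time) as the two timers.

```python
import time

def concurrency_ratio(cpu_seconds, wall_seconds):
    """Ratio of CPU time to wall-clock time for a code region."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu_seconds / wall_seconds

def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) for the process.

    Hypothetical helper: brackets the call with a process-wide CPU timer
    and a wall-clock timer.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds

# Worked example: 8 CPU-seconds consumed in 2 wall-clock seconds gives a
# ratio of 4, the ideal value when 4 CPUs are available to the process.
ideal = concurrency_ratio(8.0, 2.0)
```

A ratio of 1.5 on the same four-CPU system would suggest that, on average, fewer than two CPUs were doing useful work during the region.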
Some prior art profilers, however, do not provide programmers with insight into the behavior of the process on a thread-by-thread basis. For example, an extremely naive approach of determining CPU usage followed by some profilers is to use a CPU's on-board timer to sample the time before and after a code region. This approach determines the amount of time the process spent on that code region. This approach, however, fails to account for the amount of work performed by threads symmetrically executing on different CPUs. That is, the profiler will report the amount of wall-clock time spent on the task, but not the CPU time used by threads executing on other processors. Therefore, the information returned by the naive approach does not enable a programmer to determine which parts of a process are truly occupying the majority of the computer system's time.
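The limitation of the naive approach can be seen in a small sketch. The names `busy_work` and `run_region` are hypothetical, and Python threads stand in for threads on an SMP system. Note that in CPython the global interpreter lock generally prevents CPU-bound threads from running truly in parallel, so the numbers only illustrate what each measurement can and cannot see.

```python
import threading
import time

def busy_work(n=200_000):
    """CPU-bound placeholder for the work done inside a code region."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_region(n_threads=4):
    """Return (naive_elapsed, per_thread_cpu) for one parallel region."""
    per_thread_cpu = [0.0] * n_threads

    def worker(slot):
        # Each thread reads its own CPU timer; slot is written by one thread only.
        start = time.thread_time()
        busy_work()
        per_thread_cpu[slot] = time.thread_time() - start

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]

    # Naive approach: one timer sample before the region and one after it.
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    naive_elapsed = time.perf_counter() - wall_start

    return naive_elapsed, per_thread_cpu

naive_elapsed, per_thread_cpu = run_region()
```

Here `naive_elapsed` is a single wall-clock figure for the region, while `per_thread_cpu` holds the per-thread CPU times that the naive bracket alone never records.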
Some modern profilers attempt to measure process performance on a thread-by-thread basis. However, such profilers have problems of their own. Some profilers require large amounts of data space to hold performance information for each concurrent thread. Others require extensive synchronization to ensure that they produce accurate results. Still other profilers simply do not scale well to systems with many processors. These types of profilers are discussed in more detail in connection with the detailed description of the preferred embodiment.
Therefore, there is a need in the art for a profiler that provides a programmer with a complete analysis of CPU time and other performance metrics of a multithreaded process executing on an SMP computer system. More specifically, there is a need for a profiler that accurately determines performance metrics at the process and thread levels during a single run of the profiled process.
The above and other needs are met by a profiler that accurately measures performance metrics for all threads executing a process on an SMP computer system. The profiler uses dynamic instrumentation to cause threads to sample performance metrics before and after certain code regions. In addition, the profiler uses extensions to a parallel support layer to register a parent thread with its child threads. Each thread stores the measured change in a performance metric, or delta, in a memory cell or cells corresponding to its region and its parent region. When the process is complete, the profiler scans through the memory storage areas and sums the deltas for each particular level of code. Then, the results may be analyzed at the thread or process level. In this manner, the profiler can be adapted to work with any process executing on the computer system.
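The delta-and-sum scheme summarized above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `ThreadCells`, `run_region`, `summarize`, and `profile` are hypothetical, the parent region is passed in explicitly rather than registered through a parallel support layer, and `time.thread_time` stands in for a read-only per-thread counter.

```python
import threading
import time
from collections import defaultdict

class ThreadCells:
    """Storage owned by one thread: one accumulator cell per code region."""
    def __init__(self):
        self.cells = defaultdict(float)

def run_region(cells, region, parent_region, work,
               read_metric=time.thread_time):
    """Sample the metric before and after work(), then store the delta."""
    before = read_metric()
    work()
    delta = read_metric() - before
    # The thread writes only its own cells, so no lock is taken here.
    cells.cells[region] += delta
    if parent_region is not None:
        cells.cells[parent_region] += delta

def summarize(all_cells):
    """After the run, scan every thread's cells and sum deltas per region."""
    totals = defaultdict(float)
    for thread_cells in all_cells:
        for region, value in thread_cells.cells.items():
            totals[region] += value
    return dict(totals)

def profile(n_threads, work):
    """Run work() in n_threads child threads under a shared parent region."""
    all_cells = [ThreadCells() for _ in range(n_threads)]
    threads = [
        threading.Thread(
            target=run_region,
            args=(all_cells[i], "loop_body", "parallel_loop", work))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all_cells, summarize(all_cells)

all_cells, totals = profile(3, lambda: sum(range(10_000)))
```

Because each thread writes only to its own cells and the summation happens once after all threads have joined, the sketch needs no synchronization during measurement. The per-thread cells remain available for thread-level analysis, and `totals` gives the process-level view.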
A technical advantage of the present invention is a way to profile code on symmetric multiprocessor computer systems that accounts for all performance metrics within a code region on a thread-by-thread level.
Another technical advantage of the present invention is a way to profile code that yields concurrency ratios from 0 to 'n' threads.
Yet another technical advantage of the present invention is that it provides a summation of performance metrics for the entire process after a single run of the process.
Yet another technical advantage of the present invention is a way to profile code that minimizes synchronization and data exchange amongst threads.
Yet another technical advantage of the present invention is a way to profile code that works with read-only timers/counters that are maintained or accessed on a thread basis and does not require synchronization amongst threads.
Yet another technical advantage of the present invention is a way to profile code that scales well with an increasing number of threads and processors.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.