1. Technical Field
The present invention relates in general to a system and method for profiling software programs. More particularly, the present invention relates to a system and method for writing profile data to a small pinned buffer that is used to update a larger histogram buffer.
2. Description of the Related Art
Profiling software is the process of analyzing the performance of software. Profiling can be used to detect patterns of use, to verify performance, to optimize code, to identify data corruption and to expose memory leaks or excessive resource demands. Components of a large system can be profiled individually or together. Profiling is accomplished through software tools—software that runs and/or instruments the application under study.
There are at least two traditional methods for profiling software. First, many traditional UNIX-like operating systems, such as IBM's AIX™ operating system, support a profil( ) system call. The profil( ) system call registers a memory array with the kernel along with a scale factor that determines how the program's execution address space maps into the array. For example, a scale factor of 1:1 would give every program counter value its own slot in the array, while a scale factor of 2:1 would group every two program counter values into one slot (i.e., the first and second instructions would be profiled together, the third and fourth instructions would be profiled together, etc.). While increasing the scaling factor reduces the accuracy of the profiling, it also uses less memory, since the histogram buffer for a 2:1 scaled profile is roughly half as large as that for a 1:1 scaled profile.
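The mapping from program counter to histogram slot described above can be sketched as follows. This is a minimal illustration only: it assumes fixed-width instructions and expresses the scale factor as a plain ratio of instructions per slot, whereas an actual profil( ) implementation encodes the scale differently. The names hist_slots and hist_slot_for_pc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of histogram slots needed to cover text_size bytes of program
 * text when `ratio` consecutive instructions share one slot.
 * Assumption: every instruction is insn_bytes wide. */
size_t hist_slots(size_t text_size, size_t insn_bytes, size_t ratio)
{
    assert(insn_bytes > 0 && ratio > 0);
    size_t bytes_per_slot = insn_bytes * ratio;
    return (text_size + bytes_per_slot - 1) / bytes_per_slot; /* round up */
}

/* Map a sampled program counter to the index of the slot to increment. */
size_t hist_slot_for_pc(uintptr_t pc, uintptr_t text_base,
                        size_t insn_bytes, size_t ratio)
{
    assert(pc >= text_base && insn_bytes > 0 && ratio > 0);
    return (size_t)(pc - text_base) / (insn_bytes * ratio);
}
```

With 4-byte instructions and a 4096-byte text area, a 1:1 ratio needs 1024 slots and a 2:1 ratio needs 512, which is the halving of the histogram buffer noted above. At 2:1, the first and second instructions both map to slot 0 and the third and fourth both map to slot 1.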
When a profiled program is executing, the value of the program counter is examined at some periodic interval (such as every time slice interrupt) and the corresponding slot in the histogram buffer (memory array) is incremented. Because the examination of the program counter and incrementing of the corresponding slot is performed by the kernel at interrupt time, the histogram buffer passed by the program in the profil( ) system call is pinned to real memory. In addition, the clock (or time slice) interrupt routine determines the corresponding slot in the histogram buffer that needs to be incremented based upon the program counter. Determining the correct slot can be somewhat computationally intensive, especially for larger (e.g., 64-bit) addresses. Thus, a challenge of the prior art is that significant system overhead, in terms of system resources and time, is associated with profiling a program.
Furthermore, pinning the entire histogram buffer causes additional challenges in that the histogram buffer may be as large as half the size of the text area of the program that is being profiled. One of these additional challenges is that the kernel is exposed to potential security attacks, such as a denial of service attack, because an ordinary user's program can pin large amounts of memory by directly calling the profil( ) system call and passing in a large histogram buffer or by profiling a program with a very large text area (i.e., execution address space). Another challenge of pinning a large histogram buffer is that there is little or no scalability. Thread-level profiling does not work well for a large number of threads because each thread of the process requires its own large pinned buffer.
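The scalability concern above can be made concrete with a small calculation. The following sketch takes the upper bound stated above (a histogram buffer as large as half the text area) and multiplies it by the number of profiled threads. The function name pinned_upper_bound is hypothetical, and the figures in the note that follows are illustrative inputs, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on pinned memory, in bytes, when each profiled thread
 * needs its own histogram buffer of up to half the text size. */
uint64_t pinned_upper_bound(uint64_t text_bytes, uint64_t threads)
{
    assert(threads > 0);
    return (text_bytes / 2) * threads;
}
```

For a program with a 64 MiB text area, one profiled thread may pin up to 32 MiB, while 256 profiled threads may pin up to 8 GiB of real memory.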
A second method for profiling programs in some operating systems that do not support a profil( ) system call, such as some versions of the Linux™ operating system, is for a signal handler to be installed for a signal, usually “SIGALRM,” and for an interval timer to be allocated for the process to periodically deliver the signal to the process at the same frequency at which the program counter needs to be examined. The signal handler receives the execution context of the thread at the point when it was interrupted by the signal. The signal handler then performs the operation of extracting the program counter from the execution context and incrementing a slot in a memory array (the histogram buffer) based on the value of the program counter. This method is advantageous in that the user buffer does not have to be pinned in memory since it is updated in process (non-interrupt) mode. However, a challenge of this method is that it requires a signal to be delivered to user space every time a sample is taken, and consequently, requires considerably more overhead than kernel-based profiling (described as the first method above). An additional challenge is that this method requires that the process install its own signal handler, which would effectively overlay the functionality provided by the profiling-based signal handler. The most serious challenge of this method, however, is that this method is less accurate as compared to kernel-based profiling because of the variability associated with signal delivery latencies. Because of the aforementioned challenges, signal-based profiling is not as popular as kernel-based profiling.
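The signal-based method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes Linux with glibc, reads the program counter from the saved context only on x86-64 and AArch64, and uses a fixed bucket width of 16 bytes. The names hist, missed, on_sample, start_sampling and stop_sampling are hypothetical.

```c
#define _GNU_SOURCE /* assumption: glibc; exposes REG_RIP on x86-64 */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <ucontext.h>

#define SLOTS 4096u

/* Hypothetical profiler state; these names are not from the text. */
volatile uint32_t hist[SLOTS];          /* ordinary, unpinned user memory */
volatile uint32_t missed;               /* samples outside the profiled range */
static uintptr_t text_base;             /* start of the profiled range */
static const uintptr_t bytes_per_slot = 16;

/* Extract the interrupted program counter (platform specific). */
static uintptr_t context_pc(const ucontext_t *uc)
{
#if defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0; /* unknown platform: samples land in `missed` or slot 0 */
#endif
}

/* Runs in process mode each time the interval timer delivers SIGALRM. */
static void on_sample(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    uintptr_t pc = context_pc((const ucontext_t *)ctx);
    uintptr_t slot = (pc - text_base) / bytes_per_slot;
    if (pc >= text_base && slot < SLOTS)
        hist[slot]++;
    else
        missed++;
}

/* Install the handler and start a periodic timer (interval under 1 s). */
int start_sampling(uintptr_t base, long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    assert(interval_usec > 0 && interval_usec < 1000000);
    text_base = base;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sample;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer; the handler stays installed. */
void stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
}
```

A caller would invoke start_sampling with the start address of the code of interest and an interval such as 1000 microseconds, run the workload, and then call stop_sampling and read hist. Every sample in this sketch costs a full signal delivery to user space, which is the overhead and latency variability noted above.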
What is needed, therefore, is a system and method that maintains the efficiency and accuracy of kernel-based profiling without requiring a large pinned buffer to store the histogram data.