The present invention relates generally to a method and apparatus for profiling computer programs. In particular, one aspect of the present invention relates to a method for inserting additional instructions and data into existing relocatable object files of a computer program to collect very detailed and accurate profiling information. Another aspect of the present invention relates to a method for converting profiling information into image data for communication to a user/developer.
Profiling is a technique commonly used by software developers to gain information about the operation of their code. This information can then be used to improve and optimize the code. Profiling is done by processing the program under development with a profiler. The profiler adds code to the executable file so that the program records various types of statistical data as it runs. The type of data recorded varies with the profiler used.
One traditional and widely used profiler under UNIX is known as prof. Object code files are supplied to prof, which links the object files and adds monitoring routines to be executed at the beginning and end of the program. The initial routines set up a program sampler triggered by timing interrupts. The program sampler, which is invoked with a period of 1/100 of a second, records the value of the program counter register. The routines added to the end of the program take the saved data and create an output file showing the time apparently spent in the various routines; this output can be organized as a histogram.
This information can be useful to a programmer by indicating which routines consume the most execution time and are therefore the best places to focus on improvement. prof suffers from a number of limitations: it requires Operating System (“OS”) support for its interrupts; it shows where execution is at the sampling times, but not “why” (it does not identify the call chain that led to the current function being called); and it relies on timed interrupts with a 1/100 second sampling period. The sampling period causes sampling errors because, at today's processing speeds, a great many instructions can be executed in 1/100 of a second, and many routines have total execution times shorter than this. The execution of short routines can go entirely unnoticed by prof, and sampling errors can cause significant inaccuracies in the measured execution times even for longer routines. The 1/100 second sampling period could theoretically be shortened, but this has been found to add excessive overhead, so prof implementations generally do not allow “tuning” of the sampling rate. Furthermore, because the timing interrupts that trigger the sampling mechanism are not generated during OS function calls, OS function calls appear “free” under prof, whereas in actuality they (and therefore the program functions calling them) might be responsible for a primary portion of the total execution time.
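The sampling behaviour described above, and how a short routine can vanish from the results, can be illustrated with a small simulation. This is a simplified sketch, not prof itself: it assumes a program run can be modelled as a list of (routine, duration) segments in integer milliseconds, and the names sample_profile, exact_profile, "main" and "helper" are hypothetical.

```python
from collections import Counter

SAMPLE_PERIOD_MS = 10  # prof-style sampling period: 1/100 of a second

def sample_profile(timeline, period=SAMPLE_PERIOD_MS):
    """Estimate time per routine the way a timer-driven PC sampler does.

    timeline: (routine, duration_ms) segments in execution order, with
    integer durations. Each timer tick is charged to whichever routine is
    executing at that instant; estimated time = ticks * period.
    """
    ticks = Counter()
    start = 0
    next_tick = period
    for routine, duration in timeline:
        end = start + duration
        # Charge every tick falling in [start, end) to this routine.
        while next_tick < end:
            ticks[routine] += 1
            next_tick += period
        start = end
    return {routine: n * period for routine, n in ticks.items()}

def exact_profile(timeline):
    """Actual time per routine, for comparison."""
    totals = Counter()
    for routine, duration in timeline:
        totals[routine] += duration
    return dict(totals)

# Hypothetical run: a 4 ms call to "helper" falls between two ticks.
TIMELINE = [("main", 25), ("helper", 4), ("main", 11)]

print(sample_profile(TIMELINE))  # {'main': 30}
print(exact_profile(TIMELINE))   # {'main': 36, 'helper': 4}
```

In this hypothetical run the sampler never observes "helper" at all and under-reports "main" by 6 ms, matching the two kinds of error described for prof.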
Another commonly used profiler, known as gprof, relies on recompilation of source code to add more monitoring features than are present in prof. In addition to the interrupt sampling performed by prof, gprof adds code to the beginning of each function to record which function called it. A significant problem arises, however, if the program employs library code for which source files are unavailable. gprof is then unable to process the functions defined in these files, and their callers cannot then be recorded. When the execution sequence passes into one of these unmonitored functions, the call trace is severed. If the unmonitored function initiates calls that lead back into a monitored function, this calling of the monitored function will be disconnected from the remainder of the call trace (such situations are termed “spontaneous function calls”).
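The way an unmonitored library function severs the call trace can be sketched with a simplified model of caller recording. This is an illustration under stated assumptions, not gprof's actual implementation: it assumes only recompiled (instrumented) functions run an entry hook, and that the hook can only attribute a call to a caller that is itself instrumented. The names record_arcs, lib_sort and compare are hypothetical.

```python
from collections import Counter

SPONTANEOUS = "<spontaneous>"

def record_arcs(calls, instrumented):
    """Simplified model of gprof-style caller recording.

    calls: (caller, callee) pairs in execution order.
    instrumented: routines recompiled with a profiling entry hook.
    Only an instrumented callee runs a hook, and the hook can only name
    a caller that is itself part of the profiled code.
    """
    arcs = Counter()
    for caller, callee in calls:
        if callee not in instrumented:
            continue  # no entry hook: the call leaves no record at all
        source = caller if caller in instrumented else SPONTANEOUS
        arcs[(source, callee)] += 1
    return arcs

# Hypothetical program: main calls a library sort routine that has no
# source available, and the sort calls back into main's compare function.
CALLS = [("main", "lib_sort")] + [("lib_sort", "compare")] * 3
INSTRUMENTED = {"main", "compare"}

print(record_arcs(CALLS, INSTRUMENTED))
# Counter({('<spontaneous>', 'compare'): 3})
```

The call from main into the library is not recorded, and the callbacks to compare appear as spontaneous calls disconnected from main.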
A simple program, in which start calls main, main calls function A, function A calls functions B and C, and function B calls function D, might result in gprof output statistics similar to those shown in Table I. The time column shows, in seconds, first the self time for the function and then its self+descendants time, which is the self time of the function of concern plus the self times of all of its descendants. Typical gprof output tables would also indicate how many times each function was called by each of its caller functions.
TABLE I
routine   callers   time
start               0.1/101.1
main      start     1.0/101.0
A         main      10.0/100.0
B         A         20.0/60.0
C         A         30.0/30.0
D         B         40.0/40.0
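The self+descendants column of Table I follows mechanically from the self times and the call tree. The sketch below recomputes it; the dictionary layout and the function name self_plus_descendants are illustrative assumptions, and the approach only works as written because the FIG. 1A call graph is a tree.

```python
# Self times (seconds) and call structure from Table I / FIG. 1A.
SELF_TIME = {"start": 0.1, "main": 1.0, "A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}
CALLEES = {"start": ["main"], "main": ["A"], "A": ["B", "C"],
           "B": ["D"], "C": [], "D": []}

def self_plus_descendants(routine):
    """Self time plus the self times of every routine below it in the call tree.

    Assumes a tree (each routine has one caller). A callee shared by several
    callers, as in FIG. 1B, would be counted in full under each of them,
    which is why its time must instead be split among the callers.
    """
    return SELF_TIME[routine] + sum(self_plus_descendants(c) for c in CALLEES[routine])

# Print routine, self time and self+descendants time, as in Table I.
for routine in CALLEES:
    print(f"{routine:<6}{SELF_TIME[routine]:.1f}/{self_plus_descendants(routine):.1f}")
```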
Typically, a programmer might then manually convert this table into a graph, such as shown in FIG. 1A, but this process is very complicated and painstaking for larger programs. To improve the ease of use of gprof output, some postprocessors are now available that can process the source code for a program to produce a static call graph and then overlay the gprof statistics on top of the static call graph. (In prior art systems the gprof statistics have been shown either by histograms next to each graph node, or by color coding of graph nodes.) The need for source code, however, is a serious drawback to such postprocessing, and significantly limits its usefulness.
Further limitations to the usefulness of gprof include those stemming from the use of timing interrupts as discussed above with reference to prof (sampling errors and inability to track OS calls), and poor handling of functions called by more than one caller. For instance, suppose that in the above example, function D was called by both functions B and C, in which case the call graph would be as illustrated in FIG. 1B. The self time for function D would then have to be split up for allocation between the self+descendants times for functions B and C.
To make this allocation gprof relies upon an assumption of an average case distribution. That is, gprof assumes that all calls to a particular function require the same amount of time for completion. The self time for function D would then be allocated to the self+descendants times for functions B and C proportionally to the number of times that each of them called function D. However, because the arguments passed to a function can drastically affect the amount of time required to complete the function call, the average case distribution assumption employed by gprof can result in significant inaccuracies in reported times.
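The proportional allocation can be written as a single formula. The notation here is introduced only for illustration: T_f is the time of the called function f that is being allocated (its self+descendants time, which for a leaf such as function D is just its self time), n_{g,f} is the number of calls from caller g to f, and the sum runs over all callers h of f (for function D of FIG. 1B, h ranges over B and C).

```latex
\[
  T_{f \to g} \;=\; T_f \cdot \frac{n_{g,f}}{\sum_{h} n_{h,f}}
\]
```

Only call counts appear on the right-hand side, so two callers that call f equally often always receive equal shares of T_f, however long their individual calls actually took.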
For example, if functions B and C both called function D 40 times, and the calls originated by function B consumed three fourths of function D's time (with the calls originated by function C consuming the remaining one fourth), the correct profiling statistics would be as shown in Table II. Because of its average case distribution assumption, however, gprof would allocate function D's time equally between functions B and C, and would prepare inaccurate profiling statistics as shown in Table III.
TABLE II
routine   callers   time
start               0.1/101.1
main      start     1.0/101.0
A         main      10.0/100.0
B         A         20.0/50.0
C         A         30.0/40.0
D         B, C      40.0/40.0
TABLE III
routine   callers   time
start               0.1/101.1
main      start     1.0/101.0
A         main      10.0/100.0
B         A         20.0/40.0
C         A         30.0/50.0
D         B, C      40.0/40.0
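The difference between Table II and Table III can be reproduced in a few lines. This sketch assumes the per-caller split of function D's time (30 s on behalf of B, 10 s on behalf of C) is known; the names average_case_share, measured_share and self_plus_descendants are hypothetical.

```python
# Scenario of FIG. 1B / Tables II and III (times in seconds).
SELF_TIME = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}
CALLS_TO_D = {"B": 40, "C": 40}
# First-order data: time D actually spent serving each caller (3/4 and 1/4).
D_TIME_BY_CALLER = {"B": 30.0, "C": 10.0}

def average_case_share(caller):
    """gprof-style: split D's time in proportion to call counts."""
    return SELF_TIME["D"] * CALLS_TO_D[caller] / sum(CALLS_TO_D.values())

def measured_share(caller):
    """Split D's time according to the recorded per-caller timing."""
    return D_TIME_BY_CALLER[caller]

def self_plus_descendants(share_of_d):
    """Self+descendants times for A, B and C under a given split of D's time."""
    b = SELF_TIME["B"] + share_of_d("B")
    c = SELF_TIME["C"] + share_of_d("C")
    return {"A": SELF_TIME["A"] + b + c, "B": b, "C": c}

print(self_plus_descendants(measured_share))      # Table II
print(self_plus_descendants(average_case_share))  # Table III
```

Function A's total is 100.0 s in both cases; only the split between B and C changes, which is exactly the information the average case assumption gets wrong.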
Like prof, gprof concentrates on zero order profiling statistics. That is, timing statistics for a particular function are recorded with that function as the only reference point. gprof does not record any first order or other higher order timing distribution statistics, that is, timing statistics regarding a particular first function having been called by a particular second function. The only rudimentary first order statistics of any kind recorded by gprof are the number of times a called function was called by each of its caller functions. Because gprof cannot record first order timing statistics, however, gprof is forced to use the average case distribution assumption to calculate self+descendants time in many situations, as described above.
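To make the zero order versus first order distinction concrete, the sketch below derives both kinds of statistics from a trace of function entry and exit events. It is a simplified illustration under stated assumptions, not the mechanism of the present invention: it assumes a hypothetical, perfectly balanced trace of ("enter" | "exit", routine, timestamp) events, and it does not treat recursion specially. The name profile_trace and the time units are hypothetical.

```python
from collections import defaultdict

def profile_trace(events):
    """Derive zero- and first-order timing from an enter/exit event trace.

    events: ("enter" | "exit", routine, timestamp) tuples in execution order.
    Returns (self_time, arc_time): self_time is keyed by routine alone
    (zero order); arc_time holds inclusive time keyed by (caller, callee)
    (first order). Recursion is not handled specially.
    """
    self_time = defaultdict(float)
    arc_time = defaultdict(float)
    stack = []  # frames: [routine, entry_time, time_spent_in_callees]
    for kind, routine, ts in events:
        if kind == "enter":
            stack.append([routine, ts, 0.0])
            continue
        name, entered, in_callees = stack.pop()
        if name != routine:
            raise ValueError(f"exit from {routine} while in {name}")
        inclusive = ts - entered
        self_time[name] += inclusive - in_callees
        if stack:  # charge the whole call to the caller's frame and arc
            caller = stack[-1]
            caller[2] += inclusive
            arc_time[(caller[0], name)] += inclusive
    return dict(self_time), dict(arc_time)

# Hypothetical trace: A calls B and C; both call D, but D's call from B
# takes three times as long as its call from C.
EVENTS = [("enter", "A", 0), ("enter", "B", 0), ("enter", "D", 1), ("exit", "D", 4),
          ("exit", "B", 5), ("enter", "C", 5), ("enter", "D", 6), ("exit", "D", 7),
          ("exit", "C", 9), ("exit", "A", 10)]

self_time, arc_time = profile_trace(EVENTS)
print(self_time)  # zero order: D's 4 units, with no record of who caused them
print(arc_time)   # first order: ('B', 'D') 3 units, ('C', 'D') 1 unit
```

With only the zero order self_time and the call counts, D's 4 units would be split 2/2 between B and C; the first order arc_time records the actual 3/1 split directly.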
An improved profiler that remedies the deficiencies of prior art profiling techniques is clearly desirable.