Performance evaluation of software programs and revision of such programs to arrive at more efficiently executing programs are difficult and time-consuming tasks. For instance, certain program performance problems are difficult to locate. In this regard, it may be neither convenient nor possible to analyze a program's performance using a real data set, as the data set may either be unavailable or be so large as to create a very long run. Further, it may be difficult to tell whether a code change has improved a program's performance, as the execution differences may either be subtle or be offset by opposing effects of other code changes. Various software performance analysis tools have been made available to enable a program under test to be "dissected" in such a manner as to enable analysis of individual segments of the program listing.
A first type of performance analysis tool is represented by "PUMA" (offered by the Assignee of this application), which periodically suspends a program's execution, takes samples of the program's current state values, and records the samples in a file. Thereafter, the samples are analyzed to produce data which enables the programmer to determine what fraction of the total elapsed time was spent in each segment. Thus, the PUMA tool enables performance data to be accumulated with respect to various selected segments of a program's listing.
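The analysis step of such a sampling tool can be illustrated with a minimal sketch. It assumes, purely for illustration, that each recorded sample has already been reduced to the name of the program segment that was executing when the program was suspended; the function name `time_fractions` and the segment names are hypothetical and are not taken from PUMA, whose sample file format is not described here.

```python
from collections import Counter


def time_fractions(samples):
    """Estimate the fraction of elapsed time spent in each segment.

    `samples` is a list of segment names, one per periodic sample
    (a hypothetical, simplified stand-in for a real sample file).
    With evenly spaced samples, a segment's share of the samples
    approximates its share of the total elapsed time.
    """
    counts = Counter(samples)
    total = len(samples)
    return {segment: hits / total for segment, hits in counts.items()}


# Eight hypothetical samples taken at equal intervals during one run.
samples = ["sort", "sort", "read", "sort", "write", "sort", "read", "sort"]
fractions = time_fractions(samples)

# List the segments from most to least sampled.
for segment, fraction in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{segment}: {fraction:.3f}")
```

The accuracy of such an estimate depends on the number of samples taken, so short runs yield only a coarse picture of where time is spent.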
A further performance analysis tool is a Unix utility called "gprof". That tool inserts instructions into the executable code and then causes the program code to execute. Each time a subroutine in the code is called, the inserted instruction causes the count for that subroutine to be incremented. After the program has completed a run, the incremented count values indicate the number of times each subroutine has been called. These count values enable the programmer to select the "most active" portions of the program for analysis and optimization.
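The call-counting idea can be sketched as follows. This is a loose analogy only: it wraps functions at the source level with a hypothetical `counted` decorator rather than inserting instructions into executable code, and the subroutines `parse` and `load` are invented for the example.

```python
import functools
from collections import Counter

# One counter per subroutine name, incremented on every call.
call_counts = Counter()


def counted(func):
    """Wrap `func` so that each call increments its entry in call_counts."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@counted
def parse(record):
    # Hypothetical subroutine: split one comma-separated record.
    return record.split(",")


@counted
def load(records):
    # Hypothetical subroutine: calls parse once per record.
    return [parse(r) for r in records]


load(["a,b", "c,d", "e,f"])

# After the run, the counts identify the most frequently called subroutine.
most_active = call_counts.most_common(1)[0]
print(most_active)
```

Counts of this kind show how often a subroutine runs, but say nothing about how its running time changes as its input grows.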
While the gprof utility does provide an indication of activity experienced by various subroutines and instructions in a program listing, it does not provide an indication of the relationship between a subroutine's execution time and the size of an input data set to the subroutine. For instance, it is known that certain subroutines exhibit execution times that are insensitive to input data set size. These subroutines are known as zero order (or "constant") routines. A subroutine which exhibits a relatively linear increase in execution time with data set size is known as a first order (or "linear") routine. A subroutine whose execution time increases as a power of the data set size, where the exponent is approximately two, is known as a second order (or "quadratic") routine. More generally, a subroutine can be classified as an "ith" order routine, where i is the value of the exponent which best fits the rate of increase of the subroutine's execution time with input data set size. Clearly, the larger the order of the subroutine, the more sensitive is its execution time to the size of the input data set. Accordingly, it is the larger order subroutines which should be optimized first by the programmer, before effort is applied to other subroutines whose execution times are less sensitive to input data set sizes.
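One way to make the notion of order concrete is to fit the exponent i from measured execution times. The sketch below assumes the execution time behaves approximately as t = c * n^i for data set size n, so that i is the slope of log(t) plotted against log(n). The function name `estimate_order` and the timing figures are hypothetical (the timings are synthetic, not measurements), and this least-squares fit is an illustrative technique rather than a method described in this document.

```python
import math


def estimate_order(sizes, times):
    """Estimate the order i, assuming time is roughly c * size**i.

    Fits a straight line to log(time) versus log(size) by ordinary
    least squares and returns its slope, which approximates i.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Small data set sizes; no full size data set is needed for the fit.
sizes = [100, 200, 400, 800]

# Synthetic timings for three hypothetical subroutines.
constant_times = [0.01 for _ in sizes]             # zero order
linear_times = [5e-5 * n for n in sizes]           # first order
quadratic_times = [3e-6 * n ** 2 for n in sizes]   # second order

for name, times in [("constant", constant_times),
                    ("linear", linear_times),
                    ("quadratic", quadratic_times)]:
    print(name, round(estimate_order(sizes, times), 3))
```

With real measurements the fitted slope will not be an exact integer, since timing noise and lower order terms both perturb it, so the result is best read as an approximate order.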
Accordingly, there is a need for an improved method and apparatus for determining a relationship of subroutine execution time to input data set size. Further, the method should be operable without requiring use of full size data sets. Such a method will enable identification of those subroutines which are most sensitive to input data set size and allow a programmer to concentrate on optimization thereof.