1. Field of the Invention
The present invention relates generally to computer system performance and tuning. More particularly, the present invention is a system and method for performance monitoring of application code to identify bottlenecks so that they may be reduced or eliminated.
2. Description of Related Art
For real-time applications and most CPU-intensive applications, performance is a key factor. Performance improvements for such applications provide two (2) major benefits. First, performance improvements allow the application to run faster. Second, performance improvements have an impact on the bottom line. If an application uses fewer CPU cycles, it completes more quickly, so the same processor can execute other portions of the application. As a result, fewer processors are required to execute the application, and if fewer processors are required, fewer human resources are needed to operate the system.
In order to introduce performance improvements into an application, it is necessary to determine the performance bottlenecks. The most obvious (and most painful) way to determine the performance bottlenecks is to analyze the source code directly. An extremely skilled programmer can make fairly substantial gains this way. However, when the application consists of hundreds, thousands, or millions of lines of code, this approach simply isn't practical.
Some operating system software vendors provide performance monitoring utilities or tools with their operating system software. For example, Sun Microsystems' Solaris operating system provides a tool called prof to assist with monitoring application code performance. The prof tool provides profile information, that is, performance data gathered at various points during execution of the application. To gather that data, prof 'instruments' the object code directly: it steps through all of the object code and adds new code to the application that collects performance data. Therefore, to monitor an application already in production, a new version of the application containing the prof instrumentation must be delivered. In most situations, this is impractical and not viable.
Other tools may be used to collect and report data, but they require substantial preparation to be used with an application. For example, the quantify tool developed and sold by Pure Software provides accurate data, but also requires a re-link of the application code. Due to the re-linking requirement, it is impossible to use the quantify tool to measure production application code.
Another tool that requires preparation is the dbx tool provided with many versions of the UNIX operating system. Dbx is a debugging tool that may be used to collect performance data, and the performance data collected may be fairly accurate. However, dbx must be installed in the run-time environment, and it works only with non-stripped executables and run-time libraries, that is, those that still contain their symbol tables. Consequently, it cannot be used with most production application code, because most production code is stripped.
Other tools or utilities that may be used with production application code do not provide useful or meaningful performance analysis data. The ps utility, provided with many versions of the UNIX operating system, reports to a user the total CPU time used by each process. It does not, however, tell a user where in the application those CPU cycles are being spent. The ps utility simply does not provide enough information to allow an application developer to determine the bottlenecks in the application.
Another utility provided with many versions of the UNIX operating system is pstack. It is a point-in-time utility that tells a user the calling stack of any given process. Using pstack and ps together may falsely identify bottlenecks. The shortcomings of the pstack and ps utilities may be demonstrated using the following sample application.
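#include <stdio.h>
#include <unistd.h>

/* CPU-bound: formats numbers into a buffer over and over. */
int tx1(void)
{
    int i = 0;
    char tx[100];
    for (i = 0; i < 10000; i++) {
        for (int j = 0; j < 100; j++)
            sprintf(tx, "%d", j);
        i++; /* second increment: the outer body runs 5,000 times */
    }
    return 0;
}

/* Blocks in the sleep system call for 5 seconds, using almost no CPU. */
int tx2(void)
{
    sleep(5);
    return 0;
}

int main(void)
{
    while (1) {
        tx1();
        tx2();
    }
}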
When applied to the sample application code, pstack produces the following output.
/usr/proc/bin/pstack 2054
2054: doit
ef638b5c sigsuspend (efffd7f8)
ef638b5c _libc_sigsuspend (efffd7f8, e, efffd808, ef6a571c, ef6a2e54, ef64dca8) +4
ef64dce4 _libc_sleep (5, 0, 0, 0, ef6a2e54, 10d7c)+f0
00010d7c __0FDtx2v (0, ef6a8c04, ef7571e0, 2, ef6a2e54, ef618ca8)+c
00010c5c main (1, efffd99c, efffd9a4, 21000, 0, 0)+c
00010c24 _start (0, 0, 0, 0, 0, 0)+dc
If pstack were run repeatedly on the sample application, most snapshots would show the application in the sleep system call, because tx2 blocks in sleep for 5 seconds on every pass through the loop. The application developer could then wrongly attribute most of the CPU cycles to the sleep system call, even though a sleeping process uses almost no CPU and the cycles are actually spent elsewhere in the application, in tx1.
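The bias can be expressed with a short calculation. Assume, hypothetically, that one call to tx1 uses T seconds of CPU time and that tx2 blocks for the 5 seconds passed to sleep while using negligible CPU. Snapshots taken at evenly spaced wall-clock instants then land in sleep and in tx1 in proportion to wall-clock time, while CPU time is split very differently:

```latex
f_{\text{sleep}}^{\text{snapshots}} = \frac{5}{5 + T},
\qquad
f_{\text{sleep}}^{\text{CPU}} \approx \frac{0}{T + 0} = 0,
\qquad
f_{\text{tx1}}^{\text{CPU}} \approx 1
```

For an assumed T = 0.5 s, for instance, 5 / 5.5 ≈ 0.91, so about 91% of snapshots would show sleep even though it accounts for almost none of the CPU time.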
If the application developer could run the ps command and the pstack command continuously (and quickly) and correlate the results, he or she might obtain some useful performance data. The largest obstacle, however, is speed: unless the commands can be run fast enough, the results are suspect. The developer would need to run the pair at least 100 times per second (one sample every 10 ms). The inability to obtain meaningful and useful performance data on production application code makes it difficult, if not impossible, to tune or optimize production application code. Therefore, there is a need for a performance monitoring utility that may be used on production code and that provides meaningful and useful performance data.
The present invention, jTrace, addresses the shortcomings of the prior art performance monitoring tools and utilities. The present invention is a performance monitoring tool that can be used, without any preparation work, to actively measure an application. It provides accurate data to determine application code bottlenecks. The information regarding the code bottlenecks may then be used by an application developer to modify the code to reduce or eliminate bottlenecks. Unlike many prior art tools and utilities, it may be used on both stripped and non-stripped executables and run-time libraries.
The present invention calls one custom function that returns the calling stack and another custom function that determines CPU usage information. These functions execute repeatedly in a loop, and the results are maintained in a binary tree. Information from the binary tree is then used to produce periodic snapshots of the executing code, which an application developer may view to determine the code bottlenecks. After determining the code bottlenecks, the application developer can modify the code to reduce or eliminate them.