1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for optimizing performance in a data processing system. Still more particularly, the present invention provides a method and apparatus for a software program development tool for enhancing performance of a software program through software profiling.
2. Description of Related Art
In analyzing and enhancing performance of a data processing system and the applications executing within the data processing system, it is helpful to know which software modules within a data processing system are using system resources. Effective management and enhancement of data processing systems require knowing how and when various system resources are being used. Performance tools are used to monitor and examine a data processing system to determine resource consumption as various software applications are executing within the data processing system. For example, a performance tool may identify the most frequently executed modules and instructions in a data processing system, or may identify those modules which allocate the largest amount of memory or perform the most I/O requests. Hardware performance tools may be built into the system or added at a later point in time. Software performance tools also are useful in data processing systems, such as personal computer systems, which typically do not contain many, if any, built-in hardware performance tools.
One known software performance tool is a trace tool, which keeps track of particular sequences of instructions by logging certain events as they occur, so-called event-based profiling. For example, a trace tool may log every entry into, and every exit from, a module, subroutine, method, function, or system component. Alternatively, a trace tool may log the requester and the amounts of memory allocated for each memory allocation request. Typically, a time-stamped record is produced for each such event. Pairs of records similar to entry-exit records also are used to trace execution of arbitrary code segments, to record requesting and releasing locks, starting and completing I/O or data transmission, and for many other events of interest.
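As a minimal sketch of the entry-exit tracing described above, the following Python fragment logs a time-stamped record on each entry to and exit from an instrumented function. The names `trace_log`, `traced`, `parse`, and `handle_request` are hypothetical and chosen only for illustration; an actual trace tool would typically write to a dedicated trace buffer rather than a list.

```python
import functools
import time

# Hypothetical in-memory trace buffer; each record is (timestamp, kind, name).
trace_log = []

def traced(func):
    """Log a time-stamped record on every entry to and exit from func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.append((time.perf_counter(), "entry", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # The exit record is written even if func raises an exception.
            trace_log.append((time.perf_counter(), "exit", func.__name__))
    return wrapper

@traced
def parse(text):
    return text.split()

@traced
def handle_request(text):
    return len(parse(text))

handle_request("a b c")
for stamp, kind, name in trace_log:
    print(f"{stamp:.6f} {kind:5s} {name}")
```

Each call produces one entry record and one exit record, so the elapsed time of a routine may be estimated from the difference between the paired time stamps.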
Another known technique involves program sampling to identify locations in programs in which the programs appear to spend large amounts of time, such as program hot spots. This technique, so-called sample-based profiling, is based on the idea of interrupting the application or data processing system execution at regular intervals. In order to improve performance of code generated by various families of computers, it is often necessary to determine where time is being spent by the processor in executing code, such efforts being commonly known in the computer processing arts as locating "hot spots." Ideally, one would like to isolate such hot spots at the instruction and/or source line of code level in order to focus attention on areas which might benefit most from improvements to the code. At each interruption, the program counter of the currently executing thread, a process that is part of a larger process or program, is recorded. Typically, at post-processing time, the values captured by these tools are resolved against a load map and symbol table information for the data processing system, and a profile of where the time is being spent is obtained from this analysis.
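The post-processing step described above may be illustrated by the following sketch, which resolves sampled program counter values against a load map and tallies the result into a flat profile. The addresses, symbol names, and the `LOAD_MAP` layout are assumptions made for illustration and do not correspond to any particular system.

```python
import bisect
from collections import Counter

# Hypothetical load map: (start address, symbol), sorted by start address.
LOAD_MAP = [(0x1000, "main"), (0x1200, "parse"), (0x1800, "compress")]
IMAGE_END = 0x2000  # assumed first address past the last mapped symbol

_STARTS = [start for start, _ in LOAD_MAP]
_NAMES = [name for _, name in LOAD_MAP]

def resolve(pc):
    """Map a sampled program counter to the symbol that contains it."""
    index = bisect.bisect_right(_STARTS, pc) - 1
    if index < 0 or pc >= IMAGE_END:
        return "<unknown>"
    return _NAMES[index]

def flat_profile(samples):
    """Count samples per symbol; a higher count suggests a hot spot."""
    return Counter(resolve(pc) for pc in samples)

# Program counter values as they might be recorded at each timer interrupt.
samples = [0x1004, 0x1810, 0x1900, 0x1250, 0x18F0, 0x0500]
profile = flat_profile(samples)
for name, count in profile.most_common():
    print(name, count)
```

With these sample values, `compress` receives three of the six samples and would be reported as the leading hot spot.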
For example, isolating such hot spots to the instruction level permits compiler writers to find significant areas of suboptimal code generation, at which they may thus focus their efforts to improve code generation efficiency. Another potential use of instruction level detail is to provide guidance to the designer of future systems. Such designers employ profiling tools to find characteristic code sequences and/or single instructions that require optimization for the available software for a given type of hardware.
Event-based profiling has limitations. For example, event-based profiling is expensive in terms of performance (an event per entry and per exit), which can and often does perturb the resulting view of performance. Additionally, this technique is not always available because it requires the static or dynamic insertion of entry/exit events into the code. This insertion of events is sometimes not possible or is often difficult. For example, if source code is unavailable for the to-be-instrumented code, event-based profiling may not be feasible. However, it is possible to instrument an interpreter of the source code to obtain event-based profiling information without changing the source code.
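One existing example of interpreter-level instrumentation is the `sys.setprofile` hook of the Python interpreter, which reports call and return events without any change to the profiled source code. The following is a minimal sketch only; the helper `run_with_hooks` and the functions `inner` and `outer` are hypothetical names introduced for illustration.

```python
import sys

def run_with_hooks(func, *args):
    """Run func(*args) while the interpreter reports entry/exit events."""
    records = []

    def hook(frame, event, arg):
        # Record only Python-level entries and exits; C-level events are ignored.
        if event in ("call", "return"):
            records.append((event, frame.f_code.co_name))

    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook
    return result, records

# The profiled code below contains no instrumentation of its own.
def inner(x):
    return x * 2

def outer(x):
    return inner(x) + 1

result, records = run_with_hooks(outer, 5)
for event, name in records:
    print(event, name)
```

The hook is invoked on every call and return, which also illustrates the per-event overhead noted above.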
On the other hand, sample-based profiling provides only a "flat view" of system performance but does provide the benefits of reduced cost and reduced dependence on hooking-capability.
Further, sample-based techniques do not identify where the time is spent in many small and seemingly unrelated functions or in situations in which no clear hot spot is apparent. Without an understanding of the program structure, it is not clear with a "flat" profile how to determine where the performance improvements can be obtained.
Therefore, it would be advantageous to provide both event-based and sample-based profiling of an application within the same time period. It would be particularly advantageous to provide the ability to enable and disable profiling of selected portions of a data processing system and to combine the output from different types of profiling into a single merged presentation.
The present invention provides a process and system for profiling code executing on a data processing system. Event-based trace data is recorded in response to selected events, and the event-based trace data includes an indication of which code is being interrupted. The trace data may be processed to identify a thread or method that was executing during the event. A periodically occurring event, such as a timer interrupt, is also detected, and a call stack associated with the profiled code is identified in response to detection of the periodically occurring event. The call stack is examined to identify each routine that is currently executing during the periodically occurring event, and the trace data is recorded with the call stack information. The trace data from the recorded events and the trace data from the call stacks are processed to generate a tree structure in which the nodes indicate the call structure of the routine information from both the trace events and the call stacks.
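The generation of a tree structure from recorded call stacks may be sketched as follows. This is an illustrative simplification and not the claimed implementation: the `Node` class, the `base` and `cum` counters, and the sample stacks are assumptions, and the sketch presumes that both event-derived and sample-derived trace records have already been reduced to call stacks listed from the outermost routine to the innermost routine.

```python
class Node:
    """One routine at one position in the call structure."""
    def __init__(self, name):
        self.name = name
        self.children = {}  # callee name -> Node
        self.base = 0       # records in which this routine was innermost
        self.cum = 0        # records whose stack passes through this node

def add_stack(root, stack):
    """Merge one call stack (outermost routine first) into the tree."""
    node = root
    node.cum += 1
    for name in stack:
        node = node.children.setdefault(name, Node(name))
        node.cum += 1
    node.base += 1

def build_tree(stacks):
    root = Node("<root>")
    for stack in stacks:
        add_stack(root, stack)
    return root

def render(node, depth=0):
    """Return indented text lines, one per node, children sorted by name."""
    lines = [f"{'  ' * depth}{node.name} base={node.base} cum={node.cum}"]
    for name in sorted(node.children):
        lines.extend(render(node.children[name], depth + 1))
    return lines

# Hypothetical call stacks, one per recorded event or timer sample.
stacks = [
    ["main", "A", "B"],
    ["main", "A"],
    ["main", "C", "B"],
    ["main", "A", "B"],
]
tree = build_tree(stacks)
print("\n".join(render(tree)))
```

In this sketch, routine `B` appears at two distinct nodes, one beneath `A` and one beneath `C`, so that time attributed to `B` remains associated with the calling path, which a flat profile would not show.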