1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for optimizing performance in a data processing system. Still more particularly, the present invention provides a method and apparatus for a software program development tool for enhancing performance of a software program through software profiling.
2. Description of Related Art
In analyzing and enhancing the performance of a data processing system and the applications executing within it, it is helpful to know which software modules are using system resources. Effective management and enhancement of data processing systems requires knowing how and when various system resources are being used. Performance tools are used to monitor and examine a data processing system to determine resource consumption as various software applications execute within it. For example, a performance tool may identify the most frequently executed modules and instructions in a data processing system, or may identify those modules that allocate the largest amount of memory or perform the most I/O requests. Hardware performance tools may be built into the system or added at a later point in time. Software performance tools are also useful in data processing systems, such as personal computer systems, which typically do not contain many, if any, built-in hardware performance tools.
One known software performance tool is a trace tool. A trace tool may use more than one technique to provide trace information that indicates execution flows for an executing program. One technique, so-called event-based profiling, keeps track of particular sequences of instructions by logging certain events as they occur. For example, a trace tool may log every entry into, and every exit from, a module, subroutine, method, function, or system component. Alternatively, a trace tool may log the requester and the amount of memory allocated for each memory allocation request. Typically, a time-stamped record is produced for each such event. Corresponding pairs of records similar to entry-exit records are also used to trace execution of arbitrary code segments, the starting and completing of I/O or data transmission, and many other events of interest.
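As an illustration only, the following Python sketch shows how time-stamped entry-exit records of the kind described above might be paired during post-processing to obtain the inclusive time spent in each routine. The record layout, the field names, and the function `inclusive_times` are assumptions made for this example and are not taken from any particular trace tool.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    timestamp: int  # hypothetical tick count taken when the event was logged
    kind: str       # "entry" or "exit"
    name: str       # routine that was entered or exited


def inclusive_times(records):
    """Pair entry/exit records with a stack and sum inclusive time per routine."""
    stack = []
    totals = {}
    for rec in records:
        if rec.kind == "entry":
            stack.append(rec)
        elif rec.kind == "exit":
            entry = stack.pop()
            if entry.name != rec.name:
                raise ValueError(f"unbalanced trace: {entry.name} vs {rec.name}")
            elapsed = rec.timestamp - entry.timestamp
            totals[rec.name] = totals.get(rec.name, 0) + elapsed
    return totals


# Hypothetical trace: main calls parse, then emit.
trace = [
    TraceRecord(0, "entry", "main"),
    TraceRecord(2, "entry", "parse"),
    TraceRecord(7, "exit", "parse"),
    TraceRecord(8, "entry", "emit"),
    TraceRecord(9, "exit", "emit"),
    TraceRecord(12, "exit", "main"),
]
totals = inclusive_times(trace)
print(totals)
```

In this sketch the time of a caller includes the time of its callees, so a recursive routine would be counted once per nesting level; a practical tool would need to account for that.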
In order to improve performance of code generated by various families of computers, it is often necessary to determine where time is being spent by the processor in executing code, such efforts being commonly known in the computer processing arts as locating "hot spots." Ideally, one would like to isolate such hot spots at the instruction and/or source line of code level in order to focus attention on areas that might benefit most from improvements to the code.
Another trace technique involves periodically sampling a program's execution flows to identify certain locations in the program in which the program appears to spend large amounts of time. This technique, so-called sample-based profiling, is based on the idea of interrupting the application or data processing system execution at regular intervals. At each interruption, information is recorded for a predetermined length of time or for a predetermined number of events of interest. For example, the program counter of the currently executing thread, which is a process that is part of the larger program being profiled, may be recorded during the intervals. These values may be resolved against a load map and symbol table information for the data processing system at post-processing time, and a profile of where the time is being spent may be obtained from this analysis.
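The post-processing step described above can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical symbol table of sorted start addresses and a hypothetical list of sampled program counter values; real load maps and symbol tables carry considerably more information.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, routine name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1400, "parse"), (0x1900, "emit")]
IMAGE_END = 0x2000  # hypothetical end of the loaded image
STARTS = [start for start, _ in SYMBOLS]


def resolve(pc):
    """Map a sampled program counter to the routine whose range contains it."""
    if pc < STARTS[0] or pc >= IMAGE_END:
        return "<unknown>"
    index = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1]


def histogram(samples):
    """Count samples per routine; more samples suggests more time spent there."""
    return Counter(resolve(pc) for pc in samples)


# Hypothetical program counter values recorded at each sampling interrupt.
samples = [0x1404, 0x1410, 0x1010, 0x1A00, 0x14FF, 0x3000]
profile = histogram(samples)
print(profile.most_common())
```

The routine with the largest count is the most likely hot spot, although the accuracy of the estimate depends on the number of samples collected.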
For example, isolating such hot spots to the instruction level permits compiler writers to find significant areas of suboptimal code generation at which they may thus focus their efforts to improve code generation efficiency. Another potential use of instruction level detail is to provide guidance to the designer of future systems. Such designers employ profiling tools to find characteristic code sequences and/or single instructions that require optimization for the available software for a given type of hardware.
Profiling information consists of two primary types of profiling metric variables: those which are updated based on a discrete event and those which are updated based on a non-discrete event. When profiling discrete metric variables, the value of the metric variable must be gathered from the component receiving the event. If profiling includes gathering profiling information at the processor level, the value of the discrete metric variable must be obtained from the processor. That value may be used to compute the change in the value of a metric variable that should be attributed to the process or method running on a specific processor. A change in the value of a non-discrete metric variable, on the other hand, does not rely on the occurrence of a discrete event. The value continues to update at a predetermined rate regardless of what events transpire during updating. However, the value of one processor's non-discrete metric variable may be different from the value of another processor's metric variable. For example, some symmetric multiprocessor (SMP) systems may not have synchronized clocks in which each processor has an identical value for the timing metric variable. Other SMP systems allow processor clocks that start synchronized to drift. Therefore, even though each processor may start with an identical value for a metric variable, the values may drift apart and become non-synchronized. Thus, the values of the non-synchronized metric variables must also be obtained from the individual processor receiving the event.
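The consequence of non-synchronized values can be illustrated with a short Python sketch. It assumes two hypothetical processors whose clocks differ by a fixed offset of 1000 ticks, and a hypothetical class `PerProcessorMetric` that remembers the last value read on each processor; none of these names come from an actual profiler.

```python
class PerProcessorMetric:
    """Tracks the last metric value separately for each processor."""

    def __init__(self):
        self.last = {}  # processor id -> last value read on that processor

    def delta(self, cpu, value):
        """Return the change since the previous event on the same processor."""
        previous = self.last.get(cpu, value)  # first event yields a delta of 0
        self.last[cpu] = value
        return value - previous


# Hypothetical events: (processor id, clock value read on that processor).
# The clock of processor 1 runs 1000 ticks ahead of the clock of processor 0.
events = [(0, 100), (1, 1100), (0, 130), (1, 1150)]

tracker = PerProcessorMetric()
per_cpu_deltas = [tracker.delta(cpu, value) for cpu, value in events]

# Naive approach: subtract consecutive values regardless of processor.
naive_deltas = [b[1] - a[1] for a, b in zip(events, events[1:])]

print(per_cpu_deltas)
print(naive_deltas)
```

Under these assumptions the per-processor deltas stay small and non-negative, while the naive differences absorb the clock offset and can even become negative, which is why each value has to be read from the processor that received the event.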
Generally, the values of metric variables are not considered to be synchronized between individual systems or machines. Therefore, a synchronization problem occurs between a plurality of systems that is similar to the one that occurs between processors within a single system. In practice, the values of metric variables in separate systems are rarely synchronized. Thus, the values of non-discrete metric variables must be obtained from the individual processor and system having the event, even though the actual value of the non-discrete metric variable does not rely on the event for updating.
Therefore, it would be advantageous to provide a system in which accurate profiling information could be maintained globally for a plurality of processors rather than at the processor level. Further, it would be advantageous to provide a means to synchronize the value of metric variables between systems rather than at the system level.
A method and system are provided for tracing profiling information using synchronized or non-synchronized metric variables, with support across multiple systems using a global value of a metric variable. In one embodiment, the values of non-discrete metric variables are synchronized at the processor level. When the profiler requests metric information for the non-discrete metric variables, the operating system kernel obtains the global metric value rather than a per-processor metric value for each processor. More particularly, if trace records are written, the change in the value of a non-discrete metric variable can be derived from a single, global value of the last metric variable, which is set each time a record is processed. In another embodiment, the values of non-discrete metric variables are synchronized at the system level. In that case, if trace records are written, the change in the value of a non-discrete metric variable can be derived from a single value of the last metric variable, which is held for all synchronized systems and is set each time a record is processed for any system.
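By way of illustration only, deriving changes from a single global value of the last metric variable might look like the following Python sketch. It assumes that the metric values are already synchronized across processors; the class `GlobalMetric` and the record layout are hypothetical and do not represent the claimed implementation.

```python
class GlobalMetric:
    """Keeps one last metric value shared by all processors."""

    def __init__(self, initial=0):
        self.last = initial

    def delta(self, value):
        """Return the change since the last processed record, then update it."""
        change = value - self.last
        self.last = value  # set each time a record is processed
        return change


# Hypothetical trace records: (processor id, synchronized clock reading).
records = [(0, 100), (1, 110), (0, 130), (1, 150)]

metric = GlobalMetric(initial=100)
deltas = [metric.delta(value) for _cpu, value in records]
print(deltas)
```

Because only one last value is kept, the processor identifier is not needed to compute a change, and the deltas sum to the total elapsed value between the first and last records.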