1. Field of the Invention
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for optimizing performance in a data processing system. Still more particularly, the present invention provides a method and apparatus for a software program development tool for enhancing performance of a software program through performance profiling.
2. Description of Related Art
Effective management and enhancement of data processing systems requires knowing how and when various system resources are being used. In analyzing and enhancing performance of a data processing system and the applications executing within the data processing system, it is helpful to know which software modules within a data processing system are using system resources. Performance tools are used to monitor and examine a data processing system to determine resource consumption as various software applications are executing within the data processing system. For example, a performance tool may identify the most frequently executed modules and instructions in a data processing system, may identify those modules which allocate the largest amount of memory, or may identify those modules which perform the most I/O requests. Hardware-based performance tools may be built into the system and, in some cases, may be installed at a later time, while software-based performance tools may generally be added to a data processing system at any time.
In order to improve performance of program code, it is often necessary to determine how time is spent by the processor in executing code, such efforts being commonly known in the computer processing arts as locating "hot spots." Ideally, one would like to isolate such hot spots at various levels of granularity in order to focus attention on code which would most benefit from improvements.
For example, isolating such hot spots to the instruction level permits compiler developers to find significant areas of suboptimal code generation at which they may focus their efforts to improve code generation efficiency. Another potential use of instruction level detail is to provide guidance to the designer of future systems. Such designers employ profiling tools to find characteristic code sequences and/or single instructions that require optimization for the available software for a given type of hardware.
Most software engineers are more concerned about the efficiency of applications at higher levels of granularity, such as the source code statement level or source code module level. For example, if a software engineer can determine that a particular module requires significant amounts of time to execute, the software engineer can make an effort to increase the performance of that particular module. In addition, if a software engineer can determine that a particular module is sometimes invoked unnecessarily, then the software engineer can rewrite other portions of code to eliminate unnecessary module executions.
Various hardware-based performance tools are available to professional software developers. Within state-of-the-art processors, facilities are often provided which enable the processor to count occurrences of software-selectable events and to time the execution of processes within an associated data processing system. Collectively, these facilities are generally known as the performance monitor of the processor. Performance monitoring is often used to optimize the use of software in a system.
A performance monitor is generally regarded as a facility incorporated into a processor to monitor selected operational characteristics of a hardware environment to assist in the debugging and analyzing of systems by determining a machine's state at a particular point in time or over a period of time. Often, the performance monitor produces information relating to the utilization of a processor's instruction execution and storage control. For example, the performance monitor can be utilized to provide information regarding the amount of time that has passed between events in a processing system. As another example, software engineers may utilize timing data from the performance monitor to optimize programs by relocating branch instructions and memory accesses. In addition, the performance monitor may be utilized to gather data about the access times to the data processing system's L1 cache, L2 cache, and main memory. Utilizing this data, system designers may identify performance bottlenecks specific to particular software or hardware environments. The information produced usually guides system designers toward ways of enhancing performance of a given system or of developing improvements in the design of a new system.
To obtain performance information, events within the data processing system are counted by one or more counters within the performance monitor. The operation of such counters is managed by control registers, each of which comprises a plurality of bit fields. In general, both the control registers and the counters are readable and writable by software. Thus, by writing values to a control register, a user may select the events within the data processing system to be monitored and specify the conditions under which the counters are enabled.
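By way of illustration only, the relationship between a control register and a counter may be sketched in software as follows. The bit-field layout, the event codes, and the names `encode_control` and `SimulatedMonitor` are hypothetical and do not correspond to any particular processor; the sketch merely simulates a counter that advances only when the control register both enables it and selects the matching event.

```python
# Hypothetical control register layout (an assumption, not a real processor):
#   bits 0-7 : event select field
#   bit  8   : counter enable flag
EVENT_MASK = 0xFF
ENABLE_BIT = 1 << 8

# Hypothetical event codes chosen for illustration.
EVENT_INSTRUCTION = 0x01
EVENT_CACHE_MISS = 0x02


def encode_control(event: int, enabled: bool) -> int:
    """Pack an event selection and an enable flag into one register value."""
    value = event & EVENT_MASK
    if enabled:
        value |= ENABLE_BIT
    return value


class SimulatedMonitor:
    """Software stand-in for one control register and one counter."""

    def __init__(self) -> None:
        self.control = 0
        self.counter = 0

    def write_control(self, value: int) -> None:
        # Keep only the bits that the hypothetical layout defines.
        self.control = value & (EVENT_MASK | ENABLE_BIT)

    def signal(self, event: int) -> None:
        """Count the event only if the counter is enabled and selects it."""
        enabled = bool(self.control & ENABLE_BIT)
        selected = (self.control & EVENT_MASK) == event
        if enabled and selected:
            self.counter += 1


monitor = SimulatedMonitor()
monitor.write_control(encode_control(EVENT_CACHE_MISS, enabled=True))
monitor.signal(EVENT_CACHE_MISS)   # counted
monitor.signal(EVENT_INSTRUCTION)  # ignored: not the selected event
monitor.write_control(encode_control(EVENT_CACHE_MISS, enabled=False))
monitor.signal(EVENT_CACHE_MISS)   # ignored: counter disabled
```

In this sketch, writing a new value to the control register changes which events are counted without altering the counter itself, which mirrors the software-selectable behavior described above.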
In addition to the hardware-based performance tools and techniques discussed above, software-based performance tools may also be deployed in a variety of manners. One known software-based performance tool is a trace tool. A trace tool may use more than one technique to provide trace information that indicates execution flows for an executing application. One technique, known as event-based profiling, keeps track of particular sequences of instructions by logging certain events as they occur. For example, a trace tool may log every entry into, and corresponding exit from, a module, subroutine, method, function, or system component within the executing application. Typically, a time-stamped record is produced for each such event. Corresponding pairs of records similar to entry-exit records may also be used to trace execution of arbitrary code segments, the starting and completing of I/O or data transmission, and many other events of interest. Output from a trace tool may be analyzed in many ways to identify a variety of performance metrics with respect to the execution environment.
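By way of illustration only, event-based profiling with entry-exit records may be sketched as follows. The names `traced`, `trace_log`, and `elapsed_by_name` are hypothetical, and the sketch assumes a single-threaded program with properly nested, non-recursive calls; it is not intended to represent any particular trace tool.

```python
import time
from functools import wraps

# Each record is (timestamp in nanoseconds, "entry" or "exit", function name).
trace_log = []


def traced(func):
    """Wrap a function so each call logs a time-stamped entry and exit record."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.append((time.perf_counter_ns(), "entry", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # The exit record is logged even if the function raises.
            trace_log.append((time.perf_counter_ns(), "exit", func.__name__))
    return wrapper


def elapsed_by_name(log):
    """Pair entry and exit records and total the inclusive time per function."""
    stack = []
    totals = {}
    for timestamp, kind, name in log:
        if kind == "entry":
            stack.append((name, timestamp))
        else:
            entry_name, entry_timestamp = stack.pop()
            if entry_name != name:
                raise ValueError("unbalanced entry/exit records")
            totals[name] = totals.get(name, 0) + (timestamp - entry_timestamp)
    return totals


@traced
def inner():
    return sum(range(1000))


@traced
def outer():
    return inner() + 1


result = outer()
totals = elapsed_by_name(trace_log)
```

Each traced call appends two records and reads the clock twice, so the instrumentation itself consumes time that is included in the totals it reports.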
However, instrumenting an application to enable trace profiling may undesirably disturb the execution of the application. As the application executes, the instrumentation code may incur significant overhead, such as system calls to obtain a current timestamp or other execution state information. In fact, the CPU time consumed by the instrumentation code alone may affect the resulting performance metric of the application being studied.
Another problem in profiling an application is that the system environment may cause unwanted and/or unexpected effects on the state information that the profiling processes are attempting to capture. Since most computer systems are interruptible, multi-tasking systems, the operating system may perform certain actions underneath the profiling processes, unbeknownst to the profiling processes. The most prevalent of these actions is a thread switch.
While a profiling process is attempting to capture state information about a particular thread, i.e., a thread-relative metric, the system may perform a thread switch. By the time the profiling process obtains the state information, the information may have changed due to the thread switch. The subsequent execution of the other thread may render the desired thread-relative performance metric partially skewed or completely unreliable. If the profiling process does not perform any actions to determine the prevalence of thread switches, the profiling process cannot determine the reliability of certain performance metrics.
Significant efforts in the prior art have been employed to minimize or negate the effects of thread switches on performance metrics captured for profiling purposes. Typical solutions to tracking performance information on a per-thread basis require significant kernel support because of the need to identify thread dispatch events initiated by the kernel and then to associate the consumed performance metric with these events so that an accurate accounting of resource consumption on each thread can be obtained. Infrastructure to support this kind of accounting can be quite expensive, and in some cases, such as environments that lack an open, extensible, kernel programming interface, the infrastructure may be difficult or impossible to deploy.
Therefore, it would be advantageous to provide a method and system for isolating the thread-relative performance metrics from the effects caused by thread-switching, thereby enhancing the accuracy of thread-relative metrics captured by the profiling processes. It would be particularly advantageous to provide a mechanism that employs the hardware-based efficiencies provided by the performance monitoring facilities of a processor.
A method, system, apparatus, or computer program product is presented for low-overhead performance measurement of an application executing in a data processing system in order to generate per-thread performance information in a multithreaded environment. While a first set of events is being gathered or monitored for a particular thread as a first metric, events that may indirectly cause inaccuracies in the first metric are also monitored as a second metric, and the presence of a positive value for the second metric is then used to determine that the first metric is inaccurate or unreliable. If the first metric is deemed inaccurate, then the first metric may be discarded. When the first metric is gathered without the occurrence of an event being counted as the second metric, then the first metric is considered accurate. The first metric may then be considered a "thread-relative" metric as it has been observed during a time period in which no events would have caused the first metric to become inaccurate during the execution of a particular thread. For example, the first metric could be a value of a consumed resource, such as a number of executed instructions, while the second metric is a number of interrupts, each of which might cause the kernel to initiate a thread switch. While the interrupt is serviced, the first metric continues to monitor resource consumption, yet the second metric indicates that the first metric is inaccurate with respect to a particular thread.
Within an executing application, a first counter is read to get a first performance metric beginning value, and a second counter is read to get a second performance metric beginning value. These values may be obtained near the entry of a method whose execution performance an analyst desires to observe. The application then continues to execute. Near the exit of the method, the first counter is read to get a first performance metric ending value, and the second counter is read to get a second performance metric ending value. A first performance metric delta value is then computed from the first performance metric beginning value and the first performance metric ending value, and a second performance metric delta value is computed from the second performance metric beginning value and the second performance metric ending value. A determination can then be made as to whether the first performance metric delta value is reliable as a thread-relative performance metric value representing the first performance metric based upon whether the second performance metric delta value is zero.
In one type of performance observation, the first counter and the second counter are performance monitor counters in a processor in the data processing system, and the performance monitor counters are read directly by application-level code.
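By way of illustration only, the entry and exit readings described above may be sketched as follows. The class `SimulatedCounters` and the function `measure` are hypothetical; the counters here are plain integers that stand in for performance monitor counters, with the first counter assumed to count executed instructions and the second assumed to count interrupts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimulatedCounters:
    """Hypothetical stand-in for two performance monitor counters."""
    instructions: int = 0  # first metric: a consumed resource
    interrupts: int = 0    # second metric: events that may cause a thread switch

    def read_first(self) -> int:
        return self.instructions

    def read_second(self) -> int:
        return self.interrupts


def measure(counters: SimulatedCounters, body: Callable[[], None]) -> Optional[int]:
    """Return the first-metric delta, or None if it is deemed unreliable."""
    # Beginning values, read near the entry of the observed method.
    first_begin = counters.read_first()
    second_begin = counters.read_second()

    body()

    # Ending values, read near the exit of the observed method.
    first_end = counters.read_first()
    second_end = counters.read_second()

    first_delta = first_end - first_begin
    second_delta = second_end - second_begin

    # The first delta is treated as thread-relative only if no event
    # counted by the second metric occurred during the measurement.
    return first_delta if second_delta == 0 else None


counters = SimulatedCounters()


def quiet_method() -> None:
    counters.instructions += 120


def interrupted_method() -> None:
    counters.instructions += 120
    counters.interrupts += 1       # an interrupt occurs mid-measurement
    counters.instructions += 900   # instructions possibly run by another thread


quiet_result = measure(counters, quiet_method)
interrupted_result = measure(counters, interrupted_method)
```

In this sketch the undisturbed measurement yields a delta of 120, while the measurement during which an interrupt was counted is discarded rather than reported as 1020.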