1. Field of the Invention
The present invention relates generally to the data processing field and, more particularly, to a computer implemented method, system and computer usable program code for profiling the execution of an application.
2. Description of the Related Art
Calling context profiles are used in many interprocedural code optimizations and as an aid in overall program understanding. Collecting profile information, however, is highly intrusive due to the high frequency of method calls in most applications. Current calling context profiling mechanisms consequently suffer from low accuracy, high overhead, or both.
Given a trace containing all method calls and returns, calling context tree construction is relatively straightforward. Initially, a root node is added to the tree, and a cursor pointer that points to the current method context is maintained and initialized to the root node. If a method call is encountered, the children of the node at the cursor are compared to the new callee. If a matching child is found, the weight of the edge to that child is incremented. If no child matches the callee, a new child is created. In either case, the cursor is then moved to the node of the callee method. If a return is seen, the cursor is moved back one level to the parent. In the case of multi-threaded applications, one cursor is needed per thread.
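By way of illustration only, the trace-driven construction described above may be sketched as follows for a single thread. The names CCTNode and build_cct, the ("call", name) and ("return", name) trace format, and the choice to give a newly created edge an initial weight of one are assumptions made for this sketch and are not taken from any particular profiler.

```python
class CCTNode:
    """One calling context. `weight` is the weight of the edge from the parent."""

    def __init__(self, method, parent=None):
        self.method = method
        self.parent = parent
        self.children = {}  # callee name -> CCTNode
        self.weight = 0


def build_cct(trace):
    """Build a calling context tree from (event, method) pairs.

    `event` is "call" or "return" (hypothetical trace format).
    """
    root = CCTNode("<root>")
    cursor = root  # current method context
    for event, method in trace:
        if event == "call":
            child = cursor.children.get(method)
            if child is None:
                # No child matches the callee: create one.
                child = CCTNode(method, parent=cursor)
                cursor.children[method] = child
            # Assumption: a new edge starts at weight 1, so the increment
            # is applied whether or not the child already existed.
            child.weight += 1
            cursor = child
        elif event == "return":
            # Move back one level; the root has no parent to return to.
            if cursor.parent is not None:
                cursor = cursor.parent
    return root


trace = [
    ("call", "main"),
    ("call", "f"), ("return", "f"),
    ("call", "f"), ("return", "f"),
    ("call", "g"), ("return", "g"),
    ("return", "main"),
]
tree = build_cct(trace)
main_node = tree.children["main"]
print(main_node.children["f"].weight, main_node.children["g"].weight)
```

A multi-threaded variant would keep one cursor per thread, for example in a dictionary keyed by a thread identifier, while sharing a single tree.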
Although this approach, generally referred to herein as the “exhaustive” approach, builds a complete calling context tree (CCT), it suffers from severe performance degradation due to tracing overhead. Experiments have shown that tracing can slow an application significantly, since every method call and return must be instrumented.
Sampled stack-walking is one alternative to the above-described “exhaustive” approach. Because a cursor pointer cannot be maintained across samples, the current context is determined at each sampling point by performing a stack-walk from the current method to the root method and adding this path to the CCT if necessary. If the CCT already contains this path, the edge weight between the top two methods on the stack is incremented. Since the sampling rate can be controlled, profiling overhead can be easily minimized; however, this is achieved at the cost of accuracy.
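The per-sample update described above may be sketched as follows, purely as an illustration. The names Node and add_sample are hypothetical, the stack is assumed to be supplied as a list of method names ordered from the root method to the currently executing method, and the sketch assumes that the edge between the top two methods is also incremented when the path is newly added.

```python
class Node:
    """One calling context. `weight` is the weight of the edge from the parent."""

    def __init__(self, method):
        self.method = method
        self.children = {}  # callee name -> Node
        self.weight = 0


def add_sample(root, stack):
    """Merge one sampled call stack into the tree rooted at `root`.

    `stack` lists method names from the root method (first) to the
    currently executing method (last).
    """
    node = root
    for method in stack:
        child = node.children.get(method)
        if child is None:
            # This part of the path is not yet in the tree: add it.
            child = Node(method)
            node.children[method] = child
        node = child
    if stack:
        # Only the edge between the top two methods on the stack is
        # incremented; edges closer to the root are left unchanged.
        node.weight += 1
    return node


root = Node("<root>")
for sampled_stack in (["main", "f"], ["main", "f"], ["main", "g"]):
    add_sample(root, sampled_stack)
main_node = root.children["main"]
print(main_node.children["f"].weight, main_node.children["g"].weight)
```

Unlike the trace-driven construction, no state is carried from one sample to the next, so every sample pays for a full walk from the current method to the root method.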
In general, the accuracy of the sampled stack-walking approach suffers for two principal reasons. First, because individual method calls are not observed but are inferred, the collected CCT results may be inaccurate and misleading. For example, a program may spend most of its time executing within a single method. The sampled stack-walking approach, however, will assume that the method's caller is making frequent calls to the method because that method is always on top of the stack when a sample is taken. Consequently, the CCT obtained with this approach reflects the execution time spent in each context more than the method invocation frequency of each context.
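The effect described above may be illustrated with a small deterministic simulation. The workload, the tick-based timeline, the sampling period and the names call_counts and sample_counts are all hypothetical and were chosen only to make the bias visible; they do not represent measured results.

```python
from collections import Counter


def call_counts(timeline):
    """Invocation frequency per context: one timeline entry is one call."""
    return Counter(stack for stack, _duration in timeline)


def sample_counts(timeline, period):
    """Samples per context when the stack is inspected every `period` ticks."""
    counts = Counter()
    now = 0          # start time of the current invocation
    next_sample = 0  # time of the next sampling point
    for stack, duration in timeline:
        end = now + duration
        while next_sample < end:
            # The sampling point falls inside this invocation.
            counts[stack] += 1
            next_sample += period
        now = end
    return counts


# Hypothetical workload: main calls hot() once for 900 ticks,
# then calls cheap() 100 times for 1 tick each.
timeline = [(("main", "hot"), 900)] + [(("main", "cheap"), 1)] * 100

calls = call_counts(timeline)
samples = sample_counts(timeline, period=10)
print(calls[("main", "hot")], calls[("main", "cheap")])      # 1 100
print(samples[("main", "hot")], samples[("main", "cheap")])  # 90 10
```

In this hypothetical run, the context invoked once receives 90 of the 100 samples, while the context invoked 100 times receives only 10, so the sampled edge weights track time spent rather than call frequency.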
Second, increasing accuracy by increasing the sampling rate can be costly because of the generally high overhead of the interrupt mechanism used to trigger each sampled stack-walk. Furthermore, supporting high sampling rates may not even be feasible on systems whose timer resolution is limited. As will be explained hereinafter, both the degree of overlap and the hot-edge coverage for the sampled stack-walking approach are typically below 50 percent.
It would, accordingly, be desirable to provide a mechanism for profiling the execution of an application that is both space- and time-efficient and highly accurate.