1. Technical Field
The present invention relates to program profiling and more particularly to a system and method for profiling resource use with correlated hardware events.
2. Description of the Related Art
Profiling is an essential and widely used technique for understanding the resource use of applications. While efficient tools for profiling execution time are available, the choices for detailed profiling of other hardware resources, e.g., cache misses or memory use, are very limited or nonexistent.
There is a rich body of literature on sampling and profiling. Typical time profilers, such as Gprof, an open source profiling program known in the art, periodically sample the currently executing subroutine or code line and instrument each function call. The function call counts are then used to reconstruct a call graph. Such location-based sampling can produce very misleading results, because the sampled cost is attributed to a code location without the call chain through which that location was reached. One approach gathers more context by keeping track of the call history. Although this yields more accurate results, it requires instrumenting all function calls, as Gprof does, plus bookkeeping to track the history, which causes significant overhead. Using Gprof often increases execution time by a factor of two or more.
The memory profiling tools found in the art can be classified into several categories. There are tools with little impact on execution time and memory use which sample the so-called high water mark over time (e.g., the Unix tool top). Such a global measurement is insufficient to localize bottlenecks in large applications, because it does not attribute memory use to code locations. The strategy of sampling at periodic time intervals is not applicable to profiling memory use by code location or call chain. Sampling the state of the heap at each interval is too costly, and significant errors are possible because memory use is not correlated with execution time: an application can allocate and deallocate a significant amount of memory within a very short period, so that the allocation may fall entirely between two samples.
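The following Python sketch illustrates, under assumed numbers, how sampling memory use at periodic time intervals can miss a short-lived allocation entirely. The memory trace, the 2 ms burst, and the 10 ms sampling period are hypothetical values chosen for illustration only and do not describe any particular tool.

```python
def memory_trace(duration_ms):
    """Hypothetical trace: 100 MB baseline plus a 900 MB allocation held for 2 ms."""
    trace = []
    for t in range(duration_ms):
        usage_mb = 100
        if 503 <= t < 505:  # short-lived allocation, freed 2 ms after it is made
            usage_mb += 900
        trace.append(usage_mb)
    return trace


def periodic_samples(trace, period_ms):
    """Record the memory usage every period_ms milliseconds, starting at t = 0."""
    return trace[::period_ms]


trace = memory_trace(1000)
samples = periodic_samples(trace, period_ms=10)

true_peak = max(trace)       # actual peak memory use over the run
sampled_peak = max(samples)  # peak as seen by the periodic sampler
print(f"true peak: {true_peak} MB, sampled peak: {sampled_peak} MB")
```

Because the burst lies between the samples taken at 500 ms and 510 ms, the sampler reports only the baseline, although the allocation dominated the peak memory use of the run.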
Another category of tools records every allocation and deallocation. If only the code location is recorded, the results can be misleading. Furthermore, the execution time can be several times larger than for an unprofiled run. Recording entire call chains improves the profiling information, but causes even more overhead. This category provides correct attribution of memory use to call chains, but memory use can only be analyzed at individual snapshots during the program run. It is difficult to hit the interesting spots in large applications that run for many hours. The allocation/deallocation category often increases the runtime of a program by an order of magnitude and also increases the memory use of the application by a factor of two or more, rendering it impractical for very large applications.
"Rational Purify" shows similar increases in execution time and memory usage. Such tools investigate memory access errors and other problems in small, short program runs, but they are not well suited to investigating the memory use of application runs that already require significant time and already push the memory capacity of a system to its limit without profiling.
Another common approach is simulation based on code or program traces, which allows detailed analysis of applications but causes far too much execution time overhead for the performance analysis of very large applications. A low-overhead sampling approach for memory leak detection has been proposed that samples code segments adaptively, at a rate inversely proportional to their execution frequency. This approach is not applicable to memory usage, because memory usage is not correlated with execution time or execution frequency. Moreover, whereas a memory leak persists once it is created, and can therefore be detected whenever it is eventually sampled, an ordinary memory allocation has a limited lifetime.
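A simplified Python sketch of adaptive sampling in which the sampling rate of a code segment falls inversely with the number of times that segment has executed may help illustrate the mismatch described above. The rate formula max(min_rate, 1/n), the class name AdaptiveSampler, and the segment names are hypothetical assumptions for illustration; the proposed leak-detection approach is not claimed to use exactly this formula.

```python
from collections import defaultdict


class AdaptiveSampler:
    """Hypothetical sketch: sample each code segment at a rate that decreases
    inversely with its execution count so far, down to an assumed floor min_rate."""

    def __init__(self, min_rate=0.001):
        self.min_rate = min_rate
        self.executions = defaultdict(int)  # segment -> executions so far
        self.credit = defaultdict(float)    # segment -> accumulated sampling credit
        self.samples = defaultdict(int)     # segment -> samples taken

    def on_execute(self, segment):
        """Return True if this execution of `segment` should be sampled."""
        self.executions[segment] += 1
        # Assumed rate: 1/n for the n-th execution, never below min_rate.
        rate = max(self.min_rate, 1.0 / self.executions[segment])
        # Deterministic credit accumulation: sample once a full unit of credit builds up.
        self.credit[segment] += rate
        if self.credit[segment] >= 1.0:
            self.credit[segment] -= 1.0
            self.samples[segment] += 1
            return True
        return False


sampler = AdaptiveSampler(min_rate=0.001)
for _ in range(100_000):
    sampler.on_execute("hot_loop")   # frequently executed segment
for _ in range(5):
    sampler.on_execute("rare_init")  # rarely executed segment
print(dict(sampler.samples))
```

The rarely executed segment is sampled on a large fraction of its executions and the hot segment on only about one execution in a thousand. The rate depends solely on execution frequency; nothing in the sampler reflects how much memory a sampled allocation holds or for how long it is held, which is why such a scheme suits persistent leaks better than the profiling of memory usage.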