The present invention relates generally to software system performance diagnosis, and more particularly, to dynamic function-level hardware performance profiling for application performance analysis.
In computer systems, hardware counters provide low-overhead access to a wealth of detailed performance information related to the CPU's functional units, caches, main memory, and so on. Current off-the-shelf profilers featuring hardware event statistics are usually used for off-field analysis. They either collect as much information as possible in one execution (e.g., OProfile [oprofile]), at the cost of a large execution-time overhead, or require multiple runs to gradually localize the root cause (e.g., Intel VTune [vtune]). In both cases, performance profiling is done for complete trial runs.
[oprofile] John Levon and Philippe Elie, OProfile: A system profiler for Linux, 2011.
[vtune] Intel, VTune Amplifier, 2011.
For complex applications and long-running service programs, many performance bugs result from particular workload pressure or a very specific input combination, and may manifest themselves only on certain production hardware specifications or system configurations. Such bugs are difficult to reproduce. Therefore, a run-time tracing tool is highly desirable. The features of such a tool are as follows:
        Enabling and disabling the tracing of hardware performance events, and their association with function calls, at any time during the execution of a target application.
        Introducing low and controllable overhead.
        Utilizing a limited number of hardware performance counters to provide function-level and thread-aware hardware statistics.
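The three features listed above can be illustrated with a minimal sketch. It is not the tool described here: the class name `FunctionTracer`, its methods, and the `read_counter` parameter are hypothetical, and `time.perf_counter_ns` is only a stand-in for reading a real hardware performance counter, which would require a platform-specific interface.

```python
import threading
import time
from collections import defaultdict
from functools import wraps


class FunctionTracer:
    """Hypothetical tracer: toggleable at run time, per function, per thread."""

    def __init__(self, read_counter=time.perf_counter_ns):
        # Assumption: read_counter stands in for a hardware counter read.
        self._read = read_counter
        self._enabled = threading.Event()
        self._lock = threading.Lock()
        # (thread id, function name) -> [call count, accumulated counter delta]
        self._stats = defaultdict(lambda: [0, 0])

    def enable(self):
        """Start tracing; may be called at any time during execution."""
        self._enabled.set()

    def disable(self):
        """Stop tracing; wrapped functions then skip all counter reads."""
        self._enabled.clear()

    def traced(self, func):
        """Decorator that attributes counter deltas to func and the calling thread."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not self._enabled.is_set():
                # Disabled path: one flag check, no counter access.
                return func(*args, **kwargs)
            start = self._read()
            try:
                return func(*args, **kwargs)
            finally:
                delta = self._read() - start
                key = (threading.get_ident(), func.__name__)
                with self._lock:
                    entry = self._stats[key]
                    entry[0] += 1
                    entry[1] += delta
        return wrapper

    def snapshot(self):
        """Return a copy of the statistics collected so far."""
        with self._lock:
            return {key: tuple(value) for key, value in self._stats.items()}
```

In this sketch the deltas are inclusive, so a traced function's total also covers any traced functions it calls. The overhead is controllable only in the coarse sense that a disabled tracer costs a single flag check per call.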
Current off-the-shelf profilers with hardware statistics, such as OProfile and VTune [oprofile, vtune], are effective for inspecting code execution. However, they do not account for the overhead of collecting hardware statistics, which can consume a significant number of CPU cycles. Moreover, the hardware event information they report is system-wide, with no fine-grained tracing such as per-function tracing. Lastly, they do not support run-time performance profiling of long-running service programs.
Accordingly, there is a need for a solution for guarding a monitoring scope and interpreting partial control flow context that has not been taught heretofore.