Current profiling techniques are good at showing programmers where time is spent during program execution. Many programmers use hierarchical and path profiling tools to identify the basic blocks and procedures that account for most of a program's execution time. (See, e.g., J. Gosling, B. Joy, and G. Steele, “Hiprof Advanced Code Performance Analysis Through Hierarchical Profiling”; Thomas Ball and James R. Larus, “Efficient Path Profiling,” International Symposium on Microarchitecture, pages 46-57, 1996; James R. Larus, “Whole Program Paths,” Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, pages 259-269, ACM Press, 1999; Glenn Ammons and James R. Larus, “Improving Data-Flow Analysis With Path Profiles,” SIGPLAN Conference on Programming Language Design and Implementation, pages 72-84, 1998; David Melski and Thomas W. Reps, “Interprocedural Path Profiling,” Compiler Construction, pages 47-62, 1999; and the JProfiler profiling tool available from ej-technologies GmbH.)
In addition, profiling tools based on hardware performance counters (such as the VTune performance analyzer available from Intel Corporation) are used to identify program regions that incur a large number of cache misses and branch mispredictions. These tools let programmers attempt to speed up program execution by focusing on the most frequently executed loops and on the loops that account for the most misses. Unfortunately, these profiling techniques cannot direct a programmer to the program section that offers the largest opportunity for speedup through optimization. For example, a profiling tool may report that the largest fraction of execution time is spent in the inner loop of a matrix multiply routine. This information is unlikely to help speed up the program, because such a routine has probably already been heavily optimized.