Determining the more frequently executed portions of a program is often done through a process known as profiling. Profile-guided optimization is a compiler technique that, based on the profile feedback, selects a portion of a program as important, and optimizes that hot portion of the program aggressively, possibly at the expense of the less important, or cold, portion of the program.
Three types of profiling are generally used: heuristic, static, and dynamic. A heuristic profiling technique profiles a program without ever executing it. A prime example of heuristic techniques is described by Thomas Ball and James Larus in a paper entitled “Branch Prediction for Free,” PLDI 1993. Ball and Larus provide an all-software approach. In their paper, they describe a number of heuristics that may be applied to the branches in a program's code to predict whether each branch will be taken. These heuristics include, for example, a prediction (yes or no) that a comparison of a pointer against null in an if statement will fail. Based on these binary branch predictions, a compiler can estimate which portions of the program are most likely to be executed. A drawback of heuristic profiling techniques is their inaccuracy in predicting program behavior.
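The pointer-against-null heuristic mentioned above can be illustrated with a minimal sketch. The `Branch` record, the function names, and the "cold"/"unknown" labels are hypothetical and chosen only for illustration; they are not taken from the Ball and Larus paper, and a real compiler would combine many such heuristics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Branch:
    """Hypothetical description of a conditional branch seen by a compiler."""
    label: str
    compares_pointer_to_null: bool  # e.g. the condition is `p == NULL`


def predict_condition_true(branch: Branch) -> Optional[bool]:
    """Return a yes/no prediction, or None when the heuristic does not apply."""
    if branch.compares_pointer_to_null:
        # Heuristic: a comparison of a pointer against null is predicted to fail.
        return False
    return None


def classify_guarded_code(branches: List[Branch]) -> Dict[str, str]:
    """Label the code guarded by each branch without running the program."""
    labels = {}
    for branch in branches:
        prediction = predict_condition_true(branch)
        if prediction is None:
            labels[branch.label] = "unknown"
        elif prediction:
            labels[branch.label] = "hot"
        else:
            labels[branch.label] = "cold"
    return labels


branches = [
    Branch("if (p == NULL)", compares_pointer_to_null=True),
    Branch("if (n > limit)", compares_pointer_to_null=False),
]
labels = classify_guarded_code(branches)
```

Because the prediction is fixed by the form of the condition rather than by observed behavior, a program that frequently passes null pointers would be mispredicted, which is the inaccuracy noted above.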
The static profile used by profile-guided optimization is often obtained by running the program once with a given set of inputs. One serious problem with profile-guided optimization using a static profile is that when the program later runs with a different input set, its actual behavior may no longer match the collected profile. Furthermore, a program may exhibit phase behavior, and the static profile captures only the average behavior. As a result, the optimization benefit from the static profile may diminish.
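A small worked example may help show why an averaged profile can hide phase behavior. The branch outcomes below are invented for illustration: a single branch is always taken during one phase and never taken during the next.

```python
def taken_ratio(outcomes):
    """Fraction of executions in which the branch was taken."""
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes of one branch across two program phases.
phase_a = [True] * 1000   # branch always taken
phase_b = [False] * 1000  # branch never taken

# A static profile records only the whole-run aggregate.
average = taken_ratio(phase_a + phase_b)
```

Here the aggregate ratio is 0.5, which suggests an unbiased branch, even though the branch is perfectly predictable within each phase.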
A dynamic profiling technique collects program profiles continuously as the program executes. Dynamic profiling may be implemented entirely in hardware or entirely in software. A problem with software-only approaches is that the inserted profiling instructions compete with the user program for machine resources (e.g., the architectural registers) and impose dependences that may lengthen the critical paths in the user program.
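A software-only approach can be sketched as counter updates inserted at the start of each block of the user program. The function `sum_positive`, the block names, and the `profile` helper are hypothetical. In compiled code, each such update would typically be a load, an add, and a store that occupy registers alongside the user's own values; the Python version only shows where the inserted code sits.

```python
from collections import Counter

# Execution counts per block, updated while the program runs.
block_counts = Counter()


def profile(block_id):
    """Inserted profiling code: one extra counter update per executed block."""
    block_counts[block_id] += 1


def sum_positive(values):
    """User program with profiling calls inserted at each block."""
    profile("entry")
    total = 0
    for value in values:
        profile("loop")
        if value > 0:
            profile("add")
            total += value
    profile("exit")
    return total


result = sum_positive([3, -1, 4, -1, 5])
```

After the call, `block_counts` holds how often each block ran, at the cost of one additional update on every block execution.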
All-hardware approaches for identifying the set of hot branches have drawbacks as well. For example, the method used by Merten et al. [Matthew C. Merten, Andrew R. Trick, Christopher N. George, John C. Gyllenhaal, and Wen-mei W. Hwu, “A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization,” Proceedings of the 26th International Symposium on Computer Architecture, May 1999] cannot detect program phase transitions and may repeatedly discover the same hot regions. Furthermore, the profiles collected using this technique are incomplete: many edges in a hot region may be missed due to cache conflicts and the lack of backup storage.
The sampling-based profiling technique proposed by Zhang et al. [Xiaolan Zhang, Zheng Wang, Nicholas Gloy, J. Bradley Chen, and Michael D. Smith, “System Support for Automated Profiling and Optimization,” 16th ACM Symposium on Operating System Principles, Oct. 5–8, 1997] may also be used to collect a dynamic profile. This technique interrupts the user program at sample intervals (e.g., every 100K instructions) and dumps the current program counter to a buffer. It requires operating system support, and the dumped information needs post-processing to determine the relative weights of the basic blocks. Furthermore, this technique cannot automatically detect program phase transitions.
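The post-processing step can be sketched as mapping each sampled program counter to its enclosing basic block and normalizing the counts. The block addresses, block names, and sample values below are invented for illustration and are not drawn from the work of Zhang et al.; the sketch also assumes every sample falls at or after the first block and treats the last block as open-ended.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical basic-block start addresses (sorted) and their names.
BLOCK_STARTS = [0x1000, 0x1010, 0x1040, 0x1080]
BLOCK_NAMES = ["entry", "loop_head", "loop_body", "exit"]


def block_of(pc):
    """Find the block whose start address is the greatest one not above pc."""
    index = bisect_right(BLOCK_STARTS, pc) - 1
    if index < 0:
        raise ValueError(f"pc {pc:#x} precedes the first known block")
    return BLOCK_NAMES[index]


def block_weights(samples):
    """Turn a buffer of sampled program counters into relative block weights."""
    counts = Counter(block_of(pc) for pc in samples)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical contents of the sample buffer.
samples = [0x1004, 0x1014, 0x1044, 0x1050, 0x1048, 0x1018, 0x1060, 0x1084]
weights = block_weights(samples)
```

With these samples, half of the program counters fall in `loop_body`, so that block receives a weight of 0.5. The weights describe the whole buffer and carry no information about when, during the run, the samples were taken.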