Most dynamic binary translators, such as IA32EL, Transmeta, Daisy, Dynamo, BOA, and ARIES, use a two-phase approach to identify and improve frequently executed code dynamically. In the first phase, the profiling phase, blocks of code are interpreted or translated without optimization to collect execution frequency information for the blocks. In the second phase, the optimization phase, frequently executed blocks are grouped into regions, including loop regions, and advanced optimizations are applied to them. For example, the profiling phase in Intel® Corporation's IA32EL quickly converts each IA32 block into Itanium® Processor Family code instrumented to collect the block's “use” count (the number of times the block is visited) and the block's “taken” count (the number of times its conditional branch is taken). When the use count for a block reaches a retranslation threshold, the block is registered in a pool of candidate blocks. When a sufficient number of blocks are registered, or when a block is registered twice, the optimization phase begins to retranslate the candidate blocks. The optimization phase uses the ratio taken/use as the branch probability to form regions for optimizations and instruction scheduling. Some optimizations may also use the taken/use values to determine a loop's trip count (the number of times the loop body is executed each time the loop is entered).
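The bookkeeping described above can be illustrated with a minimal sketch. It is not IA32EL's implementation: the names `BlockProfile` and `Profiler`, the threshold and pool-size values, and the rule that a block is registered again each time its use count reaches another multiple of the threshold are all assumptions made for illustration. The trip count estimate use/(use - taken) further assumes a single-block loop whose conditional branch is its back edge.

```python
from dataclasses import dataclass

RETRANSLATION_THRESHOLD = 100  # hypothetical value
POOL_TRIGGER_SIZE = 4          # hypothetical "sufficient number" of candidates


@dataclass
class BlockProfile:
    use: int = 0    # number of times the block was visited
    taken: int = 0  # number of times its conditional branch was taken

    def record(self, branch_taken: bool) -> None:
        self.use += 1
        if branch_taken:
            self.taken += 1

    def branch_probability(self) -> float:
        return self.taken / self.use if self.use else 0.0

    def estimated_trip_count(self) -> float:
        # Assumes the branch is the back edge of a single-block loop, so
        # every not-taken outcome corresponds to one loop exit.
        exits = self.use - self.taken
        return self.use / exits if exits else float("inf")


class Profiler:
    def __init__(self) -> None:
        self.profiles: dict[str, BlockProfile] = {}
        self.candidates: list[str] = []  # one entry per registration

    def visit(self, block_id: str, branch_taken: bool) -> bool:
        """Record one visit; return True if the optimization phase should start."""
        profile = self.profiles.setdefault(block_id, BlockProfile())
        profile.record(branch_taken)
        # Assumption: a block is registered each time its use count reaches
        # a multiple of the threshold, which is how it can register twice.
        if profile.use % RETRANSLATION_THRESHOLD != 0:
            return False
        registered_before = block_id in self.candidates
        self.candidates.append(block_id)
        return registered_before or len(set(self.candidates)) >= POOL_TRIGGER_SIZE


# Simulate a loop that runs 10 iterations each time it is entered, 10 entries.
profiler = Profiler()
for _ in range(10):
    for iteration in range(10):
        profiler.visit("loop", branch_taken=iteration < 9)

loop_profile = profiler.profiles["loop"]
print(loop_profile.branch_probability(), loop_profile.estimated_trip_count())
```

In this synthetic run the block reaches the threshold once, so it is registered as a candidate but does not by itself trigger the optimization phase.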
The profiling phase cannot be very long, or the benefit of the optimizations will be reduced because the code runs unoptimized for a prolonged period. Retranslation thresholds are therefore typically small, ranging from tens to a few thousand.
This approach implicitly assumes that the execution profile of each block in the profiling phase is representative of the block throughout its lifetime. In particular, the trip count information derived from the block profile is assumed to be representative of the loop's behavior during all phases of execution, including late execution. However, if the trip count information collected during the profiling phase is not representative of the loop's behavior in later stages of execution, the loop may be improperly optimized and program performance will suffer.
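A small synthetic example shows how an early profile can mislead. The trace below is invented for illustration, as are the threshold value and the helper names `back_edge_outcomes` and `estimate_trip_count`. The loop runs 5 iterations per entry during its first 20 entries and 200 iterations per entry afterwards. The estimate again assumes a single-block loop whose conditional branch is its back edge.

```python
from itertools import islice

RETRANSLATION_THRESHOLD = 100  # hypothetical value


def back_edge_outcomes(trip_counts):
    """Yield one branch outcome per loop-body visit (True = back edge taken)."""
    for n in trip_counts:
        for i in range(n):
            yield i < n - 1  # the last iteration of each entry exits the loop


def estimate_trip_count(outcomes):
    """Estimate the trip count as use / (use - taken) over the given visits."""
    use = taken = 0
    for branch_taken in outcomes:
        use += 1
        taken += branch_taken
    exits = use - taken
    return use / exits if exits else float("inf")


# Synthetic trace: short trips early in execution, long trips later.
trip_counts = [5] * 20 + [200] * 1000

# Estimate seen by the profiling phase, which stops at the threshold.
early = estimate_trip_count(
    islice(back_edge_outcomes(trip_counts), RETRANSLATION_THRESHOLD)
)
# Estimate over the whole execution.
whole = estimate_trip_count(back_edge_outcomes(trip_counts))

print(f"profiling-phase estimate: {early:.1f}")
print(f"whole-run estimate:       {whole:.1f}")
```

Here the profiling phase sees only the short-trip entries, so an optimizer relying on it would treat the loop as a short loop even though almost all of its execution time is spent in long trips.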
Static profiling techniques can obtain accurate trip count information, but they do so through a separate training execution of the program that collects a full-program profile of loop trip counts. Static profiling techniques adapt poorly to programs whose inputs vary from one execution to the next. Dynamic profiling methods can adapt, but they rely on the limited initial profiling phase to determine loop trip counts. Consequently, the trip count information used for dynamic optimizations is often inaccurate, because the initial profile is significantly smaller and less representative of the full program execution than the profile collected from a training run under static profiling.
Thus, there is a need for an effective method to continuously profile loop trip counts and to use the profile results to dynamically optimize the loop throughout the life of the program.