A run-time optimizer is an adaptive software system that transparently optimizes applications at run-time. The optimizer rewrites the binary code of an application on-the-fly to achieve a higher execution efficiency.
FIG. 2 depicts prior art run time optimizer 20. The control loop 21 begins execution of a block of program code, via emulation performed by the profiling emulator 22. The profiling aspect of emulator 22 allows the control loop 21 to track the number of times the particular block of code has been executed via emulation. Note that a run time optimization system is different from a run time binary translation system, in that the latter is for architecture migration, while the former is to decrease execution time. The run time optimization system uses the emulator 22 for profiling in order to guide optimizations; that is, the code is running on its native system. After a predetermined number of executions via emulation, the control loop 21 designates the block of code as hot code, and thus desirable for optimization. The control loop 21 then activates trace selector 23 to translate the block of code. The trace selector 23 forms a trace of the instructions that comprise the block of code by following the instructions in the block. When a branch instruction is encountered, the trace selector makes a prediction as to whether the branch is taken or falls through. If the selector predicts that the branch is mostly taken, then the trace is formed by extending the code from the branch target block. If the selector predicts that the branch is not taken, then the branch falls through, and the trace continues within the fall through block. The trace terminates at a backward branch that is predicted to be taken, or when the trace becomes sufficiently large. After the trace is completed, the code is rewritten with machine dependent and machine independent optimizations. The optimized code is then placed into the code cache 24. The next time the control loop 21 encounters a condition to execute this block of code, the control loop 21 will execute the code in the code cache 24 rather than emulate the code via emulator 22.
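By way of illustration only, the hot-code counting and trace selection described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular optimizer: the names Block, select_trace, and ControlLoop, the taken and not_taken profile counters, the threshold of 3, and the limit of 8 blocks are all hypothetical, and the optimization pass applied to a completed trace is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """One block of program code (hypothetical model for this sketch)."""
    addr: int
    taken_target: Optional[int]  # None if the block does not end in a branch
    fall_through: Optional[int]  # None if no block follows
    taken: int = 0               # assumed profile counters used for prediction
    not_taken: int = 0


def select_trace(blocks, head, max_blocks=8):
    """Follow predicted branch directions from head and return block addresses."""
    trace, addr = [], head
    # Stop when the trace is sufficiently large or no further block is known.
    while addr in blocks and len(trace) < max_blocks:
        block = blocks[addr]
        trace.append(addr)
        predicted_taken = (block.taken_target is not None
                           and block.taken > block.not_taken)
        if predicted_taken:
            if block.taken_target <= block.addr:
                break  # a backward branch predicted taken terminates the trace
            addr = block.taken_target   # extend from the branch target block
        else:
            addr = block.fall_through   # continue within the fall through block
    return trace


class ControlLoop:
    """Counts emulated executions and fills the code cache for hot blocks."""

    def __init__(self, blocks, hot_threshold=3):
        self.blocks = blocks
        self.hot_threshold = hot_threshold   # the "predetermined number"
        self.exec_counts = defaultdict(int)
        self.code_cache = {}                 # head address -> selected trace

    def execute(self, addr):
        if addr in self.code_cache:
            return "code cache"              # run the cached trace, no emulation
        self.exec_counts[addr] += 1          # profiling done by the emulator
        if self.exec_counts[addr] >= self.hot_threshold:
            # Block is hot: select a trace (optimization omitted) and cache it.
            self.code_cache[addr] = select_trace(self.blocks, addr)
        return "emulator"
```

In this sketch, a block is emulated until its count reaches the threshold, and every later request for the same head address is served from the code cache. The comparison of the taken and not_taken counters stands in for whatever prediction the trace selector actually uses, which the description above does not specify.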
As shown in FIG. 3, when a branch that exits trace 1 is taken, as shown by branch instruction 31, control is returned to the run time system (RTS) 20 and to control loop 21, which determines whether the target of the branch resides in the code cache. If the target resides in the code cache, then the control loop 21 modifies the target of the branch instruction 31 to be trace 2 in the code cache, as shown by branch instruction 33. This modification is called backpatching. Thus, if the target of the trace exit has already been translated, the branch is backpatched such that a subsequent execution will branch directly to the new trace without returning to the control loop. Backpatching increases the speed of execution of the code, as returning to the RTS significantly slows execution.
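The backpatching behavior described above may be sketched as follows, again by way of illustration only. The names Trace, Runtime, take_exit, and control_loop_entries are hypothetical, and a Python attribute stands in for the rewritten branch target; an actual RTS would overwrite the branch instruction in the code cache.

```python
class Trace:
    """A translated trace with a single direct exit branch (simplified)."""

    def __init__(self, name, exit_target):
        self.name = name
        self.exit_target = exit_target  # original program address of the exit
        self.linked = None              # set once the exit has been backpatched


class Runtime:
    """Hypothetical stand-in for the RTS and its control loop."""

    def __init__(self):
        self.code_cache = {}            # original address -> Trace
        self.control_loop_entries = 0   # counts the expensive returns to the RTS

    def take_exit(self, trace):
        """Follow the exit branch of a trace and return the next trace, if any."""
        if trace.linked is not None:
            return trace.linked         # backpatched: branch directly to the trace
        self.control_loop_entries += 1  # control returns to the control loop
        target = self.code_cache.get(trace.exit_target)
        if target is not None:
            trace.linked = target       # backpatch the exit branch
        return target                   # None means the target is not translated
```

For example, if trace 2 is in the code cache at the address that trace 1 exits to, the first call to take_exit for trace 1 passes through the control loop and patches the exit, and every later call follows the patched link without incrementing control_loop_entries.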
A problem with the prior art RTS is that it cannot backpatch an indirect branch, because the target address of an indirect branch is not known until run time. The target address is typically held in a register or memory location, and is not written directly in the code. Examples of indirect branches are return branches and switch branches. Thus, the RTS will shift control back to the control loop 21 to determine whether the target address has been translated, which is expensive in terms of time. The prior art has attempted to minimize this problem by inlining, in the optimized traces, a code sequence that searches a smaller lookup table. However, these mechanisms still incur high overhead. Moreover, if the small table lookup fails, then the RTS will shift control back to the control loop, as described above. This software approach adds tens to hundreds of cycles to the processing time.
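The inlined small-table lookup and its fallback may be sketched as follows. This is a sketch under stated assumptions: the names IndirectBranchSite, dispatch, small_table, and control_loop_entries are hypothetical, as is the policy of filling the small table after a successful control loop lookup, which the description above does not specify.

```python
class IndirectBranchSite:
    """One indirect branch in an optimized trace (hypothetical model)."""

    def __init__(self, code_cache, table_size=4):
        self.code_cache = code_cache    # full map: original address -> trace
        self.table_size = table_size    # assumed capacity of the inlined table
        self.small_table = {}           # small lookup table searched inline
        self.control_loop_entries = 0   # counts the expensive fallbacks

    def dispatch(self, target_addr):
        """Resolve a target address that is known only at run time."""
        trace = self.small_table.get(target_addr)
        if trace is not None:
            return trace                # inline hit: control stays in the cache
        # Inline miss: shift control back to the control loop.
        self.control_loop_entries += 1
        trace = self.code_cache.get(target_addr)
        if trace is not None and len(self.small_table) < self.table_size:
            self.small_table[target_addr] = trace  # assumed fill policy
        return trace                    # None means the target is not translated
```

Because target_addr is a run time value, for example a return address, no single branch target can be patched into the code; every execution pays for at least the inline search, and every miss pays for a return to the control loop as well.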
Therefore, there is a need in the art for an RTS that can handle indirect branches without returning control to a control loop.