The present invention relates in general to methods for predicting branch target addresses in speculative instruction execution.
In some computer software, a xe2x80x9cbranch and linkxe2x80x9d instruction is executed to make a subroutine call. When a xe2x80x9cbranch and linkxe2x80x9d instruction executes, the address of the next instruction (where to return after the execution oft he branch) is placed in the link register and the execution starts from the target address of the branch instruction (the execution may continue on the fall through path, if the branch is a not-taken conditional branch). To return from the subroutine, a branch to link register (bclr) instruction is executed. This instruction has two forms, branch to link register (bclr) and branch to link register and link (bclrl). In this disclosure, these two forms are designated with the shortcut bclr[l] when both instructions are applicable. For nested sub-routine calls, the content of the link register is saved before making a new sub-routine call and restored after returning from that sub-routine.
Fetching an instruction precedes its execution by several machine cycles in high speed deeply pipelined microprocessors. Because of this, the content of the link register that should be used to start instruction fetching after a bclr[l] instruction has been found (and predicted taken) in the predicted path may not yet have the proper target address. Therefore, the content of the link register has to be predicted to keep instruction fetching far ahead of their execution.
If the link register is used only for subroutine calls and returns, a simple prediction mechanism may use a stack of link register values, called a xe2x80x9clink stackxe2x80x9d. When a bclr[l] instruction is found in the predicted path (and the predicted path is taken), the address of the next instruction is pushed (added in a specific order) into the link stack. For example the link stack pointer (address of link stack entry) is incremented by one on a xe2x80x9cpushxe2x80x9d operation. When a bclr[l] instruction is found in the predicted path (and predicted path is taken), the last address pushed into the link stack is popped (read from the stack in a specific order) and used as the new address to start instruction fetching. During a xe2x80x9cpopxe2x80x9d, the link stack pointer is decremented by one. This approach should work perfectly, except for the following cases which may leave the link stack corrupted:
1. In the C programming language, a xe2x80x9clong jumpxe2x80x9d instruction (usually used when an error case is detected) can cause the flow of instructions to skip several xe2x80x9csubroutine returnsxe2x80x9d from nested subroutine calls.
2. In many compilers, an optimization called xe2x80x9ctail recursionxe2x80x9d is used. A recursive subroutine call in which the last statement is a call to itself is called a xe2x80x9ctail recursionxe2x80x9d. For example, let a subroutine A call subroutine B where subroutine B is a tail recursive subroutine. Then let B call itself several times before reaching the leaf subroutine call. When the leaf subroutine returns, it does not have to go through the nested returns (through the chain of subroutine calls to B itself), rather, it can directly return to the instruction in A that follows the first call to B.
3. In many computer architectures (e.g., PowerPC), if the distance (number of instructions) between a branch instruction and its target address is above a certain limit, instructions such as bclr[l] or xe2x80x9cbranch to count registerxe2x80x9d (bcctr) instruction are used and the target address is stored in a register such as a Link Register (LR) or a Count Register (CTR) before the branch instruction executes. In this disclosure, xe2x80x9cbranch to count registerxe2x80x9d and xe2x80x9cbranch to count register and linkxe2x80x9d may be designated as bcctr[l]. The bclr[l] instruction is sometimes used for reasons other than subroutine returns, for example, in some compilers it is used to implement the switch statement (in C programming language) and computed GOTO (in Fortran). For such use of the bclr[l] instruction, the link register is updated using a xe2x80x9cmove to link registerxe2x80x9d (mtlr) instruction.
The branch to count register instruction is used in some compilers for generating code for switch statements and computed GoTo statements. It is also used in xe2x80x9cglue codexe2x80x9d for the target address of an indirect subroutine call or for the target address to a subroutine call when the calling subroutine and the subsequent called subroutine do not belong to the same compilation module.
In many cases, the target address of the bcctr[l] instruction is highly predictable. For example, studies have shown that the indirect subroutine calls in many object-oriented programs do call the same subroutine most of the time, making the address highly predictable by remembering the target address used by the instruction in its last execution. In some processors, the target address is stored in an area called xe2x80x9cCount Cachexe2x80x9d. Count Cache is a small cache memory used in the prediction of target addresses for bcctr[l] instructions.
Branch instructions are used extensively in computer code for a variety of software functions. Since these branch instructions are key to speculative instruction execution, the accuracy of branch prediction is very important to improving instruction execution time. There has been much work done to improve branch direction prediction, however not much work has been done to improve branch target prediction. Therefore there is a need for a method for improving branch target prediction accuracy for modern computer systems.
A bit field in branch instructions is reserved for Hint bits which are added to the branch instruction by the programmer or the compiler depending on the use (context) of the branch instruction in software code. When the branch instruction is later speculatively executed, these Hint bits are decoded and used to direct the hardware as to the source of information and actions to take to improve the accuracy of the speculative branch instruction execution. A branch instruction may have multiple uses in a software routine, and Hint bits are only used in those branch instructions where the programmer or the compiler knows that the Hint bits and the corresponding hardware actions will improve instruction execution. The Hint bits are used in branch to link register (bclr) and branch to count register (bcctr) instructions; however, Hint bits may be used in other branch instructions and still be within the scope of the present invention. The hardware actions minimize link stack corruption and improve execution time by providing better branch prediction.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.