The present disclosure is generally directed to techniques for predicting a target address of an indirect branch instruction and, more specifically, to techniques for predicting a target address of an indirect branch instruction whose target address is correlated with a target address of a previous instance of the branch instruction.
In general, on-chip parallelism of a processor design may be increased through superscalar techniques that attempt to exploit instruction level parallelism (ILP) and/or through multithreading, which attempts to exploit thread level parallelism (TLP). Superscalar refers to executing multiple instructions at the same time, and multithreading refers to executing instructions from multiple threads within one processor chip at the same time. Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar processors with hardware multithreading. In general, SMT permits multiple independent threads of execution to better utilize resources provided by modern processor architectures. In SMT, processor pipeline stages are time shared between active threads.
In computer science, a thread of execution (or thread) is usually the smallest sequence of programmed instructions that can be managed independently by an operating system (OS) scheduler. A thread is usually considered a light-weight process, and the implementation of threads and processes usually differs between OSs, but in most cases a thread is included within a process. Multiple threads can exist within the same process and share resources, e.g., memory, while different processes usually do not share resources. In a processor with multiple processor cores, each processor core may execute a separate thread simultaneously. In general, a kernel of an OS allows programmers to manipulate threads via a system call interface.
In computer architecture, a branch predictor is usually implemented as logic that predicts a direction of a branch instruction (branch) before the direction is actually known. The purpose of the branch predictor is to improve flow in an instruction pipeline. Two-way branching is usually implemented with a conditional jump instruction (conditional jump). A conditional jump can either be ‘not taken’ and continue execution with code that immediately follows the conditional jump or can be ‘taken’ and jump to a different location in program memory where a second branch of code is stored. Whether a conditional jump is ‘taken’ or ‘not taken’ is uncertain until a condition associated with the conditional jump is calculated and the conditional jump has passed an execute stage of the instruction pipeline. Without branch prediction, a processor would be required to wait until a conditional jump had passed the execute stage before a next instruction could enter a fetch stage of the processor pipeline. The branch predictor attempts to improve processor efficiency by predicting whether a conditional jump is ‘taken’ or ‘not taken’. Instructions on the predicted path are then fetched and speculatively executed. If the prediction is wrong, the speculatively executed or partially executed instructions are flushed from the processor pipeline and fetching restarts on the correct path, incurring a delay.
The first time a conditional jump instruction is encountered there is little information on which to base a prediction. Branch predictors are usually configured to build a history of whether branches are ‘taken’ or ‘not taken’ to facilitate prediction. A branch predictor may, for example, recognize that a conditional jump is taken more often than not, or that the conditional jump is taken every nth time the conditional jump is encountered (where ‘n’ is equal to 2, 3, 4, . . . ). Branch prediction is not the same as branch target prediction. Branch prediction predicts whether a conditional jump will be ‘taken’ or ‘not taken’. Branch target prediction attempts to guess a target of a taken conditional or unconditional jump before the target is computed by decoding and executing the jump. Branch prediction and branch target prediction are often combined into the same logic.
A branch instruction (branch) may be a direct branch or an indirect branch. For a direct branch, a target address (target) of the direct branch is calculated by adding/subtracting an immediate field of the direct branch to/from an address of the branch. In contrast, for an indirect branch, a target is based on data in a register. As one example, an indirect branch that functions as a subroutine return may use a link register to provide an indirect branch target. When a calling instruction calls a subroutine, the link register stores an address of an instruction that is to be executed following execution of the calling instruction. That is, the link register points to the instruction that is to be executed after a program returns from a currently executed subroutine. In a typical implementation, the link register is a single register that is updated on a subroutine call (e.g., a branch and link (BL) instruction or a branch conditional and link (BCL) instruction) with an address of an instruction following the branch.
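The difference between the two target calculations can be sketched as follows. The 24-bit word displacement, the fixed 4-byte instruction size, the 64-bit address width, and the function names are assumptions chosen for illustration; an actual instruction set defines its own field widths and alignment rules.

```python
INSTR_BYTES = 4                 # assumption: fixed-length 4-byte instructions
ADDR_MASK = (1 << 64) - 1       # assumption: 64-bit effective addresses


def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def direct_target(branch_addr, imm_field, imm_bits=24):
    """Direct branch: target = branch address + signed immediate displacement."""
    # Assumption: the immediate counts words, so it is scaled by 4.
    displacement = sign_extend(imm_field, imm_bits) << 2
    return (branch_addr + displacement) & ADDR_MASK


def indirect_target(register_value):
    """Indirect branch: target comes from a register (low two bits ignored)."""
    return register_value & ~0x3 & ADDR_MASK


def call(branch_addr, imm_field, regs):
    """Branch-and-link: save the address of the next instruction in the LR."""
    regs["LR"] = (branch_addr + INSTR_BYTES) & ADDR_MASK
    return direct_target(branch_addr, imm_field)
```

A direct target is known as soon as the branch is decoded, whereas an indirect target depends on a register value that may not be available until late in the pipeline, which is why indirect targets are predicted.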
If subroutine calls are nested, an address in the link register is saved before a next level of subroutine is called. For a return that is not at a last nesting level, software restores the link register before returning to a caller. In a typical implementation, a hardware link stack is implemented in which addresses are pushed from the link register on a subroutine call and popped into the link register on a subroutine return. In this manner, the link register may be used (in conjunction with a link stack) to track return addresses for the chain of subroutine calls that led to a current subroutine. In the POWER™ instruction set architecture (ISA), some indirect branches use the count register (CTR) to provide an indirect branch target. Other indirect branches, e.g., subroutine returns (such as a branch conditional to link register (BCLR) instruction), are usually better predicted using the link stack.
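The push-on-call, pop-on-return behavior of a link stack can be sketched as follows. The stack depth, the policy of discarding the oldest entry on overflow, and the names `LinkStack`, `on_call`, and `predict_return` are assumptions for illustration and do not describe any particular hardware implementation.

```python
class LinkStack:
    """Fixed-depth stack of return addresses used to predict subroutine returns."""

    def __init__(self, depth=16):
        self.depth = depth          # assumption: small fixed hardware depth
        self.entries = []

    def on_call(self, call_addr, instr_bytes=4):
        """On a subroutine call, push the address of the instruction after the call."""
        if len(self.entries) == self.depth:
            # Assumption: on overflow the oldest return address is discarded.
            self.entries.pop(0)
        self.entries.append(call_addr + instr_bytes)

    def predict_return(self):
        """On a subroutine return, pop the most recent return address.

        Returns None when the stack is empty (no prediction available).
        """
        return self.entries.pop() if self.entries else None
```

Because calls and returns nest, the most recently pushed address is the target of the next return, so a stack predicts returns well even when the same subroutine is called from many sites.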
In the POWER ISA, a branch conditional to count register (BCCTR) instruction uses the CTR to provide an indirect branch target. That is, the BCCTR instruction conditionally branches to an instruction specified by an address contained within the CTR. Branches that utilize the BCCTR instruction include switches, various calculated target tables, and other types of programmed indirection. A BCCTR instruction may be conditional or unconditional depending on a value specified in a BO field, which is a control field that determines, for example, whether the branch is based on the condition register (CR) and on what CR value to branch. For branch instructions other than BCCTR, the BO field may also specify that the CTR is decremented and tested (i.e., the CTR can be used for either a target address or a loop count).
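The role of the BO field can be sketched as a small decode function. The bit assignments below follow the BO encoding as commonly documented for the Power ISA (bits numbered from the most significant bit) and should be checked against the ISA specification; the function name `branch_taken` and its parameters are hypothetical.

```python
CTR_MASK = (1 << 64) - 1  # assumption: 64-bit count register


def branch_taken(bo, cr_bit, ctr, is_bcctr=False):
    """Evaluate a 5-bit BO field; return (taken, new_ctr).

    bo      -- 5-bit BO field, BO[0] being the most significant bit
    cr_bit  -- value (0 or 1) of the CR bit selected by the branch
    ctr     -- current count register value
    """
    ignore_cond = bool(bo & 0b10000)         # BO[0]: do not test the CR bit
    cond_value = bool(bo & 0b01000)          # BO[1]: CR bit value to branch on
    no_decrement = bool(bo & 0b00100)        # BO[2]: do not decrement the CTR
    branch_if_ctr_zero = bool(bo & 0b00010)  # BO[3]: branch when CTR == 0

    if is_bcctr and not no_decrement:
        # The CTR holds the target for BCCTR, so it cannot also be a loop count.
        raise ValueError("BCCTR with CTR decrement is an invalid form")

    if not no_decrement:
        ctr = (ctr - 1) & CTR_MASK

    ctr_ok = no_decrement or ((ctr == 0) == branch_if_ctr_zero)
    cond_ok = ignore_cond or (bool(cr_bit) == cond_value)
    return ctr_ok and cond_ok, ctr
```

For example, a BO value of `0b10100` yields an unconditional branch that leaves the CTR untouched, while `0b10000` decrements the CTR and branches only while the CTR is nonzero, which is the loop-count use of the register.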