Branch prediction is one technique used to improve data processor performance. If the operands on which a branch instruction depends are not available, then a data processor must either predict the outcome of the branch instruction or stall the branch instruction until the operands are available. If the data processor stalls, or delays executing the branch instruction, then it cannot determine which instructions it needs next. Such a delay significantly degrades the performance of the data processor.
Data processors that use branch prediction techniques make a "guess" each time they receive a branch instruction, act on the guess, and then determine if the guess was correct by executing the instruction. Such a data processor guesses whether a branch will ultimately be taken and "jump" to a new instruction address or whether it will not be taken and "fall through" to the next sequential instruction. Data processors that predict branch instructions gain performance because they can make an accurate guess faster than they can fully execute the branch instruction. These data processors then need only correct wrong guesses.
Branch target address caches ("BTACs") are devices used to make branch predictions. BTACs contain addresses to which the data processor has recently branched. These "branch targets" are indexed by the address of the branch instruction which generated them. The data processor will search the BTAC once it determines the address of any instruction that it should execute. If the address corresponds to a valid entry in the BTAC, then the data processor assumes that the instruction is the same branch instruction and that it will take the branch again. Therefore, the data processor automatically branches to the corresponding cached target address. If the address does not correspond to any valid entry in the BTAC, then the data processor will determine the address of its next instruction by some other method. This other method may be another branch prediction technique or may be the actual execution of the branch instruction.
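The lookup described above can be illustrated with a minimal sketch. It is a software model under stated assumptions, not a description of any particular hardware: the class name `BTAC`, the `capacity` parameter, the replacement of the oldest entry when the cache is full, and the use of the sequential address as the "other method" on a miss are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional


class BTAC:
    """Toy branch target address cache: branch address -> cached target."""

    def __init__(self, capacity: int = 4) -> None:
        # capacity is an assumed, illustrative size
        self.capacity = capacity
        # dicts preserve insertion order, so the first key is the oldest entry
        self.entries: Dict[int, int] = {}

    def predict(self, fetch_addr: int) -> Optional[int]:
        """Return the cached target for fetch_addr, or None on a miss."""
        return self.entries.get(fetch_addr)

    def update(self, branch_addr: int, target: int) -> None:
        """Record the target of a branch that was actually taken."""
        if branch_addr in self.entries:
            # refresh the entry so it becomes the most recent one
            del self.entries[branch_addr]
        elif len(self.entries) >= self.capacity:
            # assumed replacement policy: evict the oldest entry
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[branch_addr] = target


def next_fetch_address(btac: BTAC, fetch_addr: int, instr_size: int = 4) -> int:
    """Choose the next fetch address for the instruction at fetch_addr."""
    predicted = btac.predict(fetch_addr)
    if predicted is not None:
        # hit: assume the same branch will be taken to the same target
        return predicted
    # miss: this sketch falls through to the next sequential instruction;
    # a real design may use another predictor or execute the branch instead
    return fetch_addr + instr_size
```

In this model a hit redirects fetch immediately, which is the source of the performance gain, while `update` would be called only after a branch has actually been executed.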
Subroutine return instructions ruin the performance of BTACs. Subroutine return instructions are the second half of a subroutine call-subroutine return instruction pair. The subroutine call-subroutine return instruction pair, or its equivalent, is found in every modern computer architecture. A subroutine call instruction causes a data processor to branch to an address specified in the instruction (a subroutine) and to store the value of the instruction pointer or fetch address in a particular register or memory location. The contents of the instruction pointer or fetch address specify the address of the next instruction that the data processor is to fetch from memory at any given time. In this case, the instruction pointer specifies the address of the instruction immediately following the subroutine call instruction. Conversely, the subroutine return instruction causes the data processor to branch to the instruction at the address stored in that same register or memory location. The subroutine call and return instructions may be conditioned upon some particular bit value or may be unconditional.
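The call and return behavior described above can be sketched as follows. This is an illustrative model only: the class name `Machine`, the single `link` register standing in for the "particular register or memory location", and the fixed instruction width `INSTR_SIZE` are assumptions, and conditional variants are omitted.

```python
INSTR_SIZE = 4  # assumed fixed instruction width, in bytes


class Machine:
    """Toy processor state: an instruction pointer and one link register."""

    def __init__(self) -> None:
        self.pc = 0       # address of the instruction being executed
        self.link = None  # saved return address (hypothetical link register)

    def call(self, target: int) -> None:
        """Subroutine call: save the following address, then branch."""
        self.link = self.pc + INSTR_SIZE
        self.pc = target

    def ret(self) -> None:
        """Subroutine return: branch to the saved address."""
        self.pc = self.link
```

For example, a call executed at address 0x100 to a subroutine at 0x800 saves 0x104, and the matching return resumes at 0x104. The same subroutine called from 0x200 returns to 0x204 instead, so the target of the return depends on the caller.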
Subroutine return instructions ruin the performance of BTACs because they do not branch to the same address repeatedly. Instead, subroutine return instructions branch to the instruction immediately following the instruction that initially called the subroutine. By definition, programmers form particular portions of a computer program into discrete subroutines because they call the subroutines from many points in a particular program. Consequently, BTACs often mispredict subroutine return instructions, speculatively jumping to the return address that the data processor generated at the end of a prior call. As a result, data processor designers often omit subroutine return instructions from their BTAC algorithms.
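The effect can be reproduced with a small simulation. It is a sketch under assumptions: the function name `count_return_mispredictions`, the return instruction address `ret_addr`, and the 4-byte instruction size are hypothetical, and the BTAC is modeled as an unbounded dictionary holding the last target seen for each branch address.

```python
from typing import Dict, List


def count_return_mispredictions(call_sites: List[int],
                                ret_addr: int = 0x8FC,
                                instr_size: int = 4) -> int:
    """Count wrong BTAC guesses for one return instruction.

    call_sites lists, in program order, the addresses of the call
    instructions that invoke the same subroutine.
    """
    btac: Dict[int, int] = {}  # branch address -> last observed target
    mispredictions = 0
    for site in call_sites:
        actual = site + instr_size      # true return address for this call
        predicted = btac.get(ret_addr)  # stale target from the prior call
        if predicted is not None and predicted != actual:
            mispredictions += 1
        btac[ret_addr] = actual         # cache the target just used
    return mispredictions
```

Under these assumptions, three calls from the distinct sites 0x100, 0x200, and 0x300 yield two mispredictions (the first return has no cached entry and is a miss rather than a wrong guess), whereas three calls from the single site 0x100 yield none. The cached target is useful only when consecutive calls come from the same point in the program.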