This invention relates generally to branch prediction in a computer system, and more particularly to search area limiting in a branch target buffer for branch prediction in a processor.
Computer programs frequently contain subroutines that are used to perform specific tasks within the programs. Such subroutines enable the reuse of code in a program and reduce code duplication. When a program is executing as a stream of instructions in a microprocessor, a subroutine is typically entered by a branch instruction in the instruction stream that targets the start of the subroutine. The processor then executes the instructions in the subroutine, and at its conclusion, the subroutine contains another branch instruction to return to the next sequential instruction of its caller. Prior to the conclusion of the subroutine, the subroutine may call additional subroutines to perform the task at hand. Since subroutines are frequently used in computer programs, optimizing this sequence can boost the performance of a program.
Modern high-performance microprocessors contain logic, known as a branch history table (BHT), that maintains a direction history of recently encountered branch instructions. Many processors also contain a branch target buffer (BTB), which stores branch address and target address bits associated with a given branch. This mechanism can be used to enhance the performance of executing subroutines by predicting, in advance of instruction decoding, where within the instruction address stream a branch that calls a subroutine will occur and what the target of the branch will be. As such, a redirection penalty from the branch to the target can be reduced to the point of eliminating the penalty altogether. The redirection penalty refers to a degradation in cycles per instruction (CPI) performance that can occur when a processing pipeline stalls to wait for a target address to be resolved. Upon prediction of the branch that calls the subroutine, the branch prediction logic continues the search for the next branch beginning with the first instruction within the subroutine, that is, the target of the calling branch. As searching is performed, instruction fetching may be initiated to predicted locations of branch targets.
The typical use of a BTB is to record prior branches in a table. Given a starting address, the table is searched sequentially for the next branch in the instruction address stream, based on prior occurrences. The BTB typically uses three portions of the address. Given a 64-bit address as an example, bits 50:59 can be used to index a multiple set-associative table, and within the table, tags associated with bits 40:49 and 60:63 are stored. Given a 4-way set-associative BTB, four branches per address region indexed by bits 50:59 can be stored in the table. Tags associated with bits 40:49 can be used to confirm that a branch found at the index belongs to the address being searched. Tags associated with bits 60:63 define where the branch is located within a 16-byte region. Upon an initial index into the BTB, the four entries at the given index are searched for a branch. If a branch is not found, the index is incremented by 1 and the next sequential line of the BTB, containing another four branch entries, is searched. This process continues until a branch is found or a reset condition restarts the branch prediction.
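By way of illustration only, the addressing and sequential search described above may be sketched as follows. This is a minimal model under stated assumptions, not a description of any particular hardware: it assumes bit 0 is the most significant bit of the 64-bit address (so that bits 60:63 are the four low-order bits, consistent with a 16-byte region), a first-in first-out replacement policy, and a hypothetical `reset_after_rows` parameter standing in for the reset condition. The names `split_address`, `BtbEntry`, and `Btb` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OFFSET_BITS = 4   # address bits 60:63 (position within a 16-byte region)
INDEX_BITS = 10   # address bits 50:59 (row index into the table)
TAG_BITS = 10     # address bits 40:49 (tag used to confirm the branch)
WAYS = 4          # 4-way set-associative: four branches per row


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset); assumes bit 0 is the most significant bit."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


@dataclass
class BtbEntry:
    tag: int      # bits 40:49 of the branch address
    offset: int   # bits 60:63 of the branch address
    target: int   # predicted target address


class Btb:
    def __init__(self) -> None:
        self.rows: List[List[BtbEntry]] = [[] for _ in range(1 << INDEX_BITS)]

    def install(self, branch_addr: int, target: int) -> None:
        """Record a previously encountered branch."""
        tag, index, offset = split_address(branch_addr)
        row = self.rows[index]
        if len(row) == WAYS:
            row.pop(0)  # assumed FIFO replacement; the source does not specify one
        row.append(BtbEntry(tag, offset, target))

    def search(self, start_addr: int, reset_after_rows: int) -> Optional[Tuple[int, int]]:
        """Search row by row from start_addr; return (branch_addr, target) or None."""
        line_addr = start_addr & ~((1 << OFFSET_BITS) - 1)
        start_offset = split_address(start_addr)[2]
        for step in range(reset_after_rows):
            tag, index, _ = split_address(line_addr)
            # In the first row, ignore branches located before the start address.
            min_offset = start_offset if step == 0 else 0
            hits = [e for e in self.rows[index]
                    if e.tag == tag and e.offset >= min_offset]
            if hits:
                first = min(hits, key=lambda e: e.offset)
                return line_addr + first.offset, first.target
            line_addr += 1 << OFFSET_BITS  # increments the index by 1
        return None  # stand-in for the reset condition
```

For example, after `btb = Btb()` and `btb.install(0x1234, 0x2000)`, the call `btb.search(0x1200, 8)` examines the rows for 0x1200, 0x1210, 0x1220, and 0x1230 in turn and reports the branch at 0x1234. Nothing in this loop is aware of where a subroutine ends.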
When a subroutine is called, there is no guarantee that the BTB search will stop at the end of the subroutine: if no branch is found or predicted within the subroutine, the BTB may continue searching sequentially beyond its end. If the return branch of the subroutine is not within the BTB, then there is no way for the BTB to predict the subroutine return branch. Within privileged regions of code, it is important to limit branch prediction to only those branches that are confined to a given routine, as fetching beyond the limits of the given routine can alter the state of the processor. Predicting branches outside of the given routine can cause the processor to take actions (e.g., an unexpected fetch request) that are not supported in the current state of the given subroutine. Such operations can corrupt the state of the processor. Without some form of branch prediction throttling, predictions must be completely prevented within privileged regions of code in order to prevent a corrupted or illegal state that would cause the processor to produce incorrect results.
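The overrun described above may be illustrated with a small, self-contained sketch. It models only the problem, not any remedy. The function `next_predicted_branch`, the address layout, and the `max_lines` cutoff are hypothetical and chosen solely for illustration.

```python
from typing import Iterable, Optional

LINE_BYTES = 16  # each sequential search step covers one 16-byte region


def next_predicted_branch(recorded: Iterable[int], start: int,
                          max_lines: int) -> Optional[int]:
    """Scan 16-byte lines sequentially from start; return the first recorded branch."""
    branches = sorted(recorded)
    line = start - (start % LINE_BYTES)
    for _ in range(max_lines):
        for addr in branches:
            if line <= addr < line + LINE_BYTES and addr >= start:
                return addr
        line += LINE_BYTES
    return None


# Hypothetical layout: the subroutine occupies [0x1000, 0x1040) and its
# return branch at 0x103C has never been recorded in the BTB.
subroutine = range(0x1000, 0x1040)
recorded_branches = [0x1058]  # a branch belonging to unrelated code

hit = next_predicted_branch(recorded_branches, start=0x1000, max_lines=16)
escaped = hit is not None and hit not in subroutine
print(hex(hit), escaped)
```

In this layout the search passes the unrecorded return branch, leaves the subroutine, and predicts the branch at 0x1058, so `escaped` evaluates to true.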
It would be desirable to limit the search area in the BTB, both to prevent missing a return branch and to prevent fetching instructions, within a privileged region, that can corrupt the state of the processor. Accordingly, there is a need in the art for an approach to perform search-area-confined branch prediction in a processor.