In a pipelined processor, a program's instructions are processed sequentially through an instruction pipeline. The instruction pipeline is broken into stages which perform particular steps involved with an instruction's processing. This pipelined structure allows multiple instructions to be processed simultaneously, with each stage performing a different step of an instruction's processing.
Programs may contain conditional branch instructions. If a conditional branch is taken, the program will break from the current sequence of instructions to the target of the conditional branch; if a conditional branch is not taken, the program will continue the current sequence of instructions. Whether the conditional branch will be taken cannot be determined until later in the pipeline, after the instruction has been decoded and executed. However, conditional branches may follow a predictable behavior, and whether a conditional branch will be taken, also known as the direction of the branch, may be predicted earlier in the instruction pipeline. A processor may speculatively fetch the next instruction in the program based on the predicted behavior of the branch. If the branch is predicted taken, the processor may fetch the next instruction from an address predicted by a branch target predictor. If the branch is predicted not taken, the processor may fetch the next instruction from the next sequential instruction address. The branch direction may be predicted before the instruction is decoded and may be checked at branch resolution once the instruction is executed. If the branch prediction was wrong, the speculatively fetched instructions are flushed from the instruction pipeline and fetching resumes along the correct path.
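The fetch decision and the check at branch resolution described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular processor: the function names, the fixed 4-byte instruction width, and the way a misprediction is detected (by comparing the speculative and correct next addresses) are all assumptions made for the example.

```python
INSTRUCTION_BYTES = 4  # assumed fixed instruction width (hypothetical)


def next_fetch_address(pc, predicted_taken, predicted_target):
    """Pick the speculative fetch address that follows a branch at `pc`."""
    if predicted_taken:
        # Predicted taken: fetch from the target supplied by a target predictor.
        return predicted_target
    # Predicted not taken: fetch from the next sequential instruction address.
    return pc + INSTRUCTION_BYTES


def resolve_branch(pc, predicted_taken, predicted_target, actual_taken, actual_target):
    """Check the prediction once the branch has executed.

    Returns (mispredicted, correct_next_address). When `mispredicted` is True,
    the speculatively fetched instructions would be flushed and fetching
    would restart at `correct_next_address`.
    """
    speculative = next_fetch_address(pc, predicted_taken, predicted_target)
    correct = actual_target if actual_taken else pc + INSTRUCTION_BYTES
    return speculative != correct, correct
```

For example, a branch at address 0x100 that is predicted taken to 0x200 but resolves as not taken is reported as mispredicted, with 0x104 as the address at which fetching would resume.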
A dynamic branch prediction mechanism may use a history of branch outcomes to predict whether a branch instruction will be taken. Typically, a branch prediction system will contain a branch history and a predictor. The predictor is often a prediction history table of saturating counters. The table is indexed by the branch history and an instruction address, and the selected counter outputs a branch prediction value. A prediction history table may also be known as a branch history table or pattern history table. The branch history may include local history or global history. Local history uses the history of each individual branch to index into a prediction history table. The local history for each branch may be stored as an entry in a local branch history table. Global history uses the combined history of all recent branches, rather than that of specific individual branches, and is often stored as a vector in a register. Global history may be combined with the instruction address to index into a prediction history table.
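A local-history predictor of the kind described above might be sketched as follows. Everything specific here is an assumption chosen for illustration: the class name, the 2-bit counter width, the 4-bit history length, the table size, the use of a dictionary as the local branch history table, and the concatenation of address bits with history bits to form the index.

```python
class LocalHistoryPredictor:
    """Sketch: per-branch history indexing a table of 2-bit saturating counters."""

    def __init__(self, history_bits=4, table_bits=10):
        self.history_bits = history_bits
        self.table_bits = table_bits
        self.local_history = {}  # stands in for a local branch history table
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << table_bits)

    def _index(self, pc):
        history = self.local_history.get(pc, 0)
        pc_mask = (1 << (self.table_bits - self.history_bits)) - 1
        pc_bits = (pc >> 2) & pc_mask  # drop byte-offset bits (assumed 4-byte instructions)
        return (pc_bits << self.history_bits) | history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0
        # Shift the resolved outcome into this branch's own history.
        mask = (1 << self.history_bits) - 1
        old = self.local_history.get(pc, 0)
        self.local_history[pc] = ((old << 1) | int(taken)) & mask
```

Because each history pattern selects its own counter, a branch that strictly alternates between taken and not taken becomes predictable after a short warm-up, which a single counter per branch could not achieve.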
One common mechanism of global branch prediction involves storing global history as a global history vector (GHV) in a global history register (GHR). FIG. 1 is a diagram of a branch prediction mechanism, commonly known as gshare, that uses index sharing to select a counter in a prediction history table. A branch's instruction address 101 and a global history vector of the global history register 102 are combined through XOR logic 103 to form an index value. The index value indexes into a prediction history table 104. The prediction history table entry outputs a prediction of whether the branch is taken or not taken. The next instruction is fetched according to whether the branch is predicted as taken or not taken. Once the instruction is executed and the branch is resolved, the associated entry in the prediction history table is updated with the taken/not taken information for the branch. The global history register, by contrast, must be updated speculatively based on the predicted direction, without waiting for the branch to be resolved. If this is not done, subsequent branches would be predicted with the wrong global branch history, leading to poor predictions. The global history register may be updated by shifting a first logical value into the register if the branch is taken and a second logical value if the branch is not taken. Often, a branch must be predicted before the prediction history table has been updated with the most recent branch prediction information.
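The gshare flow of FIG. 1 can be sketched as below. This is an illustrative model only: the class and method names, the 2-bit counters, the table size, and the choice of 1 and 0 as the first and second logical values are assumptions. The repair of the global history register on a misprediction is also an assumption of this sketch; the text above does not specify a recovery scheme.

```python
class Gshare:
    """Sketch of gshare: (address XOR global history) indexes a counter table."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                           # global history register (102)
        self.pht = [1] * (1 << index_bits)     # prediction history table (104)

    def index(self, pc, ghv):
        # XOR logic (103): combine instruction address (101) with the history vector.
        return ((pc >> 2) ^ ghv) & self.mask

    def predict(self, pc):
        ghv = self.ghr
        idx = self.index(pc, ghv)
        taken = self.pht[idx] >= 2
        # Speculative update: shift the *predicted* direction into the GHR now,
        # so later branches see up-to-date history before this one resolves.
        self.ghr = ((ghv << 1) | int(taken)) & self.mask
        return taken, idx, ghv

    def resolve(self, idx, ghv, predicted, actual):
        # At resolution, train the counter that produced the prediction.
        if actual:
            self.pht[idx] = min(3, self.pht[idx] + 1)
        else:
            self.pht[idx] = max(0, self.pht[idx] - 1)
        if predicted != actual:
            # Assumed recovery: rebuild the GHR from the history seen at
            # prediction time plus the actual outcome, discarding the history
            # of younger (flushed) branches.
            self.ghr = ((ghv << 1) | int(actual)) & self.mask
```

The caller carries `idx` and `ghv` from `predict` to `resolve` so that the counter updated at resolution is the same one that was read at prediction time, even though the global history register has moved on in between.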
For program execution, a processor may fetch a group of sequential instructions from an instruction cache, known as a fetch group. If a fetch group contains one or more taken branches, the instructions up through the first taken branch will be processed, after which the remaining instructions in the fetch group must be discarded and the pipeline flushed.
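The handling of a fetch group that contains a taken branch can be illustrated with a short sketch. The representation of a fetch group as a list of (address, is_taken_branch) pairs and the function name are hypothetical, chosen only to make the rule concrete.

```python
def split_fetch_group(fetch_group):
    """Split a fetch group at its first taken branch.

    `fetch_group` is a list of (address, is_taken_branch) pairs in program
    order. Returns (kept, discarded): the addresses processed up through the
    first taken branch, and the addresses that follow it and are discarded.
    """
    kept = []
    for address, is_taken_branch in fetch_group:
        kept.append(address)
        if is_taken_branch:
            break  # everything after the first taken branch is off-path
    discarded = [address for address, _ in fetch_group[len(kept):]]
    return kept, discarded
```

A later taken branch in the same group has no effect, since it lies after the first taken branch and is discarded along with the other remaining instructions.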