A processor is tasked with executing a large number of instructions and typically uses an instruction pipeline to increase instruction throughput. An instruction pipeline splits the processing of a computer instruction into a series of independent steps and stores the result at the end of each step. To process the independent steps, an instruction pipeline includes several stages for processing instructions. In one example, a four-stage pipeline may be used, which includes a fetch stage, a decode stage, an execution stage, and a write-back stage. Instructions progress through the pipeline stages in order. For example, an instruction will be at the fetch stage at a first time, at the decode stage at a second time, at the execution stage at a third time, and at the write-back stage at a fourth time.
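The stage-by-stage progression described above can be illustrated with a small sketch. It assumes an idealized pipeline with no stalls or flushes, in which one instruction enters the fetch stage per cycle; the function name `pipeline_schedule` and the stage labels are illustrative only.

```python
# Idealized four-stage pipeline: one instruction enters per cycle, no stalls.
STAGES = ["fetch", "decode", "execute", "write-back"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: {stage: instruction_index}} for an ideal pipeline."""
    schedule = {}
    total_cycles = num_instructions + len(stages) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            # The instruction in stage `depth` entered the pipeline `depth` cycles ago.
            instr = cycle - depth
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule[cycle] = occupancy
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in pipeline_schedule(3).items():
        print(cycle, occupancy)
```

In this sketch, three instructions finish in six cycles rather than twelve, because up to four instructions occupy different stages at the same time.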
To accelerate processor operations, it is desirable to have as many instructions as possible in the pipeline at the same time. One way of increasing the number of instructions in the pipeline is to fetch subsequent instructions while previous instructions are still being processed in the pipeline. Fetching subsequent instructions may be referred to as “fetching ahead.” Problems may arise with fetching ahead because the result of the execution of particular previous instructions may be necessary for the execution of the subsequent instructions that are fetched ahead of time. For example, an instruction may include a “branch,” which is typically an “if-then-else” structure that creates a conditional jump. At the time of a conditional jump, it must be determined, based on one or more factors or conditions, whether the jump should be “taken” or “not-taken.” The decision at that point in time creates two possible branches, referred to as “taken” or “not-taken” branches. The decision to take or not take the branch often cannot be made at that point in time because it may depend on the result of one or more instructions that are still in the pipeline. Thus, many branch decisions will need to be made before the actual answer is computed by the processor. Waiting for the result of all previous instructions would delay execution of instructions and would eliminate the benefits of using an instruction pipeline.
Therefore, it is desirable to predict whether a branch is taken or not-taken to avoid the delay associated with waiting for the actual branch decision. If the branch prediction is correct, the instruction pipeline may continue normally. If the branch prediction is incorrect, many of the instructions in the instruction pipeline will be using incorrect information. Thus, in the event a branch “misprediction” is discovered, at least a part of the pipeline must be emptied (referred to as a “flush”). Specifically, the instructions that have entered the pipeline more recently than the mispredicted branch must be flushed. Branch predictors were created as a way to make the branch prediction in an educated manner. A branch predictor predicts the direction of a branch instruction (taken or not-taken) and the branch target address before the branch instruction reaches the execution stage in the pipeline.
Branch prediction results in fetching an instruction based on the predicted direction of the branch because a different set of instructions will need to be executed depending on which branch direction is chosen. Whether the correct instruction was fetched may not be known until the branch instruction reaches the execution stage. However, which instruction to fetch must be decided at the fetch stage, which occurs before the execution stage. Fetching an instruction before knowing exactly which instruction needs to be executed is called “pre-fetching.” Executing an instruction ahead of time based on a branch prediction that may or may not be correct is called “speculatively executing” the instruction. The instruction is considered to be speculatively executed because, at that particular time, it is not known whether the prediction is correct and whether the correct instruction was executed.
Although pre-fetching and speculatively executing instructions without knowing the actual direction of the branch instruction may result in accelerating instruction processing if predicted correctly, it may have the opposite effect and may result in stalling the pipeline if the branch direction is mispredicted. If a branch misprediction occurs, the instruction pipeline needs to be flushed and the instructions from the correct branch direction need to be executed. This may severely impact the performance of the processor.
In attempts to increase the performance of processors, several different types of branch predictors are used. A local branch predictor makes a prediction based on the recent history of a particular conditional jump, and provides a prediction of taken or not-taken. A global branch predictor makes a prediction based upon the recent history of all conditional jumps, not just a particular jump of interest. To make a prediction, a global branch predictor keeps a shared history of all conditional jumps, called global history.
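The difference between local and global history can be sketched as follows. This is a minimal illustration, not a description of any particular hardware: the class names, the 4-bit history width, and the use of a dictionary keyed by branch address are all assumptions made for the example.

```python
class HistoryRegister:
    """Shift register holding the most recent branch outcomes (1 = taken)."""

    def __init__(self, bits):
        self.bits = bits
        self.value = 0

    def record(self, taken):
        mask = (1 << self.bits) - 1
        # Shift in the newest outcome as the least significant bit.
        self.value = ((self.value << 1) | int(taken)) & mask


class LocalHistories:
    """One history register per branch address (hypothetical organization)."""

    def __init__(self, bits):
        self.bits = bits
        self.table = {}

    def record(self, branch_addr, taken):
        if branch_addr not in self.table:
            self.table[branch_addr] = HistoryRegister(self.bits)
        self.table[branch_addr].record(taken)

    def history(self, branch_addr):
        return self.table[branch_addr].value


if __name__ == "__main__":
    global_history = HistoryRegister(4)   # shared by all conditional jumps
    local_histories = LocalHistories(4)   # separate history per jump
    events = [(0x40, True), (0x80, False), (0x40, True), (0x80, False)]
    for addr, taken in events:
        global_history.record(taken)
        local_histories.record(addr, taken)
    print(bin(global_history.value))
    print(bin(local_histories.history(0x40)), bin(local_histories.history(0x80)))
```

Here the global register interleaves the outcomes of both branches, while each local register records only the outcomes of its own branch.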
A saturating counter may also be used to increase the effectiveness of a branch predictor. A saturating counter is a state machine with four states. For example, the four states may include “strongly not taken,” “weakly not taken,” “weakly taken,” and “strongly taken.” A state machine with four states requires 2 bits to maintain the four states and is considered a “2-bit saturating counter.” A saturating counter may be used for each branch and when the branch is evaluated, the state machine is updated. For example, if a branch is evaluated as “not taken,” the state is decremented towards the “strongly not taken” state. Similarly, if a branch is evaluated as “taken,” the state is incremented towards the “strongly taken” state. Thus, a saturating counter in the “strongly taken” state will only decrement to “weakly taken” when a not-taken branch is evaluated. In this way, a particular branch must deviate twice from what it has done most in recent history before the prediction changes. In the example described above, after a single not-taken evaluation the counter is in the “weakly taken” state, so the next prediction will still be “taken.” However, if the next evaluation is another not-taken branch, the state will be changed to “weakly not taken” and the following prediction will be “not taken.”
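The state machine described above can be sketched in a few lines. The encoding of the four states as the integers 0 through 3 and the class name `SaturatingCounter` are assumptions made for illustration.

```python
STATE_NAMES = [
    "strongly not taken",  # 0
    "weakly not taken",    # 1
    "weakly taken",        # 2
    "strongly taken",      # 3
]


class SaturatingCounter:
    """2-bit saturating counter; states 2 and 3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, clamping at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

    def name(self):
        return STATE_NAMES[self.state]


if __name__ == "__main__":
    counter = SaturatingCounter(state=3)
    for outcome in (False, False):
        counter.update(outcome)
        print(counter.name(), "->", "taken" if counter.predict() else "not taken")
```

Starting from "strongly taken," the first not-taken outcome leaves the prediction at taken; only the second not-taken outcome flips it.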
A two-level adaptive predictor with a globally shared history buffer, a pattern history table (PHT), and/or an additional local saturating counter may also be used to further increase the performance of a branch predictor. The two-level adaptive predictor may increase performance of the processor if conditional jumps are taken according to a regularly occurring pattern. The two-level adaptive predictor maintains a branch history of the last n outcomes of one or more branches and uses a saturating counter for each of the 2^n possible branch history patterns. For example, if the last 2 outcomes of a branch are maintained, there are 4 possible binary representations of the last 2 outcomes: 00, 01, 10, or 11. The branch history may be stored in a 2-bit shift register that may be updated each time a new branch outcome is evaluated. In this example, the PHT has 4 entries, one for each of the 4 possible branch history outcomes (00, 01, 10, or 11), and each entry contains a saturating counter that provides a branch prediction based on the outcomes. To access the PHT, a particular saturating counter is selected from the PHT that corresponds to the value stored in the branch history shift register. Thus, the branch prediction is made based on a particular combination of the recent branch history and not simply based on the last branch that was evaluated.
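A minimal sketch of the two-level scheme described above follows. It assumes a single history shift register with the most recent outcome in the least significant bit, counters encoded as integers 0 through 3, and an arbitrary initial state of "weakly not taken" for every entry; the class name `TwoLevelPredictor` is illustrative.

```python
class TwoLevelPredictor:
    """n-bit branch history selecting one of 2**n 2-bit saturating counters."""

    def __init__(self, history_bits=2, initial_state=1):
        self.history_bits = history_bits
        self.history = 0  # shift register of recent outcomes (1 = taken)
        # Pattern history table: one counter per possible history value.
        self.pht = [initial_state] * (1 << history_bits)

    def predict(self):
        # Counter states 2 and 3 predict taken.
        return self.pht[self.history] >= 2

    def update(self, taken):
        # Train the counter that was selected by the current history.
        counter = self.pht[self.history]
        self.pht[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Then shift the new outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


if __name__ == "__main__":
    predictor = TwoLevelPredictor(history_bits=2)
    for outcome in (True, False):
        print("predicted taken:", predictor.predict(), "actual taken:", outcome)
        predictor.update(outcome)
    print("history:", format(predictor.history, "02b"), "PHT:", predictor.pht)
```

Updating the counter before shifting the history matters: the counter that made the prediction is the one that should learn from the outcome.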
An example of the advantages of this approach can be seen if the correct branch direction alternates between taken and not-taken each time. In that case, a predictor based on a single saturating counter may guess incorrectly every time. However, a consideration of the recent branch history and a saturating counter pertaining to that particular history may allow such a pattern to be correctly predicted. For example, if the branch direction alternates each time, the recent history would be represented as “01010101 . . . ” Because the pattern continues to alternate, the saturating counter in the PHT corresponding to a history of “01” would indicate that the next branch is “strongly not taken” because the branch direction following “01” has always been “0.” Similarly, a history of “10” would indicate that the next branch is “strongly taken” because the branch direction following “10” has always been “1.” If this pattern persists, the PHT entries for a history of “00” or “11” will go unused because those histories do not occur in the alternating branch direction scenario used in this example.
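The alternating case can be checked with a short simulation. The sketch below compares one 2-bit saturating counter against a 2-bit-history pattern history table on the sequence 0, 1, 0, 1, and so on. The initial states are assumptions chosen for the illustration: the single counter starts at "weakly taken" (its worst case for this sequence), and every PHT counter starts at "weakly not taken" with an all-zero history.

```python
def step(counter, taken):
    """Advance a 2-bit saturating counter (0..3) toward the observed outcome."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


def misses_single_counter(outcomes, state=2):
    """Count mispredictions of one saturating counter (states 2, 3 predict taken)."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = step(state, taken)
    return misses


def misses_two_level(outcomes, history_bits=2, initial_state=1):
    """Count mispredictions of a history-indexed PHT; also return the final PHT."""
    pht = [initial_state] * (1 << history_bits)
    mask = (1 << history_bits) - 1
    history = 0
    misses = 0
    for taken in outcomes:
        if (pht[history] >= 2) != taken:
            misses += 1
        pht[history] = step(pht[history], taken)
        history = ((history << 1) | int(taken)) & mask
    return misses, pht


if __name__ == "__main__":
    alternating = [False, True] * 10  # 0, 1, 0, 1, ... (20 outcomes)
    print("single counter misses:", misses_single_counter(alternating))
    two_level_misses, table = misses_two_level(alternating)
    print("two-level misses:", two_level_misses, "final PHT:", table)
```

Under these assumptions, the single counter oscillates between its two weak states and never predicts correctly, while the two-level predictor mispredicts only during warm-up. The entries for histories "01" and "10" saturate at "strongly not taken" and "strongly taken," respectively. With a different initial state the single counter can be right about half the time, which is still far worse than the history-based scheme.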
The branch prediction mechanisms described above may be used alone or may be used in any combination simultaneously. For example, if more than one branch predictor is used, a final prediction may be made either based on a meta-predictor that remembers which of the predictors made the best predictions in the past or based on a majority vote among an odd number of different branch predictors.
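Both combination strategies mentioned above can be sketched briefly. The source does not specify how the meta-predictor remembers past accuracy; the version below assumes two component predictors and a 2-bit saturating "choice" counter, which is one common arrangement and is used here purely as an illustration. The names `majority_vote` and `MetaPredictor` are hypothetical.

```python
def majority_vote(predictions):
    """Combine an odd number of taken/not-taken predictions by majority."""
    if len(predictions) % 2 == 0:
        raise ValueError("majority vote needs an odd number of predictors")
    return sum(predictions) > len(predictions) // 2


class MetaPredictor:
    """Selects between predictor A and predictor B (assumed two-way chooser)."""

    def __init__(self):
        self.choice = 1  # 0 or 1 favors A, 2 or 3 favors B

    def select(self, pred_a, pred_b):
        return pred_b if self.choice >= 2 else pred_a

    def update(self, pred_a, pred_b, taken):
        a_correct = pred_a == taken
        b_correct = pred_b == taken
        # Only move the chooser when exactly one component was correct.
        if b_correct and not a_correct:
            self.choice = min(3, self.choice + 1)
        elif a_correct and not b_correct:
            self.choice = max(0, self.choice - 1)


if __name__ == "__main__":
    print(majority_vote([True, False, True]))
    meta = MetaPredictor()
    print(meta.select(True, False))      # initially follows predictor A
    meta.update(True, False, taken=False)  # B was right, A was wrong
    print(meta.select(True, False))      # now follows predictor B
```

Leaving the chooser unchanged when both components agree with the outcome, or both miss, keeps it from drifting on branches where the choice makes no difference.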
Branch predictors are typically large and complex structures. As a result, they consume a large amount of power and incur a latency penalty when predicting branches. Thus, it would be desirable to further increase the effectiveness of branch predictors, because better branch prediction has an impact on the performance and the power efficiency of the processor.