I. Field of the Disclosure
The technology of the disclosure relates generally to branch prediction in computer systems.
II. Background
Instruction pipelining is a processing technique whereby the throughput of computer instructions being executed by a processor may be increased by splitting the handling of each instruction into a series of steps. These steps are executed in an execution pipeline composed of multiple stages. Optimal processor performance may be achieved if all stages in an execution pipeline are able to process instructions concurrently. However, concurrent execution of instructions in an execution pipeline may be hampered by the presence of conditional branch instructions. Conditional branch instructions may redirect the flow of a program based on conditions evaluated when the conditional branch instructions are executed. As a result, the processor may have to stall the fetching of additional instructions until a conditional branch instruction has executed, resulting in reduced processor performance and increased power consumption.
One approach for maximizing processor performance involves utilizing a branch prediction circuit to predict whether a conditional branch instruction will be taken. This prediction can be based on a branch prediction history of previous conditional branch instructions. Instructions corresponding to the predicted branch may then be fetched and speculatively executed by the processor. In the event of a mispredicted branch, the processor may incur a delay while the fetched instructions corresponding to the mispredicted branch are flushed from the execution pipeline, and the instructions corresponding to the correct path (which may be either the taken or the not-taken path) are fetched. Accordingly, an accurate branch predictor is desirable to minimize the penalties (in terms of both decreased processor performance and unnecessary power consumption) of branch mispredictions.
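As an illustration of history-based prediction, the following is a minimal sketch of a table of two-bit saturating counters, one common textbook scheme. It is not taken from the disclosure: the class name `TwoBitPredictor`, the helper `count_mispredictions`, the table size, and the initial counter value are all assumptions chosen for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (illustrative assumption only)."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        # Counter values 0..3: 0 and 1 predict not taken, 2 and 3 predict taken.
        # Every entry starts at 1 ("weakly not taken"), an arbitrary choice.
        self.counters = [1] * table_size

    def _index(self, pc):
        # Hypothetical indexing: low-order part of the branch address.
        return pc % self.table_size

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Record the resolved outcome in the prediction history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, then update, for each outcome; return the number of misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

Under these assumptions, a loop branch that is taken nine times and then falls through is mispredicted twice: once on the first iteration, before the counter has warmed up, and once on loop exit.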
Accuracy of conventional branch predictors generally increases with the number of processor clock cycles required to generate a branch prediction. For example, a relatively simple branch predictor may require only a single processor clock cycle to provide a branch prediction, but the resulting branch prediction may be less accurate. Conversely, a more complex branch predictor may provide a higher degree of accuracy, but may suffer from a multi-cycle latency (i.e., may require multiple processor clock cycles to generate a branch prediction).
To mitigate the tradeoff between accuracy and speed, an “overriding branch predictor” may employ a faster, less accurate first branch predictor in conjunction with a slower, more accurate second branch predictor. Both branch predictors provide predictions for each conditional branch instruction, with the second branch predictor providing its prediction a few processor clock cycles later than the first branch predictor. The processor initially fetches instructions based on the branch prediction of the first branch predictor. When the branch prediction of the second branch predictor is generated, the processor compares it to the first branch prediction. If the predictions differ, the second prediction is used to overwrite the first prediction in the branch prediction history for the first branch predictor, and the proper instructions are re-fetched based on the second branch prediction. Even though the re-fetching of instructions may incur a performance penalty, the processor still achieves a net performance improvement compared to the penalty incurred by waiting until the instructions reach the execution stage before re-fetching. This is particularly the case with processors having a large number of pipeline stages between instruction fetching and instruction execution.
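The override step described above can be sketched as follows. This is a simplified model under stated assumptions, not the circuit of any particular processor: the function names `resolve_override` and `wasted_cycles`, the three-cycle latency of the second predictor, and the twelve-stage distance between fetch and execution are hypothetical values chosen only to make the arithmetic concrete.

```python
def resolve_override(fast_history, fast_pred, slow_pred, slow_latency=3):
    """Compare the two predictions once the slower one becomes available.

    fast_history is the first predictor's history (list of bools); its last
    entry is assumed to hold fast_pred for the branch being resolved.
    Returns (final_prediction, refetch_cycles).
    """
    if fast_pred == slow_pred:
        # Predictions agree: fetching continues undisturbed.
        return fast_pred, 0
    # Predictions differ: overwrite the first predictor's history entry and
    # re-fetch, discarding the cycles fetched down the first-predicted path.
    fast_history[-1] = slow_pred
    return slow_pred, slow_latency


def wasted_cycles(fast_pred, slow_pred, actual, slow_latency=3, exec_depth=12):
    """Fetch cycles lost for one branch under this simplified model."""
    penalty = 0
    if fast_pred != slow_pred:
        penalty += slow_latency      # early re-fetch triggered by the override
    if slow_pred != actual:
        penalty += exec_depth        # full flush when the branch executes
    return penalty
```

With these assumed numbers, a branch that the first predictor gets wrong and the second gets right costs 3 cycles instead of the 12 incurred by waiting for the execution stage, which illustrates why the benefit grows with the depth of the pipeline.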
However, because of the multi-cycle latency of the second branch predictor of the overriding branch predictor, the second branch predictor must base its branch predictions on a branch prediction history that is “stale” (i.e., does not contain branch predictions for the most recently encountered conditional branch instructions). As a result, the accuracy and performance of the second branch predictor may be less than optimal.
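The staleness problem can be made concrete with a short sketch. It assumes, purely for illustration, that the second branch predictor samples the history when it begins a prediction and that a known number of newer conditional branches are predicted before it finishes; the function name `visible_history` and its parameters are hypothetical.

```python
def visible_history(history, branches_in_flight):
    """Return the history available to a multi-cycle predictor.

    history: outcomes or predictions for all branches so far, oldest first.
    branches_in_flight: number of most recent branches predicted after the
    slow predictor sampled the history (assumed known here).
    """
    if branches_in_flight <= 0:
        return list(history)
    # The newest entries did not yet exist when the prediction began.
    return list(history[:-branches_in_flight])


# Example: five branches so far, two of them newer than the sampled history.
full = [True, False, True, True, False]
stale = visible_history(full, 2)
```

Here `stale` holds only the three oldest entries, so any correlation between the current branch and the two most recent branches is invisible to the second branch predictor.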