Early microprocessors generally processed instructions one at a time. Each instruction was processed in four sequential stages: instruction fetch, instruction decode, execute, and result writeback. Within such microprocessors, a different dedicated logic block performed each processing stage, and each logic block waited until all of the previous logic blocks had completed their operations before beginning its own.
To improve efficiency, microprocessor designers overlapped the operations of the fetch, decode, execute, and writeback logic stages so that the microprocessor operated on several instructions simultaneously. In operation, the fetch, decode, execute, and writeback stages concurrently process different instructions, and at each clock tick the results of each processing stage are passed to the following stage. Microprocessors that overlap the fetch, decode, execute, and writeback stages in this way are known as "pipelined" microprocessors. Some microprocessors further divide each processing stage into substages to improve performance; such processors are referred to as "deeply pipelined" microprocessors.
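The overlap described above can be made concrete with a small timing sketch. It assumes an idealized four-stage machine in which every stage takes exactly one clock tick and nothing ever stalls; the function names `pipeline_schedule` and `sequential_ticks` are hypothetical and chosen only for illustration. Under those assumptions, N instructions need N + 3 ticks when pipelined versus 4N ticks when processed one at a time.

```python
# Sketch: an idealized four-stage pipeline, assuming every stage takes exactly
# one clock tick, one instruction enters per tick, and nothing ever stalls.
STAGES = ["fetch", "decode", "execute", "writeback"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock tick mapping stage name -> instruction index."""
    total_ticks = num_instructions + len(STAGES) - 1
    schedule = []
    for tick in range(total_ticks):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            instr = tick - depth  # the instruction that entered `depth` ticks ago
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


def sequential_ticks(num_instructions):
    """Non-overlapped machine: each instruction finishes all stages before the next starts."""
    return num_instructions * len(STAGES)


if __name__ == "__main__":
    n = 5
    for tick, occupancy in enumerate(pipeline_schedule(n)):
        cells = [f"{stage}=I{occupancy[stage]}" if stage in occupancy else f"{stage}=--"
                 for stage in STAGES]
        print(f"tick {tick}: " + "  ".join(cells))
    print("pipelined ticks:", len(pipeline_schedule(n)))   # 8
    print("sequential ticks:", sequential_ticks(n))        # 20
```

At tick 3 of the five-instruction run, instruction 3 is being fetched while instructions 2, 1, and 0 occupy decode, execute, and writeback, which is the concurrent processing the paragraph describes.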
In order for a pipelined microprocessor to operate efficiently, an Instruction Fetch Unit at the head of the pipeline must continually supply the pipeline with a stream of instructions. However, conditional branch instructions within the instruction stream prevent the Instruction Fetch Unit from fetching subsequent instructions until the branch condition is resolved. In a pipelined microprocessor, the branch condition is not resolved until the branch instruction reaches an instruction execution stage further down the pipeline. Because the branch condition is still unresolved at the instruction fetch stage, the Instruction Fetch Unit does not know which instructions to fetch next and must stall.
To alleviate this problem, many pipelined microprocessors use branch prediction mechanisms that predict the outcome of branch instructions within an instruction stream, and the Instruction Fetch Unit uses these predictions to fetch subsequent instructions. For example, Yeh & Patt introduced a highly accurate two-level adaptive branch prediction mechanism (see Tse Yu Yeh and Yale N. Patt, "Two-Level Adaptive Branch Prediction," The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, November 1991, pp. 51-61). The Yeh & Patt mechanism makes branch predictions based upon two levels of collected branch history: a record of recent branch outcomes, and a table of pattern history indexed by that outcome record.
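A minimal sketch of the two-level idea follows. For simplicity it assumes a single global history register of `history_bits` outcomes (first level) indexing a table of 2-bit saturating counters (second level); the cited paper's organization may differ, for example by keeping a separate history per branch, and the class and method names here are hypothetical.

```python
class TwoLevelPredictor:
    """Sketch of a two-level adaptive predictor (single global history, assumed).

    Level 1: a history register holding the last `history_bits` branch outcomes.
    Level 2: a pattern history table of 2-bit saturating counters indexed by that history.
    """

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0  # most recent outcome in bit 0; 1 = taken
        self.pht = [1] * (1 << history_bits)  # every counter starts "weakly not-taken"

    def predict(self):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.pht[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history, then shift the outcome in.
        counter = self.pht[self.history]
        self.pht[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, outcomes):
    """Predict each outcome in turn, count the misses, and train on the actual result."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A branch that alternates taken / not-taken, starting with taken.
    outcomes = [i % 2 == 0 for i in range(40)]
    print(count_mispredictions(TwoLevelPredictor(history_bits=4), outcomes))  # 3
```

The alternating branch defeats a single 2-bit counter, but once the history register distinguishes "last outcome taken" from "last outcome not-taken", each history pattern gets its own counter, and the sketch stops mispredicting after three warm-up misses.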
When a branch prediction mechanism predicts the outcome of a branch instruction and the microprocessor executes subsequent instructions along the predicted path, the microprocessor is said to have "speculatively executed" along the predicted instruction path. During speculative execution, the microprocessor must not permanently commit any changes in state, since it may be executing down the wrong path because of a branch misprediction.
When the branch prediction mechanism mispredicts a branch, an instruction execution unit further down the pipeline eventually detects the branch misprediction. After the instruction execution unit detects a branch misprediction, the instructions that should not have been fetched are flushed out of the microprocessor pipeline and program execution resumes along the corrected instruction path. To properly resume execution along the correct path, the microprocessor must restore any microprocessor state changes that occurred during speculative execution. Furthermore, the microprocessor must obtain the address of the instruction that should have been executed after the branch instruction.
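One simple way to keep speculative state changes reversible, assumed here purely for illustration and not taken from this document, is to buffer register writes tagged with each instruction's program-order sequence number. Writes are copied into the committed register file only once they are known to be on the correct path, and writes younger than a mispredicted branch are simply discarded. The class `SpeculativeRegisterFile` and its methods are hypothetical; real designs use structures such as reorder buffers or checkpoints.

```python
class SpeculativeRegisterFile:
    """Sketch: buffers speculative register writes so they can be committed or discarded."""

    def __init__(self, num_regs=8):
        self.committed = [0] * num_regs  # permanent (non-speculative) state
        self.pending = []                # (seq, reg, value) tuples, in program order

    def write(self, seq, reg, value):
        # Speculative writes never touch committed state directly.
        self.pending.append((seq, reg, value))

    def read(self, reg):
        # The youngest pending write to this register wins; otherwise use committed state.
        for _, r, value in reversed(self.pending):
            if r == reg:
                return value
        return self.committed[reg]

    def commit_through(self, seq):
        # Instructions up to and including `seq` are known to be on the correct path.
        keep = []
        for s, r, value in self.pending:
            if s <= seq:
                self.committed[r] = value
            else:
                keep.append((s, r, value))
        self.pending = keep

    def flush_after(self, branch_seq):
        # Misprediction: drop every write made by instructions younger than the branch.
        self.pending = [(s, r, v) for s, r, v in self.pending if s <= branch_seq]


if __name__ == "__main__":
    rf = SpeculativeRegisterFile(num_regs=4)
    rf.write(1, 1, 5)       # instruction 1 sets r1 = 5
    rf.commit_through(1)    # instruction 1 is non-speculative
    rf.write(3, 1, 99)      # instruction 3, after branch 2, speculatively sets r1 = 99
    print(rf.read(1))       # 99
    rf.flush_after(2)       # branch 2 turns out to be mispredicted
    print(rf.read(1))       # 5
```

Because committed state is only ever updated by `commit_through`, flushing the wrong path restores the pre-branch view of the registers without any explicit undo.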
If a branch instruction should have been taken but was mispredicted as not-taken, then the address of the next instruction is the target address of the branch instruction. Thus, after a branch that was wrongly predicted not-taken, the microprocessor can resume execution along the correct instruction path by fetching the instruction at the branch instruction's target address. This recovery is relatively simple, since the target address is usually specified by the branch instruction and its associated operand.
However, if a branch instruction should have been not-taken but was mispredicted as taken, then the address of the next instruction is the address of the instruction located after the branch instruction. This case is more difficult than the previous one, since the address of the next instruction is not specified by the branch instruction or its associated operand. Thus, to resume execution after a branch instruction wrongly predicted as taken, the microprocessor might be required to propagate the address of the instruction located after the branch instruction down the pipeline to the instruction execution unit. Once the instruction execution unit executes the branch instruction, that address is used if the branch was mispredicted and is otherwise discarded. Because it must carry both the branch target address and the address of the instruction located after the branch instruction, the microprocessor pipeline must be made very wide and thus consumes a considerable amount of die area.
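The two recovery cases can be summarized in a short sketch of the decision made when the branch resolves. The record `BranchInFlight` and the function `resolve` are hypothetical; the record carries both candidate addresses down the pipeline, which is exactly the width cost described in the paragraph above. The example addresses (a branch at 0x1000 with a 4-byte encoding and a target of 0x2000) are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class BranchInFlight:
    """Hypothetical per-branch record propagated down the pipeline."""
    target: int            # target address specified by the branch and its operand
    fall_through: int      # address of the instruction located after the branch
    predicted_taken: bool  # the prediction the fetch unit acted on


def resolve(branch, actually_taken):
    """Return None if the prediction was correct, else the address to resume fetching at."""
    if actually_taken == branch.predicted_taken:
        return None  # correct prediction: the unused address is simply dropped
    if actually_taken:
        return branch.target  # wrongly predicted not-taken: resume at the target
    return branch.fall_through  # wrongly predicted taken: resume after the branch


if __name__ == "__main__":
    b = BranchInFlight(target=0x2000, fall_through=0x1004, predicted_taken=True)
    print(hex(resolve(b, actually_taken=False)))  # 0x1004
```

Only the wrongly-predicted-taken case needs `fall_through`, yet every branch record must reserve room for it, because the execution unit cannot know in advance which case will occur.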