Early microprocessors generally processed instructions one at a time. Each instruction was processed in four sequential stages: instruction fetch, instruction decode, execute, and result writeback. Within such microprocessors, a different dedicated logic block performed each processing stage, and each logic block waited until all the previous logic blocks had completed their operations before beginning its own operation.
To improve efficiency, microprocessor designers overlapped the operations of the fetch, decode, execute, and writeback logic stages such that the microprocessor operated on several instructions simultaneously. In operation, the fetch, decode, execute, and writeback logic stages concurrently process different instructions. At each clock tick the result of each processing stage is passed to the following processing stage. Microprocessors that use the technique of overlapping the fetch, decode, execute, and writeback stages are known as "pipelined" microprocessors. Some microprocessors further divide each processing stage into substages for additional performance improvement. Such processors are referred to as "deeply pipelined" microprocessors.
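The benefit of overlapping the stages can be made concrete with a minimal Python sketch. It assumes that every stage takes exactly one clock tick and that no instruction ever stalls, which are simplifications not stated in the text, and the function names are hypothetical. It counts ticks for the one-at-a-time scheme and for the overlapped (pipelined) scheme, and lists which instruction occupies each stage on each tick.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]


def sequential_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """One instruction at a time: each passes through every stage before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """The first instruction needs n_stages ticks to fill the pipeline;
    after that, one instruction completes on every tick."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_schedule(n_instructions: int, n_stages: int = len(STAGES)):
    """For each tick, return the instruction number in each stage (None if the stage is empty)."""
    schedule = []
    for tick in range(pipelined_cycles(n_instructions, n_stages)):
        row = []
        for stage in range(n_stages):
            i = tick - stage  # instruction i occupies stage s on tick i + s
            row.append(i if 0 <= i < n_instructions else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    n = 5
    print("sequential:", sequential_cycles(n), "ticks")  # 20
    print("pipelined: ", pipelined_cycles(n), "ticks")   # 8
    for tick, row in enumerate(pipeline_schedule(n)):
        cells = ["--" if i is None else f"I{i}" for i in row]
        print(f"tick {tick}: " + "  ".join(cells))
```

On tick 3 all four stages are busy at once (I3 in fetch, I2 in decode, I1 in execute, I0 in writeback); that simultaneous occupancy is what "pipelined" refers to.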
In order for a pipelined microprocessor to operate efficiently, an instruction fetch unit at the head of the pipeline must continually provide the pipeline with a stream of microprocessor instructions. However, conditional branch instructions within an instruction stream prevent the instruction fetch unit from fetching subsequent instructions until the branch condition is fully resolved. In a pipelined microprocessor, the branch condition is not fully resolved until the branch instruction reaches an instruction execution stage near the end of the microprocessor pipeline. Accordingly, the instruction fetch unit stalls, because the unresolved branch condition prevents it from knowing which instructions to fetch next.
To alleviate this problem, many pipelined microprocessors use branch prediction mechanisms that predict the existence and the outcome of branch instructions within an instruction stream. The instruction fetch unit uses the branch predictions to fetch subsequent instructions. For example, Yeh & Patt introduced a highly accurate two-level adaptive branch prediction mechanism. (See Tse Yu Yeh and Yale N. Patt, "Two-Level Adaptive Branch Prediction," The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, November 1991, pp. 51-61.) The Yeh & Patt mechanism makes branch predictions based upon two levels of collected branch history.
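The idea of two levels of branch history can be illustrated with a simplified Python sketch. It is not the configuration evaluated by Yeh & Patt. The table sizes, the address-based indexing, and the choice of one history register per branch (first level) feeding a single shared table of 2-bit saturating counters (second level) are assumptions made for illustration, and the class, method, and parameter names are hypothetical.

```python
class TwoLevelPredictor:
    """Simplified two-level adaptive branch predictor (illustrative sketch).

    Level 1: per-branch history registers holding the outcomes
             (1 = taken, 0 = not taken) of the last `history_bits` executions.
    Level 2: a pattern history table of 2-bit saturating counters indexed by
             that history pattern; counter values 2 and 3 predict "taken".
    """

    def __init__(self, history_bits: int = 4, bht_entries: int = 64):
        self.history_bits = history_bits
        self.bht_entries = bht_entries
        self.history = [0] * bht_entries        # level 1: branch history table
        self.pht = [1] * (1 << history_bits)    # level 2: start "weakly not taken"

    def _index(self, address: int) -> int:
        # Hypothetical indexing: low-order part of the branch address.
        return address % self.bht_entries

    def predict(self, address: int) -> bool:
        pattern = self.history[self._index(address)]
        return self.pht[pattern] >= 2

    def update(self, address: int, taken: bool) -> None:
        i = self._index(address)
        pattern = self.history[i]
        # Train the counter selected by the pattern that produced the prediction.
        if taken:
            self.pht[pattern] = min(3, self.pht[pattern] + 1)
        else:
            self.pht[pattern] = max(0, self.pht[pattern] - 1)
        # Shift the actual outcome into this branch's history register.
        mask = (1 << self.history_bits) - 1
        self.history[i] = ((pattern << 1) | int(taken)) & mask


def run_loop_demo(repetitions: int = 10) -> int:
    """Feed the outcomes of a loop-closing branch that is taken three times and
    then falls through, and count how many predictions were correct."""
    predictor = TwoLevelPredictor()
    loop_branch = 0x40  # hypothetical branch address
    outcomes = [True, True, True, False] * repetitions
    correct = 0
    for taken in outcomes:
        if predictor.predict(loop_branch) == taken:
            correct += 1
        predictor.update(loop_branch, taken)
    return correct


if __name__ == "__main__":
    print(run_loop_demo(), "of 40 predictions correct")  # 34 of 40
```

All six mispredictions occur during the first two passes through the loop while the counters are training. Afterwards each 4-bit history pattern is always followed by the same outcome, so the second level predicts the repeating taken/taken/taken/not-taken sequence perfectly, which a single per-branch counter could not do.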
When a branch prediction mechanism predicts the outcome of a branch instruction and the microprocessor executes subsequent instructions along the predicted path, the microprocessor is said to have "speculatively executed" along the predicted instruction path. During speculative execution, the microprocessor performs useful processing only if the branch instruction was predicted correctly.
However, if the branch prediction mechanism mispredicted the branch instruction, the microprocessor is executing instructions down the wrong path and therefore accomplishes nothing. When the microprocessor eventually detects the mispredicted branch, it must flush the speculatively fetched instructions from the instruction pipeline and restart execution at the correct address.
Since a microprocessor accomplishes nothing when a branch instruction is mispredicted, it is desirable to predict branch instructions accurately. Furthermore, it is desirable to correct mispredicted branches as early as possible so that the microprocessor can restart execution at the correct address and resume useful processing. This is especially true for deeply pipelined microprocessors, in which a long instruction pipeline is flushed each time a branch misprediction occurs.
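A back-of-the-envelope model shows why deep pipelines suffer more from mispredictions. The sketch below assumes one instruction is fetched per cycle, that each wrong-path instruction fetched before the branch resolves costs one cycle, and a hypothetical workload in which 20% of instructions are branches and 10% of those are mispredicted. None of these figures come from the text, and the function names are hypothetical.

```python
def flush_penalty(resolve_stage: int) -> int:
    """Cycles lost per misprediction if the branch resolves in stage
    `resolve_stage` (stage 1 = fetch): under the one-fetch-per-cycle
    assumption, every younger instruction in stages 1..resolve_stage-1
    is on the wrong path and must be flushed."""
    if resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    return resolve_stage - 1


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty: int) -> float:
    """Average cycles per instruction once misprediction flushes are counted."""
    return base_cpi + branch_fraction * mispredict_rate * penalty


if __name__ == "__main__":
    # Hypothetical workload on an ideal pipeline completing one instruction per cycle.
    for label, resolve_stage in [("shallow, resolves in stage 3", 3),
                                 ("deep, resolves in stage 18", 18)]:
        p = flush_penalty(resolve_stage)
        cpi = effective_cpi(1.0, 0.20, 0.10, p)
        print(f"{label}: penalty {p} cycles, CPI {cpi:.2f}")
```

Under these assumptions, moving branch resolution from stage 3 to stage 18 raises the average cost from 1.04 to 1.34 cycles per instruction. The same formula shows why detecting and correcting a misprediction earlier in the pipeline pays off: it shrinks `penalty` directly.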
One type of branch instruction common to most computer processors is the "Return From Subroutine" instruction. The Return From Subroutine instruction directs the microprocessor to pop a return address off the top of a Last-In-First-Out (LIFO) stack and begin executing instructions at that address. In most microprocessors, the LIFO stack is stored in a main memory coupled to the microprocessor and is maintained using a microprocessor register as a stack pointer. Thus, the Return From Subroutine instruction is an unconditional branch instruction that requires a main memory access to execute. In the current generation of high-speed microprocessors, instructions that access main memory are slow relative to other instructions. It is therefore desirable to predict the return address of Return From Subroutine instructions so that the processor does not need to stall while the main memory access occurs.
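The memory dependence of Return From Subroutine can be seen in a toy Python model. The text describes only the pop; the push-on-call behavior, the downward-growing stack, the 4-byte word size, and all names here are assumptions for illustration. The point to notice is that `return_from_subroutine` cannot set the program counter until it has read main memory at the address held in the stack-pointer register.

```python
class ReturnStackModel:
    """Toy model: the LIFO return-address stack lives in (simulated) main
    memory and is addressed through a stack-pointer register."""

    WORD_SIZE = 4  # hypothetical: each return address occupies a 4-byte word

    def __init__(self, stack_top: int = 0x1000):
        self.memory = {}          # sparse stand-in for main memory: address -> value
        self.sp = stack_top       # stack-pointer register; stack grows downward (assumption)
        self.pc = 0               # program counter
        self.memory_accesses = 0  # counts trips to main memory

    def call(self, target: int, return_address: int) -> None:
        """Assumed call semantics: push the return address, then jump."""
        self.sp -= self.WORD_SIZE
        self.memory[self.sp] = return_address  # main-memory write
        self.memory_accesses += 1
        self.pc = target

    def return_from_subroutine(self) -> None:
        """Pop the return address off the top of the stack and jump to it.
        The branch target is unknown until this memory read completes."""
        self.pc = self.memory[self.sp]         # main-memory read
        self.memory_accesses += 1
        self.sp += self.WORD_SIZE


if __name__ == "__main__":
    m = ReturnStackModel()
    m.call(0x200, 0x104)         # main calls A; A will return to 0x104
    m.call(0x300, 0x208)         # A calls B; B will return to 0x208
    m.return_from_subroutine()   # B returns: most recent push is popped first
    print(hex(m.pc))             # 0x208
    m.return_from_subroutine()   # A returns
    print(hex(m.pc))             # 0x104
```

Because the stack is last-in-first-out, nested returns unwind in the reverse order of the calls, and every return costs one main-memory read before fetching can continue at the target.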