Early microprocessors generally processed instructions one at a time. Each instruction was processed in four sequential stages: instruction fetch, instruction decode, execute, and result writeback. Within such microprocessors, a different dedicated logic block performed each processing stage. Each logic block waited until all the previous logic blocks had completed their operations before beginning its own operation.
To improve throughput, microprocessor designers overlapped the fetch, decode, execute, and writeback processing stages so that the microprocessor operated on several instructions simultaneously. In operation, the fetch, decode, execute, and writeback stages concurrently process different instructions, and at each clock tick the result of each stage is passed to the following stage. Microprocessors that overlap the fetch, decode, execute, and writeback stages in this way are known as "pipelined" microprocessors. Some microprocessors further divide each processing stage into substages for additional performance improvement; such processors are referred to as "deeply pipelined" microprocessors.
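The timing benefit of overlapping the stages can be sketched with a small model. This is a minimal illustration under assumed ideal conditions (no stalls, one new instruction fetched per clock tick); the function names are hypothetical. It counts clock ticks for the sequential and the pipelined organizations and shows which instruction occupies each stage on every tick.

```python
# Hypothetical model of an ideal 4-stage pipeline: no stalls, and one new
# instruction enters the fetch stage on every clock tick.
STAGES = ["fetch", "decode", "execute", "writeback"]


def sequential_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Ticks needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Ticks needed when stages overlap: fill the pipeline once, then finish one per tick."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_schedule(n_instructions: int) -> list[dict[str, int]]:
    """For each tick, map every busy stage to the index of the instruction it holds."""
    schedule = []
    for tick in range(pipelined_cycles(n_instructions)):
        busy = {}
        for stage_index, stage in enumerate(STAGES):
            instr = tick - stage_index  # instruction i occupies stage s at tick i + s
            if 0 <= instr < n_instructions:
                busy[stage] = instr
        schedule.append(busy)
    return schedule


print(sequential_cycles(5), pipelined_cycles(5))  # 20 8
for tick, busy in enumerate(pipeline_schedule(5)):
    print(tick, busy)
```

On tick 3 of the schedule all four stages are busy with four different instructions, which is exactly the concurrency described above; once the pipeline is full, one instruction completes per tick.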
In order for a pipelined microprocessor to operate efficiently, an instruction fetch unit at the head of the processing pipeline must continually provide the pipeline with a stream of microprocessor instructions. However, conditional branch instructions within an instruction stream prevent the instruction fetch unit from fetching subsequent instructions until the branch condition is fully resolved. In a pipelined microprocessor, the branch condition is not fully resolved until the branch instruction reaches an instruction execution stage near the end of the pipeline. Accordingly, the instruction fetch unit stalls, because the unresolved branch condition prevents it from knowing which microprocessor instructions to fetch next.
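The cost of such a stall can be estimated with a simple model. In this sketch, which assumes no branch prediction at all, fetch stops after every conditional branch until the branch resolves at the end of a hypothetical 0-based stage `resolve_stage`, and the next fetch happens on the following tick. The names `fetch_ticks` and `total_cycles` are illustrative only.

```python
# Hypothetical model: no branch prediction, so fetch stops after every
# conditional branch until the branch resolves at the end of pipeline stage
# `resolve_stage` (0-based); the next fetch happens on the following tick.


def fetch_ticks(is_branch: list[bool], resolve_stage: int) -> list[int]:
    """Return the tick at which each instruction is fetched."""
    ticks = []
    next_fetch = 0
    for branch in is_branch:
        ticks.append(next_fetch)
        if branch:
            # The fetch unit stalls for `resolve_stage` ticks (pipeline bubbles).
            next_fetch += resolve_stage + 1
        else:
            next_fetch += 1
    return ticks


def total_cycles(is_branch: list[bool], n_stages: int, resolve_stage: int) -> int:
    """Ticks until the last instruction leaves an n_stages-deep pipeline."""
    if not is_branch:
        return 0
    return fetch_ticks(is_branch, resolve_stage)[-1] + n_stages


program = [False, True, False, False]  # the second instruction is a branch
print(fetch_ticks(program, resolve_stage=2))  # [0, 1, 4, 5]
print(total_cycles(program, n_stages=4, resolve_stage=2))  # 9
print(total_cycles([False] * 4, n_stages=4, resolve_stage=2))  # 7
```

A single branch resolved in the execute stage of a 4-stage pipeline costs two idle fetch slots here; moving the resolution point deeper into a longer pipeline increases the bubbles per branch proportionally.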
To alleviate this problem, most pipelined microprocessors implement a branch prediction mechanism that predicts the existence and the outcome of branch instructions within an instruction stream. The instruction fetch unit uses these branch predictions to determine which instructions to fetch after a branch instruction. For example, Yeh and Patt introduced a highly accurate two-level adaptive branch prediction mechanism. (See Tse-Yu Yeh and Yale N. Patt, "Two-Level Adaptive Branch Prediction," The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, November 1991, pp. 51-61.)
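The general idea of a two-level adaptive predictor can be sketched as follows. This is a simplified illustration, not a reproduction of the organization in the cited paper: it assumes a single global history register (first level) that indexes one shared table of 2-bit saturating counters (second level), with every counter initialized to "weakly not-taken". The class and helper names are hypothetical.

```python
class TwoLevelPredictor:
    """Simplified two-level adaptive branch predictor (illustrative sketch).

    Level 1: a global history register holding the outcomes of the last
    `history_bits` branches (most recent outcome in bit 0).
    Level 2: a pattern history table of 2-bit saturating counters, indexed by
    the history register. Counter values 0-1 predict not-taken, 2-3 taken.
    """

    def __init__(self, history_bits: int = 4) -> None:
        self.mask = (1 << history_bits) - 1
        self.history = 0
        # Assumption: every counter starts at 1 ("weakly not-taken").
        self.pht = [1] * (1 << history_bits)

    def predict(self) -> bool:
        return self.pht[self.history] >= 2

    def update(self, taken: bool) -> None:
        counter = self.pht[self.history]
        self.pht[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_mispredictions(predictor: TwoLevelPredictor, outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


alternating = [i % 2 == 0 for i in range(40)]  # taken, not-taken, taken, ...
print(count_mispredictions(TwoLevelPredictor(history_bits=0), alternating))  # 40
print(count_mispredictions(TwoLevelPredictor(history_bits=2), alternating))  # 2
```

With `history_bits=0` the table degenerates to a single 2-bit counter, which mispredicts every outcome of an alternating branch; with two bits of history the predictor learns the pattern after two misses.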
To predict the outcome of branch instructions, most branch prediction mechanisms collect branch "histories" that store the outcome of recent occurrences of each branch. For example, a branch prediction mechanism may store the outcome (taken or not-taken) of the last k occurrences of a particular branch. The branch history is stored in a branch target buffer along with a tag address for identifying the location of the branch instruction and a target address for identifying the branch target destination.
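A branch target buffer entry of the kind described above can be modeled as a record holding a tag address, a target address, and the outcomes of the last k occurrences of the branch. The sketch below is hypothetical in its details: it assumes a direct-mapped table indexed by the branch address modulo the table size, stores the full branch address as the tag, and uses a simple majority vote over the stored history as the prediction policy.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BTBEntry:
    tag: int        # address of the branch instruction
    target: int     # branch target destination
    history: deque  # outcomes of the last k occurrences (True = taken)


class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB keeping a k-outcome history per branch."""

    def __init__(self, num_entries: int = 16, k: int = 4) -> None:
        self.num_entries = num_entries
        self.k = k
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def lookup(self, address: int) -> Optional[BTBEntry]:
        entry = self.entries[address % self.num_entries]
        # The stored tag distinguishes branches that map to the same slot.
        if entry is not None and entry.tag == address:
            return entry
        return None

    def predict(self, address: int) -> Optional[Tuple[bool, int]]:
        """Return (predicted_taken, target), or None if no branch is known here."""
        entry = self.lookup(address)
        if entry is None:
            return None
        # Assumed policy: predict taken if most of the last k outcomes were taken.
        taken_count = sum(entry.history)
        return taken_count * 2 > len(entry.history), entry.target

    def update(self, address: int, taken: bool, target: int) -> None:
        entry = self.lookup(address)
        if entry is None:
            # Allocate (or replace) the slot; deque(maxlen=k) drops the oldest outcome.
            entry = BTBEntry(address, target, deque(maxlen=self.k))
            self.entries[address % self.num_entries] = entry
        entry.target = target
        entry.history.append(taken)


btb = BranchTargetBuffer(num_entries=16, k=4)
for outcome in [True, True, False]:
    btb.update(0x40, outcome, 0x80)
print(btb.predict(0x40))  # (True, 128)
print(btb.predict(0x50))  # None: same slot as 0x40, but the tag does not match
```

The bounded `deque` keeps exactly the last k outcomes, mirroring the "last k occurrences" history; a lookup miss (returning `None`) corresponds to the buffer not recognizing a branch at that address.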
When a branch prediction mechanism predicts the outcome of a branch instruction and the microprocessor executes subsequent instructions along the predicted path, the microprocessor is said to have "speculatively executed" along the predicted instruction path. During speculative execution, the microprocessor performs useful work only if the branch instruction was predicted correctly. If the branch prediction mechanism mispredicted the branch instruction, the microprocessor is speculatively executing instructions down the wrong path and therefore accomplishes nothing useful. When the microprocessor eventually detects that the branch instruction was mispredicted, it must flush all the speculatively executed instructions and restart execution at the correct address.
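The keep-or-flush decision at branch resolution can be illustrated with a toy model. It assumes, hypothetically, that exactly `window` instructions are fetched along the predicted path before the branch resolves, and represents instructions as strings; it is not a model of any particular microprocessor.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    kept: list[str]     # speculative instructions that were on the correct path
    flushed: list[str]  # speculative instructions discarded after a misprediction
    refetch: list[str]  # correct-path instructions still to be fetched after resolution


def resolve_branch(predicted_taken: bool, actual_taken: bool,
                   taken_path: list[str], not_taken_path: list[str],
                   window: int) -> Resolution:
    """Model `window` instructions fetched speculatively before the branch resolves."""
    predicted_path = taken_path if predicted_taken else not_taken_path
    correct_path = taken_path if actual_taken else not_taken_path
    speculative = predicted_path[:window]
    if predicted_taken == actual_taken:
        # Correct prediction: the speculative work is useful and is kept.
        return Resolution(kept=speculative, flushed=[], refetch=correct_path[window:])
    # Misprediction: flush everything fetched speculatively and restart
    # fetching at the start of the correct path.
    return Resolution(kept=[], flushed=speculative, refetch=correct_path)


taken = ["t0", "t1", "t2", "t3"]
fall_through = ["n0", "n1", "n2", "n3"]
print(resolve_branch(True, True, taken, fall_through, window=2))
print(resolve_branch(True, False, taken, fall_through, window=2))
```

In the mispredicted case every speculatively fetched instruction ends up in `flushed`, so the work done during the speculation window is lost, which is the situation the paragraph above describes.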
Since the microprocessor accomplishes nothing useful when a branch instruction is mispredicted, it is very desirable to predict branch instructions accurately. This is especially true for deeply pipelined microprocessors, in which a long instruction pipeline is flushed each time a branch misprediction is made, resulting in a large misprediction penalty.
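The effect of pipeline depth on the misprediction penalty can be quantified with a common first-order estimate: effective cycles per instruction equals the base CPI plus the branch frequency times the misprediction rate times the penalty in cycles. The workload numbers below are hypothetical, chosen only to show the arithmetic; the model ignores other stall sources.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """First-order model: every mispredicted branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those are
# mispredicted. Only the flush penalty differs between the two designs.
shallow = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=3)
deep = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=20)
print(f"{shallow:.2f} {deep:.2f}")  # 1.03 1.20
```

With the same predictor accuracy, the deeper pipeline loses about 20% of its throughput to mispredictions instead of about 3%, which is why prediction accuracy matters more as pipelines grow longer.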
Studies of branch behavior have indicated that branches in program loops are not predicted well by history-pattern-based predictors. It would therefore be desirable to have an improved branch prediction mechanism that more accurately predicts the behavior of branch instructions in program loops.
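One illustrative reason for this difficulty can be shown with a small simulation. The sketch assumes a loop-closing branch that is taken on every iteration except the last, and a hypothetical history-pattern predictor (a history register indexing 2-bit saturating counters, initialized to "weakly not-taken"). When the loop runs for more iterations than the history can hold, the history just before the exit looks identical to the history during the loop body, so the exit is mispredicted on every pass through the loop. The source does not say which predictors the cited studies examined; this only demonstrates the general effect.

```python
def loop_outcomes(trip_count: int, repetitions: int) -> list[list[bool]]:
    # Loop-closing branch: taken on every iteration except the last.
    return [[True] * (trip_count - 1) + [False] for _ in range(repetitions)]


def mispredictions_per_loop(history_bits: int, trip_count: int,
                            repetitions: int) -> list[int]:
    """Mispredictions of a history-indexed 2-bit-counter predictor, per loop execution."""
    history = 0
    mask = (1 << history_bits) - 1
    pht = [1] * (1 << history_bits)  # 2-bit counters, start weakly not-taken
    per_loop = []
    for outcomes in loop_outcomes(trip_count, repetitions):
        misses = 0
        for taken in outcomes:
            predicted = pht[history] >= 2
            if predicted != taken:
                misses += 1
            pht[history] = min(pht[history] + 1, 3) if taken else max(pht[history] - 1, 0)
            history = ((history << 1) | int(taken)) & mask
        per_loop.append(misses)
    return per_loop


# A 10-iteration loop: a 4-bit history sees "all taken" both inside the loop
# and just before the exit; a 12-bit history can tell the two apart.
print(mispredictions_per_loop(history_bits=4, trip_count=10, repetitions=10)[-1])   # 1
print(mispredictions_per_loop(history_bits=12, trip_count=10, repetitions=10)[-1])  # 0
```

Lengthening the history fixes this particular loop but not loops with larger trip counts, since the required history length grows with the number of iterations.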