Advanced processors use pipelining techniques to execute instructions at very high speeds. A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a processor pipeline, each step completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage. The stages are connected one to the next to form a pipe where instructions enter at one end, progress through the stages, and exit at the other end. A pipeline is most effective if it can process a steady stream of instructions in a sequential manner.
When a branch is executed, it may change the instruction pointer (IP) to something other than its current value plus a predetermined fixed increment. If a branch changes the IP to the address of the branch target (given by the branch instruction), it is a “taken” branch. If it falls through, it is “not taken”. Knowledge of whether the branch will be taken or not, and the address of the branch target, typically becomes available only when the instruction has reached the last or next-to-last stage of the pipe. This means that, if the branch is taken, all instructions that issued later than the branch (and hence are not as far along in the pipe as the branch) are invalid, i.e., they should not be executed, because the next instruction to be executed following the branch is the one at the target address. All of the time spent by the pipeline on the later-issued instructions is wasted, which significantly reduces the speed improvement that can be obtained from the pipeline. To alleviate the delay that may be caused by the branch, there are two steps that can be taken. First, find out whether the branch will be taken or not taken (the “direction” of the branch) earlier in the pipeline. Second, compute the target address earlier.
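The cost described above can be made concrete with a back-of-the-envelope model. The sketch below is not taken from the source: the function name effective_cpi, the 20% branch fraction, the 60% taken rate, and the 4-cycle and 1-cycle resolution points are all hypothetical values chosen only for illustration. It assumes a pipeline that keeps fetching sequentially after a branch, so that only taken branches cause later-issued instructions to be discarded.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, penalty_cycles):
    """Average cycles per instruction once discarded work is counted.

    penalty_cycles is the number of later-issued instructions thrown away
    each time a taken branch is resolved (hypothetical parameter).
    """
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
# Branch resolved late in the pipe: 4 later-issued instructions are discarded.
late_cpi = effective_cpi(1.0, 0.20, 0.60, penalty_cycles=4)

# Direction and target known earlier: only 1 instruction is discarded.
early_cpi = effective_cpi(1.0, 0.20, 0.60, penalty_cycles=1)

print(f"late resolution:  {late_cpi:.2f} cycles per instruction")
print(f"early resolution: {early_cpi:.2f} cycles per instruction")
```

Under these assumed numbers the average cost per instruction falls from about 1.48 cycles to about 1.12 cycles, which is the kind of gain that resolving the direction and the target address earlier is meant to capture.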
One method for dealing with branches is to use hardware inside the processor to predict whether an address will result in a branch instruction being taken or not taken. Examples of such hardware include the 2-bit saturating counter predictor (see “Computer Architecture: A Quantitative Approach”, John L. Hennessy and David A. Patterson, 2d Edition, Morgan Kaufmann Publishers, pp. 262–271) and the local history predictor, which uses the past behavior (taken/not-taken) of a particular branch instruction to predict the future behavior of that instruction. The use of a combination of two different predictors has been proposed to obtain more accurate predictions. In U.S. Pat. No. 5,758,142, the final prediction at the output of a multiplexer is selected between a prediction provided using a branch past history table and one provided using a global branch history table, where the selection is made according to the most significant bit of a counter. Another technique combines the local history predictor and the saturating counter predictor to achieve more accurate predictions than either one can by itself: the branch history (obtained from a matching entry in a local history table) is used to index into a pattern history table, and the next execution of the branch is finally predicted by the value of a 2-bit saturating counter. See the article by T. Yeh and Y. N. Patt, “Alternative Implementations of Two-Level Adaptive Branch Prediction”, Proc. 19th Symposium on Computer Architecture (May 1992), Gold Coast, Australia, pp. 124–134. Implementation of both of these techniques, however, requires a relatively large area on the processor chip.
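The two predictor styles named above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the circuitry of any cited reference: the class names, the table sizes, the initial counter state, and the untagged modulo indexing of the local history table (standing in for the “matching entry” lookup) are all hypothetical simplifications.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0..3, predicts taken when state >= 2."""

    def __init__(self, state=1):  # assumed start: weakly not-taken
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class TwoLevelPredictor:
    """Local history selects a 2-bit counter in a pattern history table."""

    def __init__(self, history_bits=2, table_entries=16):  # assumed sizes
        self.history_bits = history_bits
        self.table_entries = table_entries
        self.history = [0] * table_entries  # local history table
        self.pattern = [TwoBitCounter() for _ in range(1 << history_bits)]

    def _slot(self, branch_address):
        # Assumption: untagged, direct-mapped lookup by address.
        return branch_address % self.table_entries

    def predict(self, branch_address):
        hist = self.history[self._slot(branch_address)]
        return self.pattern[hist].predict()

    def update(self, branch_address, taken):
        slot = self._slot(branch_address)
        hist = self.history[slot]
        self.pattern[hist].update(taken)
        # Shift the newest outcome into this branch's history register.
        mask = (1 << self.history_bits) - 1
        self.history[slot] = ((hist << 1) | int(taken)) & mask


def count_misses(predict, update, outcomes):
    """Run a predictor over a sequence of outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predict() != taken:
            misses += 1
        update(taken)
    return misses


# A single branch (hypothetical address 0x40) that alternates taken/not-taken.
outcomes = [True, False] * 10

counter = TwoBitCounter()
counter_misses = count_misses(counter.predict, counter.update, outcomes)

two_level = TwoLevelPredictor()
two_level_misses = count_misses(
    lambda: two_level.predict(0x40),
    lambda taken: two_level.update(0x40, taken),
    outcomes,
)

print("2-bit counter mispredictions:", counter_misses)
print("two-level mispredictions:", two_level_misses)
```

On this alternating pattern the lone counter, starting from the assumed weakly not-taken state, mispredicts every outcome, while the two-level sketch mispredicts only while its history and counters warm up. The extra tables are also where the additional chip area comes from.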