The rate at which a computer or other processing system can process information often depends on the speed at which the system's processor(s) execute instructions. Therefore, increased processing speed may advantageously be obtained by improving the speed at which processors execute instructions. Many processors, such as the microprocessor found in a computer, use an instruction pipeline to speed the processing of instructions. FIG. 1 illustrates a known architecture for such an instruction pipeline. The first stage of the pipeline includes a branch prediction unit 100 and a next Instruction Pointer (IP) logic unit 110 that select an instruction to be executed. An instruction cache 120 is accessed in the second stage of the pipeline, and the instruction then moves into the third stage. The instruction moves from a third stage unit 130 to a fourth stage unit 140 and so on, before reaching a branch execution unit 150 in the execution stage. The “intermediate stages” shown in FIG. 1 indicate that a pipeline may contain any number of stages. These stages may, for example, generate instructions for an instruction decoder.
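To make the stage-to-stage movement concrete, the following is a minimal Python sketch of an idealized in-order pipeline, not the architecture of FIG. 1 itself. It models the pipeline as a fixed-length shift register in which every instruction advances one stage per clock. The function name `run_pipeline` and the choice of five stages are illustrative assumptions; branch prediction, the instruction cache, stalls, and hazards are deliberately left out.

```python
from collections import deque


def run_pipeline(program, num_stages):
    """Clock an idealized in-order pipeline; return {instruction: cycle}.

    Index 0 is the first (select) stage and index -1 stands in for the
    execution stage. No stalls, branches, or hazards are modelled, and
    instruction names are assumed to be unique.
    """
    stages = deque([None] * num_stages, maxlen=num_stages)
    pending = deque(program)
    executed = {}
    cycle = 0
    while pending or any(slot is not None for slot in stages):
        cycle += 1
        # Each clock, every instruction moves one stage toward execution and
        # the next instruction (if any) enters the first stage. Because the
        # deque is bounded, appendleft drops whatever sat in the last stage.
        stages.appendleft(pending.popleft() if pending else None)
        if stages[-1] is not None:
            executed[stages[-1]] = cycle
    return executed


print(run_pipeline(["XXX1", "JCC-Y1", "XXX2"], num_stages=5))
# {'XXX1': 5, 'JCC-Y1': 6, 'XXX2': 7}
```

With five stages, the first instruction reaches the execution stage on the fifth clock and each later instruction follows one clock behind it.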
Consider, for example, the following sequence of instructions:

    address X1:
        XXX1
        JCC-Y1
        XXX2
        XXX3
        XXX4
        XXX5

    address Y1:
        YYY1
        YYY2
        YYY3

In this case, address X1 stores a first instruction (“XXX1”) followed by a “conditional” jump or branch instruction (“JCC-Y1”). The branch is conditional in that the next instruction to be performed may be either the next sequential instruction (“XXX2”) or an instruction at a new address (“Y1”). The processor does not know which branch, or “path,” will be taken until JCC-Y1 is executed, i.e., reaches the branch execution unit 150.
Assume now that the branch prediction unit 100 and the next IP logic unit 110 have selected instruction XXX1 to be executed. The processor could wait for XXX1 to move through every stage of the pipeline before processing the next instruction, JCC-Y1. In that case, however, the branch execution unit 150 would sit idle while JCC-Y1 moved through the pipeline. To improve the processor's performance, JCC-Y1 is instead placed into the first stage as soon as XXX1 moves into the second stage. As a result, JCC-Y1 will be ready for execution as soon as the branch execution unit 150 is finished with XXX1.
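The benefit of starting JCC-Y1 before XXX1 has finished can be quantified with a short back-of-the-envelope calculation. The sketch below assumes an ideal pipeline with one clock per stage and no stalls; the function names and the five-stage depth are hypothetical.

```python
def cycles_without_overlap(n_instructions, n_stages):
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * n_stages


def cycles_with_overlap(n_instructions, n_stages):
    # The first instruction needs n_stages clocks to reach execution; after
    # that, one instruction completes per clock (ideal pipeline, no stalls).
    return n_stages + n_instructions - 1


for n in (1, 6, 100):
    print(n, cycles_without_overlap(n, 5), cycles_with_overlap(n, 5))
```

Under these assumptions, 100 instructions take 500 clocks without overlap but only 104 with it, because the branch execution unit 150 no longer idles while each instruction travels through the pipeline.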
When JCC-Y1 moves into the second stage, however, the processor does not yet know whether XXX2 or YYY1 should be placed into the first stage, because this information is available only after JCC-Y1 has been executed by the branch execution unit 150. Therefore, the branch prediction unit 100 “predicts” which branch of the program will be needed. By way of example, Table I shows the movement of the above instruction sequence through the pipeline shown in FIG. 1. As can be seen at time 6, the branch prediction unit 100 has predicted that instruction YYY1 will follow JCC-Y1. Note that several “clock” cycles may or may not pass between the time JCC-Y1 moves into the second stage and the time YYY1 is placed into the first stage.
TABLE I
Program Flow

Time    First   Second  Third   Fourth  Int.    Execution
        Stage   Stage   Stage   Stage   Stages  Stage
1       XXX1                            . . .
2       JCC-Y1  XXX1                    . . .
3               JCC-Y1  XXX1            . . .
4                       JCC-Y1  XXX1    . . .
5                               JCC-Y1  . . .
6       YYY1                            . . .
7       YYY2    YYY1                    . . .
. . .   . . .   . . .   . . .   . . .   . . .   . . .
10      YYY5    YYY4    YYY3    YYY2    . . .   XXX1
11      YYY6    YYY5    YYY4    YYY3    . . .   JCC-Y1
12      XXX2                            . . .
13      XXX3    XXX2                    . . .
14      XXX4    XXX3    XXX2            . . .
15      XXX5    XXX4    XXX3    XXX2    . . .
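The following Python sketch is a simplified model of the misprediction in Table I, not a reproduction of its exact timing. It assumes five stages, one clock per stage, and no gaps between fetches, so its cycle numbers differ from the table's. Fetch follows the predicted path after JCC-Y1; when the branch reaches the last (execution) stage, the younger wrong-path instructions are discarded and fetch restarts at XXX2. The function `simulate_mispredict` and its parameters are hypothetical.

```python
from collections import deque


def simulate_mispredict(correct_path, predicted_path, branch, num_stages):
    """Toy in-order pipeline with one mispredicted conditional branch.

    correct_path:   the instructions actually needed, including `branch`.
    predicted_path: wrong-path instructions fetched after `branch`.
    Returns (cycle at which each instruction executed, discarded instructions).
    """
    split = correct_path.index(branch) + 1
    # Fetch follows the prediction: everything up to the branch, then the
    # (wrong) predicted target path.
    fetch = deque(correct_path[:split] + predicted_path)
    stages = [None] * num_stages  # stages[-1] is the execution stage
    executed, flushed = {}, []
    cycle = 0
    while fetch or any(s is not None for s in stages):
        cycle += 1
        # Shift every instruction one stage forward; fetch into stage 0.
        stages = [fetch.popleft() if fetch else None] + stages[:-1]
        instr = stages[-1]
        if instr is None:
            continue  # execution stage is idle this clock
        executed[instr] = cycle
        if instr == branch:
            # The branch resolves here and the prediction turns out wrong:
            # discard every younger instruction in flight (oldest first)
            # and redirect fetch to the instruction after the branch.
            flushed = [s for s in reversed(stages[:-1]) if s is not None]
            stages = [None] * (num_stages - 1) + [instr]
            fetch = deque(correct_path[split:])
    return executed, flushed


executed, flushed = simulate_mispredict(
    correct_path=["XXX1", "JCC-Y1", "XXX2", "XXX3"],
    predicted_path=["YYY1", "YYY2", "YYY3", "YYY4", "YYY5", "YYY6"],
    branch="JCC-Y1",
    num_stages=5,
)
print(executed)  # {'XXX1': 5, 'JCC-Y1': 6, 'XXX2': 11, 'XXX3': 12}
print(flushed)   # ['YYY1', 'YYY2', 'YYY3', 'YYY4']
```

In this idealized model, XXX2 executes five clocks after JCC-Y1, leaving four clocks in which the execution stage has nothing to do; that gap is the recovery time, and it equals the number of stages XXX2 must pass through before reaching execution. Because fetch here has no gaps, only YYY1 through YYY4 are in flight when the branch resolves, rather than YYY1 through YYY6 as in Table I.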
When JCC-Y1 is actually executed at time 11, it becomes apparent that the branch prediction unit 100 has “mispredicted”: XXX2, not YYY1, must be processed next. In this case, instructions YYY1 through YYY6, which are currently in the pipeline, are discarded, and the branch execution unit 150 must wait for XXX2 to travel through each pipeline stage before it can be executed. This delay, or mispredicted-branch “recovery” time, slows the operation of the processor. Moreover, as the number of stages in a pipeline increases, the delay caused by each mispredicted path may also increase, because more stages must be refilled before the correct instruction reaches the execution stage.
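A common first-order way to estimate this cost (a standard textbook model, not one taken from this document) adds the expected misprediction penalty to the ideal cycles per instruction (CPI). The branch frequency of 20% and misprediction rate of 5% below are hypothetical values, and the penalty is assumed to equal the number of stages that precede the execution stage.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order estimate: ideal CPI plus the average cycles lost to
    refilling the pipeline after mispredicted branches."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches and 5% of those
# are mispredicted. Assume the branch resolves in the last stage, so each
# misprediction costs (num_stages - 1) clocks of refill.
for num_stages in (5, 10, 20):
    cpi = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=num_stages - 1)
    print(f"{num_stages:2d} stages: CPI ~ {cpi:.2f}")
```

Under these assumptions the estimated CPI rises from roughly 1.04 at five stages to about 1.19 at twenty, illustrating why deeper pipelines make each misprediction more expensive.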