The present invention relates to an instruction pipeline in a processor. More particularly, the present invention relates to a mispredicted path side memory for an instruction pipeline.
The rate at which a computer or other processing system can process information is often dependent on the speed at which the system processor(s) execute instructions. Therefore, increased processing performance may advantageously be obtained by improving the speed at which processors process instructions. Many processors, such as a microprocessor found in a computer, use an instruction pipeline to speed the processing of instructions. FIG. 1 illustrates a known architecture for such an instruction pipeline. The first stage of the pipeline includes a branch prediction unit 100 and a next Instruction Pointer (IP) logic unit 110 that select an instruction to be executed. An instruction cache 120 is accessed in the second stage of the pipeline, and the instruction moves into the third stage. The instruction moves from a third stage unit 130 to a fourth stage unit 140, and so on, before reaching a branch execution unit 150 in the execution stage. The "intermediate stages" shown in FIG. 1 imply that any number of stages can exist in a pipeline. The stages may, for example, generate instructions for an instruction decoder.
Consider, for example, the following sequence of instructions:
In this case, address X1 stores a first instruction ("XXX1") followed by a "conditional" jump or branch instruction ("JCC-Y1"). The branch is conditional in that the next instruction to be performed may be either the next sequential instruction ("XXX2") or an instruction at a new address ("Y1"). The processor does not know which branch, or "path," will be taken until JCC-Y1 is executed, i.e., reaches the branch execution unit 150.
Assume now that the branch prediction unit 100 and the next IP logic unit 110 have selected instruction XXX1 to be executed. The processor could wait for XXX1 to move through each stage in the pipeline before processing the next instruction, JCC-Y1. In this case, however, the branch execution unit 150 would remain idle while JCC-Y1 moves through the pipeline. To improve the processor's performance, JCC-Y1 is instead placed into the first stage as soon as XXX1 moves into the second stage. As a result, JCC-Y1 will be ready for execution as soon as the branch execution unit 150 is finished with XXX1.
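As a rough illustration of the benefit described above, the following Python sketch compares cycle counts with and without this overlap. It is a simplified model only: it assumes that every stage takes exactly one clock cycle and that a new instruction can enter the first stage on every cycle. The function names and the stage count used in the example are hypothetical and are not taken from FIG. 1.

```python
def cycles_without_overlap(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction must leave the pipeline
    before the next one enters the first stage."""
    return num_instructions * num_stages


def cycles_with_overlap(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if a new instruction enters the first stage as soon
    as the previous one moves into the second stage."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to finish; every
    # later instruction finishes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # Two instructions (e.g. XXX1 then JCC-Y1) in an assumed 5-stage pipeline.
    print(cycles_without_overlap(2, 5))
    print(cycles_with_overlap(2, 5))
```

Under these assumptions, the second instruction completes one cycle after the first instead of a full pipeline length later, which is why the execution stage does not sit idle between the two.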
When JCC-Y1 moves into the second stage, however, the processor will not know if XXX2 or YYY1 should be placed into the first stage, because this information is only available after JCC-Y1 has been executed by the branch execution unit 150. Therefore, the branch prediction unit 100 "predicts" which branch of the program will be needed. By way of example, Table I shows the movement of the above instruction sequence through the pipeline shown in FIG. 1. As can be seen at time 6, the branch prediction unit 100 has predicted that instruction YYY1 will follow JCC-Y1. Note that several instruction "clock" cycles may or may not pass between the time JCC-Y1 moves into the second stage and the time YYY1 is placed into the first stage.
Suppose that, when JCC-Y1 is actually executed at time 11, it turns out that the branch prediction unit 100 has "mispredicted" and that, in fact, XXX2 must be processed next. In this case, instructions YYY1 through YYY6, currently in the pipeline, are discarded and the branch execution unit 150 waits for XXX2 to travel through each pipeline stage before it can be executed. This delay, or mispredicted branch "recovery" time, slows the operation of the processor. Moreover, as the number of stages in a pipeline increases, the delay caused by each mispredicted path may also increase.
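The recovery behavior described above can be sketched with a small Python simulation. This is a minimal model under stated assumptions, not a description of the pipeline of FIG. 1: each stage takes one cycle, the predicted-path instructions enter the first stage immediately after the branch with no gap, the branch is resolved in the last stage, and all younger instructions are discarded at that point. The function name `simulate_mispredict`, its parameters, and the seven-stage depth in the example are hypothetical; the depth is chosen only so that six wrong-path instructions are in flight, as in the YYY1 through YYY6 example.

```python
def simulate_mispredict(depth, predicted, actual, branch="JCC-Y1"):
    """Model one mispredicted branch in a pipeline of `depth` stages.

    depth     -- number of stages; the last stage is the execution stage
    predicted -- instructions fetched along the (wrong) predicted path
    actual    -- non-empty list of instructions on the correct path
    Returns (flushed, recovery_cycles): the discarded instructions, oldest
    first, and the cycles from branch resolution until the first
    correct-path instruction reaches the execution stage.
    """
    stages = [None] * depth                  # stages[0] is the first stage
    fetch = iter([branch] + list(predicted))
    cycle = 0
    # Advance one stage per cycle until the branch reaches execution.
    while stages[-1] != branch:
        stages = [next(fetch, None)] + stages[:-1]
        cycle += 1
    resolved_at = cycle
    # Misprediction detected: discard everything younger than the branch.
    flushed = [i for i in reversed(stages[:-1]) if i is not None]
    stages = [None] * (depth - 1) + [stages[-1]]
    # Refetch from the correct path and wait for it to reach execution.
    fetch = iter(actual)
    while stages[-1] != actual[0]:
        stages = [next(fetch, None)] + stages[:-1]
        cycle += 1
    return flushed, cycle - resolved_at


if __name__ == "__main__":
    wrong_path = [f"YYY{i}" for i in range(1, 20)]
    for depth in (7, 10):
        flushed, recovery = simulate_mispredict(depth, wrong_path, ["XXX2"])
        print(depth, flushed, recovery)
```

In this model the recovery time equals the pipeline depth, which illustrates why the delay caused by each mispredicted path grows as stages are added.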