The present invention relates to out-of-order instruction execution. In particular, the present invention relates to branch instructions that are executed out-of-order and to resolving mispredicted branches.
A commonly used technique for increasing the performance of pipelined microprocessors is to execute instructions in an order other than the sequential program order, i.e., out-of-order. Typically, pipelined processors supporting out-of-order instruction execution include instruction scheduling units that decide which instructions to execute and in what order those instructions are executed.
When a program includes branch instructions, instruction fetch units typically include branch prediction units. A branch instruction is typically associated with two addresses, a target address (TA) and a fall-through address (FA). When a branch is taken, the processor is instructed to jump to the TA for the next instruction to be executed. When the branch is not taken, the processor instead continues at the FA, which is typically the next sequential address in the program order. Typically, branch prediction units predict whether branches are taken or not taken and/or predict a TA for the branches.
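The relationship between a prediction and the next fetch address described above can be illustrated with a minimal sketch. The names `Branch`, `next_fetch_address`, and `TwoBitPredictor` are hypothetical and introduced only for illustration. The two-bit saturating counter is a common textbook prediction scheme and is an assumption here; the present description does not specify how a typical branch prediction unit forms its prediction.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    pc: int                    # address of the branch instruction itself
    target_address: int        # TA: next address if the branch is taken
    fall_through_address: int  # FA: typically the next sequential address


def next_fetch_address(branch: Branch, predicted_taken: bool) -> int:
    """Return the address the fetch unit would fetch from after the branch."""
    if predicted_taken:
        return branch.target_address
    return branch.fall_through_address


class TwoBitPredictor:
    """Textbook 2-bit saturating counter per branch (assumed scheme)."""

    def __init__(self):
        self.counters = {}  # maps branch pc -> counter value in 0..3

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 mean "predict taken"; new branches start at 1.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)
```

In this sketch the predictor supplies only a direction; the TA is taken from the branch record itself, whereas a prediction unit that also predicts TAs would supply that address as well.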
Problems arise with out-of-order execution schemes when branch instructions are included. For example, typically only upon completion of execution and pending retirement of a branch instruction is the processor able to determine the direction of the branch, i.e., whether the branch was taken or not taken. Only after the actual direction or the actual TA is determined can the processor determine whether the branch prediction was correct. If the branch prediction was incorrect, e.g., a branch was predicted taken that should not have been taken, or the predicted TA is not the same as the actual TA, then the pipeline will typically be stopped, the instructions issued after the branch instruction will be flushed from the pipeline, and the processor will be restarted according to the actual branch result. Because a typical processor must wait until the branch instruction has actually been executed and is awaiting retirement before the instruction at the new TA can be fetched, many processing cycles are wasted.
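The conventional recovery sequence described above (compare prediction with actual result, flush younger instructions, restart at the actual address) can be sketched as follows. This is an illustrative model only, under the assumption that in-flight instructions are identified by sequence numbers that increase in program order; the names `Prediction`, `Outcome`, `is_mispredicted`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    taken: bool
    target_address: int  # predicted TA (meaningful only when taken is True)


@dataclass
class Outcome:
    taken: bool
    target_address: int        # actual TA
    fall_through_address: int  # FA


def is_mispredicted(pred: Prediction, actual: Outcome) -> bool:
    # A wrong direction is always a misprediction.
    if pred.taken != actual.taken:
        return True
    # With the direction correct, a taken branch can still have a wrong TA.
    return actual.taken and pred.target_address != actual.target_address


def resolve(pipeline: List[int], branch_seq: int,
            pred: Prediction, actual: Outcome) -> Tuple[List[int], Optional[int]]:
    """Return (surviving instructions, restart address or None)."""
    if not is_mispredicted(pred, actual):
        return pipeline, None
    # Flush every instruction issued after the branch in program order.
    survivors = [seq for seq in pipeline if seq <= branch_seq]
    restart = (actual.target_address if actual.taken
               else actual.fall_through_address)
    return survivors, restart
```

For example, with instructions 1 through 5 in flight and instruction 3 being a branch predicted taken that is actually not taken, `resolve` keeps instructions 1 through 3 and returns the FA as the restart address.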
Another drawback of typical pipelined processors that support branch prediction is that large amounts of data must be stored in order to allow the processor to restart properly. Typically, all instructions being processed in the pipeline, including branch instructions, are stored in a "central instruction window" in a single memory. However, branch instructions require the storage of different data than other instructions and thus require the word size (in bits) of the central instruction window to be increased to store the branch-specific data. Further, because the branch-specific data is stored in the same window, execution units require additional parsing logic to decode the additional fields and data. Thus, storage of branch instruction related data requires a great increase in memory size and logic.
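The storage cost described above can be made concrete with a short calculation. The following sketch assumes that every entry of the central instruction window is widened by the same number of branch-specific bits, whether or not the entry holds a branch. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def window_bits(depth: int, base_entry_bits: int, branch_extra_bits: int) -> int:
    """Total storage when every entry is widened for branch-specific fields."""
    return depth * (base_entry_bits + branch_extra_bits)


def branch_overhead_bits(depth: int, branch_extra_bits: int) -> int:
    """Portion of the total storage that exists only for branch data."""
    return depth * branch_extra_bits


# Hypothetical figures: 64 entries, 100 bits per ordinary entry,
# 40 additional bits per entry for branch-specific data.
total = window_bits(64, 100, 40)
overhead = branch_overhead_bits(64, 40)
```

Under these assumed figures the overhead grows linearly with the number of entries, so doubling the depth of the window doubles the number of bits reserved for branch-specific data.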
The above problem is greatly magnified as the depth of the central instruction window increases, i.e., as the number of instructions in the instruction pipeline increases. Another drawback is that typical pipelined processors have great difficulty resolving more than one branch instruction per clock cycle.
What is needed are improved methods and apparatus for resolving multiple branches that are executed out-of-order and for resolving mispredicted branches.