In computers, branch instructions alter the flow of instruction execution. Many complex instruction set computers (CISC), as well as reduced instruction set computers (RISC), use simple decoding and pipelined execution of instructions. In a pipelined computer, a branch instruction normally breaks the pipeline until the instruction at the location to which the branch transfers control, the "target instruction," has been fetched. Branch instructions therefore impede the normal flow of instructions through the pipeline. In fact, the execution of branch instructions sometimes consumes about thirty percent of the execution cycle time in CISC and RISC microprocessors.
In the prior art, a number of solutions to this problem have been implemented. One such solution is a branch target buffer, which stores the address of the target instruction. Entries in the branch target buffer are indexed according to the address of the branch instruction. When a branch instruction is being decoded, the branch target buffer is searched for the address of the branch instruction. If the address is found, the instruction at the target address stored in the branch target buffer is fetched, so that the target instruction is ready for decoding when the branch instruction executes. Later versions of branch target buffers also stored the target instruction itself. See U.S. Pat. No. 4,725,947. With either type of branch target buffer, the target instruction is executed after the branch instruction has been executed.
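As an illustration only, the lookup described above can be modeled in Python as a table keyed by the branch address. The names `BranchTargetBuffer`, `BTBEntry`, and `prefetch_target`, the use of strings for instructions, and the `memory` mapping are assumptions made for this sketch and are not taken from any cited design. The optional `target_instruction` field models the later buffers that also store the target instruction.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BTBEntry:
    target_address: int
    # Later buffer designs also hold the target instruction itself (None if not stored).
    target_instruction: Optional[str] = None


class BranchTargetBuffer:
    """Hypothetical branch target buffer indexed by the branch instruction's address."""

    def __init__(self) -> None:
        self._entries: Dict[int, BTBEntry] = {}

    def lookup(self, branch_address: int) -> Optional[BTBEntry]:
        # Searched while the branch instruction is being decoded.
        return self._entries.get(branch_address)

    def record(self, branch_address: int, target_address: int,
               target_instruction: Optional[str] = None) -> None:
        # Filled in once a branch has executed and its target is known.
        self._entries[branch_address] = BTBEntry(target_address, target_instruction)


def prefetch_target(btb: BranchTargetBuffer, branch_address: int,
                    memory: Dict[int, str]) -> Optional[str]:
    """Return the target instruction to have ready for decoding, or None on a miss."""
    entry = btb.lookup(branch_address)
    if entry is None:
        return None  # miss: the target is not known until the branch executes
    if entry.target_instruction is not None:
        return entry.target_instruction  # stored in the buffer; no fetch needed
    return memory[entry.target_address]  # fetch from the stored target address


# Usage with made-up addresses and mnemonics.
memory = {0x100: "ADD r1, r2", 0x200: "SUB r3, r4"}
btb = BranchTargetBuffer()
btb.record(0x40, 0x200)
print(prefetch_target(btb, 0x40, memory))  # SUB r3, r4
print(prefetch_target(btb, 0x44, memory))  # None
```

A hit only lets the fetch of the target overlap with decoding of the branch; in both variants the target instruction still executes after the branch instruction.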
In the prior art, a branch folding (BF) technique enables two instructions to be "folded" together. This technique allows two types of instruction to execute in parallel: a branch instruction and the non-branch instruction that precedes it. One limitation of this technique is that it cannot be used when two branch instructions occur one after the other.
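A minimal sketch of the folding rule and its limitation, in Python. The `fold` function, the string instructions, and the convention that mnemonics beginning with "B" are branches are illustrative assumptions; in a real processor folding is performed by the decode hardware, not by software.

```python
from typing import Callable, List, Tuple


def fold(instructions: List[str],
         is_branch: Callable[[str], bool]) -> List[Tuple[str, ...]]:
    """Group instructions into issue slots, folding a branch into the
    non-branch instruction immediately before it."""
    groups: List[Tuple[str, ...]] = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if not is_branch(current) and following is not None and is_branch(following):
            groups.append((current, following))  # folded pair issues together
            i += 2
        else:
            # Nothing to fold with, e.g. a branch directly after another
            # branch, which therefore issues on its own.
            groups.append((current,))
            i += 1
    return groups


# Hypothetical convention: mnemonics starting with "B" are branches.
program = ["ADD r1, r2", "BEQ L1", "BNE L2", "SUB r3, r4"]
print(fold(program, lambda ins: ins.startswith("B")))
# [('ADD r1, r2', 'BEQ L1'), ('BNE L2',), ('SUB r3, r4',)]
```

The second branch, `BNE L2`, has no preceding non-branch instruction left to fold into, so it occupies an issue slot by itself.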
Another limitation is that a conditional branch instruction usually relies on the result of executing the previous instruction before it can be determined whether to branch. Once the preceding instruction has finished executing, the state of the condition is known, and the branch can occur. Even if the target instruction is available immediately after the conditional branch instruction is decoded, the processor cannot decide whether to begin decoding, and thereafter executing, the target instruction or to continue with the instructions that follow the branch instruction in instruction memory until the conditional branch instruction has been executed. Because a conditional branch instruction must wait for the execution of the instruction that precedes it, folding a conditional branch instruction with that preceding instruction could result in incorrect branches.
In the prior art, a variety of methods have been employed to minimize the effect of conditional branch instructions. One such technique inserts a delay instruction immediately after the branch instruction in the pipeline. See U.S. Pat. No. 4,777,587. The branch instruction is executed, and the branch determination is made, while the delay instruction is being decoded. Once execution of the branch instruction is complete, whether to branch is known. Therefore, once the delay instruction has completed execution, the instruction processor can feed the correct instruction into the decoder, depending on whether the condition was satisfied. Even though the pipeline flow is uninterrupted, the time taken to execute the branch instruction and the additional delay instruction still slow instruction execution.
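The instruction ordering this technique produces can be sketched in Python, assuming a single delay instruction and a forward branch. The function `delayed_branch_trace`, its parameters, and the mnemonics are hypothetical; the sketch models only which instructions execute and in what order, not pipeline timing.

```python
from typing import List


def delayed_branch_trace(program: List[str], branch_index: int,
                         target_index: int, taken: bool) -> List[str]:
    """Instructions executed up to the first one after a single delayed branch.

    The delay instruction at branch_index + 1 executes whether or not the
    branch is taken; only then does control reach the target or the
    fall-through instruction.
    """
    trace = program[:branch_index + 1]       # everything up to and including the branch
    trace.append(program[branch_index + 1])  # delay instruction always executes
    next_index = target_index if taken else branch_index + 2
    trace.append(program[next_index])        # first instruction selected by the branch
    return trace


program = ["CMP r1, r2", "BEQ L1", "ADD r3, r4", "SUB r5, r6", "L1: MOV r7, r8"]
print(delayed_branch_trace(program, 1, 4, taken=True))
# ['CMP r1, r2', 'BEQ L1', 'ADD r3, r4', 'L1: MOV r7, r8']
print(delayed_branch_trace(program, 1, 4, taken=False))
# ['CMP r1, r2', 'BEQ L1', 'ADD r3, r4', 'SUB r5, r6']
```

In both outcomes `ADD r3, r4` occupies the slot after the branch, which is the extra instruction time the paragraph above identifies as a remaining cost.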
Other techniques to minimize the effect of conditional branch instructions include predicting the outcome of branches ahead of time, based on the history of each branch, and correcting wrong predictions; fetching multiple instructions until the direction of the branch is determined; and delaying the effect of branches. However, the overall delay in executing the branch instruction remains.
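The paragraph above does not specify a prediction scheme. One widely used history-based scheme is a per-branch 2-bit saturating counter, sketched here in Python. The class name `TwoBitPredictor`, the helper `count_mispredictions`, the initial counter value of 1 (weakly not taken), and the outcome sequence are assumptions chosen for illustration.

```python
from typing import Dict, List


class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def predict(self, branch_address: int) -> bool:
        # Branches not seen before start at 1 (weakly not taken).
        return self.counters.get(branch_address, 1) >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        c = self.counters.get(branch_address, 1)
        self.counters[branch_address] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_mispredictions(predictor: TwoBitPredictor, address: int,
                         outcomes: List[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1  # a real pipeline would discard the wrong-path work here
        predictor.update(address, taken)
    return misses


# A loop branch taken three times and then not taken, run twice.
print(count_mispredictions(TwoBitPredictor(), 0x80, [True, True, True, False] * 2))  # 3
```

Each misprediction corresponds to the correction step mentioned above, which is why prediction reduces, but does not remove, the delay associated with branches.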
Accordingly, the present invention provides a device and method that minimize the delays that branch instructions create in pipelined instruction processors and, in some cases, entirely eliminate the delay associated with executing a branch instruction, conditional or otherwise.