1. Field of the Invention
This invention relates in general to computers, and in particular to interlacing the two program paths that follow a conditional-branch-like instruction.
2. Prior Art
In a conventional computer, after a failed conditional branch, execution proceeds with the sequential instructions following the conditional branch instruction (the condition false path). After a successful conditional branch instruction, execution resumes with instructions fetched beginning at the branch target address (the condition true path).
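The path selection just described can be sketched as a next-address function. This is only an illustrative sketch; the names `next_pc`, `taken`, `target`, and `length` are assumptions introduced here and do not come from any particular machine.

```python
def next_pc(pc, taken, target, length=1):
    """Return the address of the instruction executed after a conditional branch.

    pc     -- address of the conditional branch instruction (hypothetical name)
    taken  -- True if the branch condition succeeded
    target -- branch target address (start of the condition true path)
    length -- size of the branch instruction in address units (assumed to be 1)
    """
    if taken:
        # Successful branch: resume at the branch target address.
        return target
    # Failed branch: continue with the sequential instructions.
    return pc + length
```

For a branch at address 100 with target 200, a failed branch continues at 101 and a successful branch continues at 200.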
This scheme is simple and direct for non-pipelined execution. In a pipelined machine, however, there are delays in calculating (and/or obtaining) the conditional branch target address and in accessing memory for the instructions at the target address. Some techniques that have been used to minimize conditional branch delays are described next.
Branch target buffers have been used to minimize conditional branch delays. However, there is still a delay associated with calculating the branch target address, performing an associative lookup on that address, and then accessing the target instructions from the branch target buffer.
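The three sequential steps (calculate the target address, look it up, read out the buffered instructions) can be sketched as follows. This is a simplified model under stated assumptions: the class name `BranchTargetBuffer`, the `line_size` parameter, and the unbounded dictionary are hypothetical, and a real buffer has a fixed capacity and a replacement policy.

```python
class BranchTargetBuffer:
    """Toy buffer that holds the instructions found at branch target addresses."""

    def __init__(self, memory, line_size=4):
        self.memory = memory        # list of instructions, indexed by address
        self.line_size = line_size  # instructions kept per target (assumed value)
        self.lines = {}             # target address -> buffered instructions

    def fetch_target(self, target):
        """Return (hit, instructions) for an already calculated target address."""
        hit = target in self.lines  # associative lookup on the target address
        if not hit:
            # Miss: read the instructions from memory and keep them for next time.
            self.lines[target] = self.memory[target:target + self.line_size]
        return hit, self.lines[target]
```

Even on a hit, the lookup cannot begin until the target address is known, which is the delay noted above.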
When the branch condition can be tested early, a delayed conditional branch can be performed so that the instruction fetch unit gets an early start on fetching the instructions to be executed next.
A "prepare to branch" instruction can provide the branch target address prior to the actual conditional branch instruction. This technique reduces code density because it adds an instruction ahead of each such branch.
The preceding approaches attempt to overcome the inherent difficulty of simultaneously fetching both program paths from two separate areas of memory. The present invention interlaces both program paths after a conditional-branch-like test and so avoids the difficulty.
Branch prediction has been used so that only the path most likely to be executed is fetched.
Early drum computers could switch drum tracks as a way of implementing conditional branches with minimum delay. However, it is difficult to allocate instructions to the drum memory tracks in an optimal fashion.
I believe the 360/65 micro-programming facility accessed two micro-instructions in parallel each clock cycle. This allowed a just-computed test result to select one of the two micro-instructions immediately before it was executed, which minimized the delay between the test result and proceeding with the computation. I believe that each micro-instruction had a next address field. For a general purpose computer instruction set, this technique would have a low code density.
Both drum-track switching and parallel micro-instruction access provide near-immediate access to either of the two paths following a conditional-branch-like test.
For additional information, see the recent survey article "Reducing the Branch Penalty in Pipelined Processors" by David Lilja in "Computer", July 1988.
The object of the present invention is to minimize the delays due to conditional-branch-like tests.