This invention relates to the architecture of a digital computer; and more particularly, it relates to the architecture of a digital computer which operates on several different instructions simultaneously in a pipelined fashion.
To better understand this invention, reference should be made to FIG. 1 wherein the basic modules of a pipelined digital computer are illustrated. That FIG. 1 computer includes an instruction prefetch module (labeled IPF), an execute module (labeled EX), and a memory module (labeled M). Further, the execute module includes an address module (labeled A), an operand module (labeled O), and a compute module (labeled C).
Modules IPF, A, O, and C simultaneously operate on different instructions of the program. All of the instructions in that program are stored in memory module M. Module IPF operates to fetch the instructions from memory module M; module A operates to form addresses of operands that are called for in the instructions; module O operates to fetch the operands at the addresses formed by module A; and module C operates to perform computations on the operands fetched by module O and to store the results back in memory module M.
After any one module performs its above-described function, it passes the results to the next module. That next module then performs its above-described functions and passes the new results to the next module. Thus, modules IPF, A, O, and C form a "pipeline" through which instructions pass; and buses 10, 11, and 12 provide a means by which the modules pass their results through the pipeline to the next module.
Buses 13, 14, 15, and 16 also are provided as a means by which modules IPF, A, O, and C respectively read and/or write various items of information into the memory module while performing their described functions. Module IPF, for example, utilizes bus 13 to fetch instructions from the memory; module A utilizes bus 14 to fetch index registers from the memory that are needed to form operand addresses; module O utilizes bus 15 to fetch operands from the memory at the addresses formed by module A; and module C utilizes bus 16 to store computed results back into the memory.
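The stage-by-stage flow described above can be sketched as a small simulation. This is a minimal illustration, not part of the disclosed apparatus: the stage names IPF, A, O, and C come from FIG. 1, while the trace format and helper function are assumptions made for clarity.

```python
# Minimal sketch of the FIG. 1 four-stage pipeline (IPF, A, O, C).
# Each cycle, every stage works on a different instruction; an
# instruction reaches stage `depth` exactly `depth` cycles after
# it entered module IPF.

STAGES = ["IPF", "A", "O", "C"]

def pipeline_trace(instructions, cycles):
    """Return, per cycle, which instruction each stage is working on."""
    trace = []
    for cycle in range(1, cycles + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - 1 - depth  # instruction that entered IPF `depth` cycles ago
            if 0 <= idx < len(instructions):
                row[stage] = instructions[idx]
        trace.append((cycle, row))
    return trace

for cycle, row in pipeline_trace([f"I{n}" for n in range(1, 10)], 8):
    print(cycle, row)
```

Running the sketch reproduces the schedule of FIG. 2: in cycle 5, module IPF holds I5 while modules A, O, and C hold I4, I3, and I2 respectively.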
FIG. 2 shows in detail the sequence by which the FIG. 1 computer executes a program consisting of nine instructions I1 through I9. Those instructions are illustrated in FIG. 1 as being resident in memory M. Instructions I1 through I9 sequentially follow each other in the memory; and instruction I5 is a conditional branch instruction. It tests a condition and branches back to instruction I1 if the condition is true and branches to instruction I6 if that condition is false.
A typical format for a conditional branch instruction consists of an op code (OP) and a branch address (BA) as indicated by reference numeral 17. Op code OP is one pre-assigned combination of ones and zeroes which identifies the instruction as being a conditional branch and identifies the condition to be tested. For example, a binary coded decimal 22 in a Burroughs B4800 computer specifies a branch if equal; whereas a binary coded decimal 25 specifies a branch if not equal. BA is the address in the memory module M of the instruction that is to be executed next if the specified condition is true.
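The two-field format at reference numeral 17 can be sketched as a simple encode/decode pair. The BCD op codes 22 (branch if equal) and 25 (branch if not equal) follow the B4800 example in the text; the field widths chosen here are purely illustrative assumptions.

```python
# Sketch of the conditional-branch format at reference numeral 17:
# an op code (OP) followed by a branch address (BA).
BRANCH_IF_EQUAL = 0x22       # BCD 22: branch if equal (B4800 example)
BRANCH_IF_NOT_EQUAL = 0x25   # BCD 25: branch if not equal

def encode(op, branch_address):
    # Assumed layout: one byte of op code above a 16-bit branch address.
    return (op << 16) | (branch_address & 0xFFFF)

def decode(word):
    # Recover the (OP, BA) pair from an encoded instruction word.
    return word >> 16, word & 0xFFFF

word = encode(BRANCH_IF_EQUAL, 0x0100)
op, ba = decode(word)
print(hex(op), hex(ba))
```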
During cycle 1 of FIG. 2, module IPF fetches instruction I1. Thereafter, during cycle 2, module A forms the addresses of the operands needed by instruction I1; while module IPF simultaneously fetches instruction I2. This sequence of operation continues in a pipelined fashion as illustrated in FIG. 2 through cycle 5 at which time the conditional branch instruction I5 is fetched by module IPF.
After fetching the conditional branch instruction I5, module IPF needs to decide whether to fetch instruction I1 or instruction I6 as the next instruction. This would be no problem if the condition to be tested by instruction I5 were ready for testing immediately after that instruction was fetched by module IPF. But that condition can be changed by the preceding instruction I4, so the condition will not be available for testing until instruction I4 has been acted upon by the last module C in the pipeline. That occurs as illustrated in FIG. 2 at the end of cycle 7.
The actual condition itself may be just one bit or the result of a whole sequence of calculations. For example, the IBM 360 computers and IBM 370 computers contain a set of flip-flops called "condition codes", and each possible condition that the computer can test is stored in those condition-code flip-flops. One of the condition codes is "equals"; it is automatically set to a "1" or reset to a "0" immediately after the computer executes an arithmetic instruction.
In some prior art pipelined computers, module IPF always fetches the instruction at branch address BA following the fetch of a conditional branch instruction. This operation is illustrated in cycle 6 of FIG. 2 wherein instruction I1 is fetched by module IPF. Thereafter, in cycles 7 and 8, instructions I2 and I3 respectively are also fetched by module IPF. Then in cycle 8, module C determines whether or not the condition specified by instruction I5 was such that instructions I1, I2, and I3 should have been fetched by module IPF during cycles 6, 7, and 8.
If instruction I6 should have been fetched instead, that fact is signaled by module C over a control line 18 to modules IPF, A, and O. In response to that signal, module IPF fetches instructions I6, I7, and I8 respectively during cycles 9, 10, and 11; and modules A, O, and C forego any further operations on instructions I1, I2, and I3.
A problem, however, with this sequence of operation is that many cycles are wasted because they perform useless operations. In the above example, module IPF wastes cycles 6 through 8. Typically, a program contains thousands of conditional branch instructions; and so these wasted cycles significantly reduce the computer's throughput.
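The cost of these wasted cycles can be estimated with simple arithmetic. The three-cycle penalty corresponds to the wasted cycles 6 through 8 of FIG. 2; the branch frequency and misprediction rate below are illustrative assumptions, not figures from the disclosure.

```python
# Back-of-the-envelope throughput loss from wasted branch cycles.
# Assumptions: 20% of instructions are conditional branches, half of
# them fetch down the wrong path, and each wrong fetch wastes 3 cycles
# (cycles 6-8 in FIG. 2).

def effective_cpi(branch_fraction, mispredict_rate, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction, including wasted branch cycles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

cpi = effective_cpi(0.20, 0.50, 3)
print(f"effective CPI: {cpi:.2f}")  # rises from 1.00 to 1.30
```

Under these assumed numbers the pipeline's throughput drops by roughly 23 percent, which illustrates why the wasted cycles matter in a program with thousands of conditional branches.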
One way to decrease the number of these wasted cycles is to add a condition predictor flip-flop for each condition that the computer can test. For example, one condition predictor flip-flop could be added for the "equals" condition that is tested by a branch-if-equal instruction; and that flip-flop would indicate a predicted state of true or false for the "equals" condition based on the actual state of that condition over the last several times it was tested. Then when a conditional branch instruction is encountered by module IPF, the next instruction would be fetched based on the predicted state of the condition being tested as stored in the condition predictor flip-flops.
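A predictor of the kind just described, which bases its prediction on the actual state of the condition over the last several times it was tested, can be sketched as follows. The majority-vote rule and the history length of four are illustrative assumptions; the paragraph above specifies only that recent outcomes drive the prediction.

```python
# Sketch of a per-condition predictor: predict the majority outcome
# of the last several tests of the condition (history length assumed).
from collections import deque

class ConditionPredictor:
    def __init__(self, history_len=4):
        self.history = deque(maxlen=history_len)  # recent actual outcomes

    def predict(self):
        # Predict true if the condition held at least half the time recently.
        if not self.history:
            return False  # assumed default before any outcome is seen
        return sum(self.history) * 2 >= len(self.history)

    def update(self, actual):
        # Record the actual state of the condition after module C tests it.
        self.history.append(bool(actual))

p = ConditionPredictor()
for outcome in [True, True, True, False]:
    p.update(outcome)
print(p.predict())  # majority of the last four outcomes is True
```

When module IPF reaches a conditional branch, it would consult `predict()` to choose the next fetch address, and module C would call `update()` once the real outcome is known.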
But this mechanism still wastes too many cycles, and the reason can be seen by inspecting FIG. 3. There a program is illustrated consisting of instructions I10 through I23; and instructions I14 and I16 are conditional branch instructions.
Suppose now that the branch from instruction I14 to instruction I20 is taken very infrequently; whereas the branch from instruction I16 to instruction I10 is taken very frequently. This is a very practical possibility as instruction I13 could be making a comparison for an exception condition that instruction I14 tests, and instruction I15 could be setting or resetting the "equals" condition to indicate whether or not instruction loop I10-I16 should be repeated.
Thus, to minimize wasted cycles in the FIG. 3 program due to conditional branch instruction I16, the predicted "equals" condition should be true. But at the same time, to minimize wasted cycles due to conditional branch instruction I14, the predicted equals condition should be false. Thus a dilemma exists in which one predicted state of a condition minimizes wasted cycles caused by conditional branch instructions at some locations in a program while maximizing wasted cycles caused by conditional branch instructions at other locations in the program.
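The dilemma can be made concrete with a small count of mispredictions. The outcome streams below are illustrative assumptions based on the FIG. 3 discussion: the branch at I14 is taken rarely, the branch at I16 is taken frequently, and both test the same "equals" condition and hence share one predictor flip-flop.

```python
# Two branch sites sharing a single predicted state for "equals".
# Outcome streams are assumptions: I14 is taken 1 time in 10,
# I16 is taken 9 times in 10.
i14_taken = [False] * 9 + [True]   # exception branch: rarely taken
i16_taken = [True] * 9 + [False]   # loop-back branch: usually taken

def mispredictions(predicted_taken, outcomes):
    """Count outcomes that disagree with the single fixed prediction."""
    return sum(1 for taken in outcomes if taken != predicted_taken)

for prediction in (True, False):
    total = (mispredictions(prediction, i14_taken)
             + mispredictions(prediction, i16_taken))
    print(f"predict taken={prediction}: {total} mispredictions out of 20")
```

With these assumed streams, either fixed prediction mispredicts 10 of the 20 executions: whichever state is chosen serves one branch site and penalizes the other, which is exactly the dilemma described above.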
This dilemma might seemingly be avoided by choosing one state for the condition predictor flip-flops, and then rearranging the program such that the one predicted state always is a correct prediction. Suppose, for example, that the condition predictor flip-flops indicate the "equals" condition will be false and the "not equals" condition will be true. Then suppose further that conditional branch instruction I14 is changed to a "branch-if-not-equal" instruction, instructions I20-I23 are moved to memory locations directly following instruction I14, and instructions I15-I19 are moved to some memory locations remote from I14. With those changes, a relatively fast branch will occur from instruction I14 to instruction I15 at its new location remote from instruction I14; and, as before, a relatively fast branch will occur from instruction I16 to instruction I10.
But this rearrangement of the instructions is very wasteful of memory space for any situation where instructions I20-I23 form a subroutine that is used by many other parts of the program (not shown). In that case, it would be necessary to repeat the I20-I23 code each time it was used as described above in order to increase the execution speed. And such repetition of code for long or often used subroutines would use so much memory as to be impractical. Further, it would be impractical to rearrange pre-existing programs containing thousands and even millions of instructions in the above-described manner in order to decrease execution time because such a task would be so immense and time-consuming.