The present invention relates to the field of digital computers and, in particular, to apparatus for processing instructions in high speed data processing systems.
A pipelined computer system divides computational tasks into a number of sequential subtasks. In such a pipelined computer system, each instruction is processed in part at each of a succession of hardware stages.
After the instruction has been processed at each of the stages, its execution is complete. In a pipelined configuration, as an instruction is passed from one stage to the next, it is replaced by the next instruction in the program. Thus, the stages together form a "pipeline" which, at any given time, executes, in part, a succession of instructions. A pipelined computer system thus provides concurrent processing of a succession of instructions. Such instruction pipelines for processing a plurality of instructions in parallel are found in various computers.
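By way of illustration only, the stage-by-stage overlap described above can be sketched as a small simulation. The function name `pipeline_schedule`, the instruction labels, and the three-stage depth are assumptions chosen for the example. The sketch assumes an idealized pipeline in which every stage takes one clock cycle and no stalls occur.

```python
def pipeline_schedule(instructions, num_stages):
    """Return, for each clock cycle, the instruction held by each stage.

    Idealized model (an assumption): one cycle per stage, no stalls.
    A value of None means the stage is empty in that cycle.
    """
    total_cycles = len(instructions) + num_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            # The instruction that entered the pipeline `stage` cycles ago
            # is now at this stage.
            idx = cycle - stage
            row.append(instructions[idx] if 0 <= idx < len(instructions) else None)
        schedule.append(row)
    return schedule


schedule = pipeline_schedule(["I0", "I1", "I2", "I3"], num_stages=3)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

In this sketch, once the pipeline has filled, every stage holds a different instruction in the same cycle, which is the concurrent processing referred to above.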
When a pipelined system encounters a branch instruction, it is wasteful of computer resources to wait for the branch to execute before fetching and executing the next instruction. Therefore, pipelined systems commonly utilize branch prediction mechanisms to predict the outcome of a branch instruction before that instruction is executed, and the predictions are used to guide prefetching of instructions.
Accordingly, it is a known advantage to provide a mechanism to predict a change in program flow as a result of a branch instruction. It is also known, however, that there is a time penalty for an incorrect prediction of program flow. This time loss occurs when instructions issue along the incorrect path selected by the branch prediction mechanism.
Therefore, an object of the invention is to provide an improved branch prediction apparatus with a high rate of correct predictions, so as to minimize the time loss resulting from incorrect predictions.
In the prior art, the reduction of branch penalty is attempted through the use of a branch cache interacting with the instruction prefetch stage. The branch cache utilizes the address of the instruction being prefetched to access a table. If a branch was previously taken at a given address, the table so indicates, and in addition, provides the target address of the branch on its previous execution. This target address is used to redirect instruction prefetching, based on the likelihood that the branch will repeat its past behavior. This approach offers the potential for eliminating delays associated with branches. Branch cache memory structures are utilized to permit predictions of non-sequential program flow following a branch instruction, prior to a determination that the instruction is capable of modifying program flow.
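The table lookup performed by such a prior art branch cache can be sketched roughly as follows. This is a minimal software model, not a description of any particular hardware: the class name `BranchCache`, the method names, the unbounded dictionary standing in for the table, and the unit instruction size are all assumptions made for the example.

```python
class BranchCache:
    """Toy model of a branch cache indexed by prefetch address."""

    def __init__(self):
        # Maps the address of a previously taken branch to the target
        # address observed on its previous execution.
        self._table = {}

    def predict_next(self, fetch_address, instruction_size=1):
        """Return the address the prefetch stage should fetch next."""
        target = self._table.get(fetch_address)
        if target is not None:
            # A branch was previously taken here: redirect prefetching.
            return target
        # No entry: assume sequential program flow.
        return fetch_address + instruction_size

    def update(self, branch_address, taken, target_address=None):
        """Record the actual outcome once the branch has executed."""
        if taken:
            self._table[branch_address] = target_address
        else:
            self._table.pop(branch_address, None)


cache = BranchCache()
first_guess = cache.predict_next(100)    # no history yet: sequential
cache.update(100, taken=True, target_address=40)
second_guess = cache.predict_next(100)   # history says: taken, to 40
```

Note that `predict_next` consults only the prefetch address, so in this sketch the prediction is available without first decoding the instruction at that address.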
A system utilizing a branch cache does not require computation of the branch address before instruction prefetching can continue, because target or branch addresses are stored in the branch cache. The branch cache makes its predictions based solely on previous instruction locations, thereby avoiding the wait for decoding of the current instruction before proceeding with prefetch of the next instruction. There are, however, delays due to incorrect prediction of branches.
Moreover, in a computer system which utilizes complex commands or "macro-instructions" requiring an interpretive instruction set or "micro-instruction" set, such as microcode, different types of macro-instructions will branch to different locations in microcode, while macro-instructions of the same type but at different addresses will branch to the same entry point into microcode. The behavior of a micro-instruction branch therefore depends both on its own address and on the address of the macro-instruction that invoked the microcode routine.
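The dependence on both addresses can be illustrated with a small, purely hypothetical experiment. The sketch below assumes a simple "repeat the last outcome" predictor and an invented trace in which one micro-instruction branch (at the assumed micro-address 0x20) is reached alternately from two macro-instructions (at the assumed addresses 0x1000 and 0x2000) and behaves differently for each. None of these addresses, nor the function `count_mispredictions`, comes from the present description, and the sketch is not offered as the mechanism of the invention.

```python
def count_mispredictions(trace, key_fn):
    """Count wrong guesses of a last-outcome predictor.

    trace  : list of (micro_address, macro_address, taken) tuples
    key_fn : chooses which address information indexes the history table
    """
    table = {}
    misses = 0
    for micro_addr, macro_addr, taken in trace:
        key = key_fn(micro_addr, macro_addr)
        predicted = table.get(key, False)  # no history: predict not taken
        if predicted != taken:
            misses += 1
        table[key] = taken  # remember the most recent outcome
    return misses


# Hypothetical trace: the same micro-branch is taken when invoked from
# the macro-instruction at 0x1000 and not taken when invoked from 0x2000.
trace = [(0x20, 0x1000, True), (0x20, 0x2000, False)] * 4

by_micro_only = count_mispredictions(trace, lambda micro, macro: micro)
by_both = count_mispredictions(trace, lambda micro, macro: (micro, macro))
print(by_micro_only, by_both)
```

On this invented trace, a history indexed by the micro-instruction address alone is overwritten on every invocation, whereas a history indexed by both addresses keeps the two behaviors apart.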
Additionally, in some pipelined computer systems, the microcode processor itself is pipelined to improve performance, and both macro-instructions and micro-instructions are fetched by the same prefetch hardware. It is an object of the invention to provide a branch cache system adapted for a computer which utilizes both macro-instructions and micro-instructions, in which the microcode engine is pipelined and in which both macro-instructions and micro-instructions are fetched by the same hardware.
It is another object of the invention to provide a branch cache system which improves the ability of the branch cache to correctly predict the results of micro-instruction branches.