The present invention relates to the field of digital computers and, in particular, to apparatus for processing instructions in high speed data processing systems.
A pipelined computer system divides computational tasks into a number of sequential subtasks. In such a pipelined computer system, each instruction is processed in part at each of a succession of hardware stages.
After the instruction has been processed at each of the stages, the execution is complete. In a pipelined configuration, as an instruction is passed from one stage to the next, that instruction is replaced by the next instruction in the program. Thus, the stages together form a "pipeline" which, at any given time, executes, in part, a succession of instructions. A pipelined computer system thus provides concurrent processing of a succession of instructions. Such instruction pipelines for processing a plurality of instructions in parallel are found in various computers.
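By way of illustration only, the stage-by-stage overlap described above can be sketched as follows. The function name `pipeline_schedule`, the stage count, and the instruction labels are assumptions introduced for this example and do not describe any particular hardware; the sketch also assumes one instruction enters the pipeline per clock cycle with no stalls.

```python
def pipeline_schedule(instructions, num_stages):
    """Return, for each clock cycle, the instruction held by each stage.

    Hypothetical model: one instruction enters stage 0 per cycle and every
    instruction advances exactly one stage per cycle (no stalls).
    A stage that holds nothing in a given cycle is reported as None.
    """
    total_cycles = len(instructions) + num_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            # The instruction in this stage entered the pipeline `stage` cycles ago.
            idx = cycle - stage
            row.append(instructions[idx] if 0 <= idx < len(instructions) else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["I0", "I1", "I2"], 3)):
        print(cycle, row)
```

In this model, three instructions passing through three stages complete in five cycles rather than nine, because each stage works on a different instruction during the same cycle.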
When a pipelined system encounters a branch instruction, it is wasteful of computer resources to wait for execution of the instruction before proceeding with the next instruction fetch and execute. Therefore, pipelined systems commonly utilize branch prediction mechanisms to predict the outcome of branch instructions before the execution of the instruction, and such branch prediction mechanisms are used to guide prefetching of instructions.
Accordingly, it is a known advantage to provide a mechanism to predict a change in program flow as a result of a branch instruction. It is also known, however, that there is a time penalty for an incorrect prediction of program flow. This time loss occurs when instructions issue along the incorrect path selected by the branch prediction mechanism.
Therefore, an object of the invention is to provide an improved branch prediction apparatus with a high rate of correct predictions, so as to minimize the time loss resulting from incorrect predictions.
In the prior art, the reduction of branch penalty is attempted through the use of a branch cache interacting with the instruction prefetch stage. The branch cache utilizes the address of the instruction being prefetched to access a table. If a branch was previously taken at a given address, the table so indicates, and in addition, provides the target address of the branch on its previous execution. This target address is used to redirect instruction prefetching, based on the likelihood that the branch will repeat its past behavior. This approach offers the potential for eliminating delays associated with branches. Branch cache memory structures are utilized to permit predictions of non-sequential program flow following a branch instruction, prior to a determination that the instruction is capable of modifying program flow.
A system utilizing a branch cache does not require computation of the branch address before instruction prefetching can continue. Instead, the branch cache is used to make predictions based solely on previous instruction locations, thereby avoiding the wait for decoding of the current instruction before proceeding with prefetch of the next instruction. The branch address need not be calculated before prefetching can proceed, because target or branch addresses are stored in the branch cache.
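The table lookup described above can be sketched, under stated assumptions, as follows. The class name `BranchCache`, its method names, the fixed parcel size `PARCEL_BYTES`, and the policy of removing an entry when a branch is not taken are all hypothetical choices made for this illustration; the description above does not specify them.

```python
PARCEL_BYTES = 4  # assumed fetch width; not specified in the description


class BranchCache:
    """Hypothetical table mapping a prefetch address to a previously taken target."""

    def __init__(self):
        self._targets = {}  # prefetch address -> target address of last taken branch

    def predict_next_fetch(self, fetch_addr):
        # On a hit, redirect prefetching to the stored target without decoding
        # the current instruction; on a miss, continue sequentially.
        return self._targets.get(fetch_addr, fetch_addr + PARCEL_BYTES)

    def record_outcome(self, fetch_addr, taken, target=None):
        # Assumed update policy: remember the target of a taken branch and
        # forget the entry if the branch later falls through.
        if taken:
            self._targets[fetch_addr] = target
        else:
            self._targets.pop(fetch_addr, None)


if __name__ == "__main__":
    cache = BranchCache()
    print(hex(cache.predict_next_fetch(0x100)))  # miss: sequential prefetch
    cache.record_outcome(0x100, taken=True, target=0x200)
    print(hex(cache.predict_next_fetch(0x100)))  # hit: redirected prefetch
```

Note that the prediction depends only on the address being prefetched, so no target address computation is needed before the next prefetch can be issued.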
There are, however, delays due to incorrect prediction of branches. Moreover, in a computer system which utilizes complex commands or instructions requiring an interpretive instruction set, such as microcode, and which fetches parcels of fixed length while processing instructions of different lengths, the number of correct branch predictions provided by a prior art branch cache is reduced. This reduction results from branch instructions which terminate in the middle of prefetched parcels. Prior art branch cache systems can process only a single prediction per parcel, and, in such pipelined computer systems, a parcel can contain two or more branch instructions.
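The single-prediction-per-parcel limitation can be illustrated with the following simplified model. It is a sketch only: the class `SinglePredictionCache`, the helper `parcel_base`, the parcel size, and the example addresses are hypothetical and are not drawn from any particular prior art system.

```python
PARCEL_BYTES = 4  # assumed fixed parcel length


def parcel_base(addr):
    """Address of the first byte of the parcel containing addr."""
    return addr - (addr % PARCEL_BYTES)


class SinglePredictionCache:
    """Hypothetical branch cache holding at most one prediction per parcel."""

    def __init__(self):
        self._entry = {}  # parcel base -> (branch address, target address)

    def record_taken(self, branch_addr, target):
        # A second taken branch in the same parcel overwrites the first.
        self._entry[parcel_base(branch_addr)] = (branch_addr, target)

    def predict(self, fetch_addr):
        # Return the single stored target for the parcel, or None on a miss.
        hit = self._entry.get(parcel_base(fetch_addr))
        return None if hit is None else hit[1]


if __name__ == "__main__":
    cache = SinglePredictionCache()
    cache.record_taken(0x101, 0x200)  # first branch ends mid-parcel
    cache.record_taken(0x103, 0x300)  # second branch in the same parcel
    # Only the most recently recorded prediction survives for parcel 0x100.
    print(hex(cache.predict(0x100)))
```

In this model, the branch at the assumed address 0x101 loses its prediction as soon as the branch at 0x103 is recorded, so a later pass through the parcel that reaches the first branch is redirected to the wrong target.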
Accordingly, it is an object of the invention to provide a branch cache system which can generate predictions of program flow changes at any point within parcel boundaries. This is especially important in systems where instruction granularity differs from parcel granularity.
It is another object of the invention to provide a multiset branch cache system having set selection elements for selecting a branch prediction from among at least two branch predictions per parcel.