In the field of microprocessors and other programmable logic devices, many advances made in recent years have resulted in significant performance improvements. One such advance is the implementation of pipelined architectures, in which multiple microprocessor instructions are processed simultaneously along various stages of execution, so that the processing of subsequent instructions (in program order) begins prior to the completion of earlier instructions. Because of pipelining, the effective rate at which instructions are executed by a microprocessor can approach one instruction per machine cycle in a single-pipeline microprocessor, even though the processing of each individual instruction may require multiple machine cycles from fetch through execution. So-called superscalar architectures effectively have multiple pipelines operating in parallel, providing even higher theoretical performance levels.
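The throughput claim above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized pipeline with no stalls or flushes, in which the first instruction needs one cycle per stage and each later instruction completes one cycle after its predecessor; the function names are hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Cycles to finish a run of instructions in an idealized pipeline.

    Assumption: no stalls or flushes; one instruction enters per cycle.
    """
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles; each later one adds one.
    return num_stages + num_instructions - 1


def instructions_per_cycle(num_instructions, num_stages):
    """Effective execution rate over the whole run."""
    return num_instructions / pipeline_cycles(num_instructions, num_stages)


# Each instruction still takes 5 cycles from fetch through execution,
# yet the effective rate over a long run approaches one per cycle.
rate = instructions_per_cycle(1000, 5)
```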
Of course, as is well known in the art, branching instructions are commonplace in most conventional computer and microprocessor programs. Branching instructions are instructions that alter the program flow, such that the next instruction to be executed after the branching instruction is not necessarily the next instruction in program order. Branching instructions may be unconditional, such as JUMP instructions, subroutine calls, and subroutine returns. Other branching instructions are conditional, in that whether the branch is taken depends upon the results of a previous logical or arithmetic instruction.
Conditional branching instructions present complexity in microprocessors of pipelined architecture, because the condition upon which the branch depends is not known until execution, which may be several cycles after fetch. In these situations, the microprocessor must either cease fetching instructions after the branch until the condition is resolved, introducing a "bubble" of empty stages (i.e., potential instruction processing slots) into the pipeline, or must instead speculatively fetch an instruction (in effect guessing the condition) in order to keep the pipeline full, at the risk of having to "flush" the pipeline of its current instructions if the speculation is determined to be incorrect.
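The trade-off between stalling and speculating can be made concrete with a rough cost model. This is a sketch under stated assumptions only: the bubble and the flush are each assumed to cost the same number of cycles (the fetch-to-resolve latency), and the function names and parameters are hypothetical.

```python
def stall_cost(resolve_latency):
    """Expected lost cycles per conditional branch when fetch simply waits.

    Assumption: every branch inserts a bubble of resolve_latency cycles.
    """
    return float(resolve_latency)


def speculate_cost(resolve_latency, accuracy):
    """Expected lost cycles per conditional branch when fetch speculates.

    Assumption: a correct guess costs nothing, and a wrong guess flushes
    resolve_latency cycles of work.
    """
    return (1.0 - accuracy) * resolve_latency


# With a 4-cycle resolve latency and 90% of guesses correct, speculation
# loses far fewer cycles per branch, on average, than stalling.
waiting = stall_cost(4)
guessing = speculate_cost(4, 0.9)
```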
The benefit of speculative execution of instructions in keeping the pipeline full, particularly in architectures with long or multiple pipelines, typically outweighs the performance degradation of pipeline flushes, so long as the success rate of the speculative execution is reasonable. Many modern microprocessors therefore use some type of branch prediction technique, by which the behavior of branching instructions may be predicted with some accuracy. One type of branch prediction is referred to as "static" prediction, as the prediction does not change over time or with branch history. A simple static prediction approach merely predicts all branches to be "taken". An improved static branch prediction approach predicts according to branch direction, for example by predicting all branches in the forward direction to be "not taken" and predicting all backward branches (e.g., LOOP instructions in DO loops) to be "taken". Of course, unconditional branches may always be statically predicted as "taken".
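The direction-based static approach can be sketched in a few lines. This is an illustrative sketch only; the function name and its parameters are hypothetical, and a branch is assumed to be "backward" when its target address does not exceed its own address.

```python
def predict_static(branch_address, target_address, is_conditional=True):
    """Return True for a "taken" prediction, False for "not taken".

    Assumption: a target at or below the branch's own address marks a
    backward branch, as at the bottom of a loop.
    """
    if not is_conditional:
        # Unconditional branches are always predicted as taken.
        return True
    # Backward branches are predicted taken; forward ones not taken.
    return target_address <= branch_address


loop_branch = predict_static(0x200, 0x100)   # backward branch
skip_branch = predict_static(0x100, 0x200)   # forward branch
```

The prediction depends only on the two addresses, so it never changes from one occurrence of the branch to the next.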
Dynamic branch prediction refers to a known technique of branch prediction that uses the results of past branches to predict the result of the next branch. A simple well-known dynamic prediction technique merely uses the results of the most recent one or two conditional branching instructions to predict the direction of a current branching instruction.
A more accurate dynamic branch prediction approach predicts the direction of a branching instruction by its own branching history, as opposed to the branch results of other instructions. This approach is generally incorporated into modern microprocessors by way of a branch target buffer. A conventional branch target buffer, or BTB, is a cache-like table of entries that each store an identifier (a "tag") for recently encountered branching instructions, a branch history-related code upon which prediction is made, and a target address of the next instruction to be fetched if the branch is predicted as taken (the next sequential address being the address to be fetched for a "not taken" prediction). When a branching instruction is fetched, its address is matched against the tags in the BTB to determine if this instruction has been previously encountered; if so, the next instruction is fetched according to the prediction code indicated in the BTB for that instruction. Newly-encountered branching instructions are statically predicted, as no history is present in the BTB. Upon execution and completion of the instruction, the BTB entry is created or modified to reflect the actual result of the branching instruction, for use in the next occurrence of the instruction.
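The lookup and update flow of a BTB can be sketched as follows. This is a simplified illustration, not a description of any particular device: the table is assumed to be direct-mapped, the history code is reduced to a single "last outcome" bit, a miss is assumed to be statically predicted as not taken, and all class, field, and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int           # address of the branching instruction
    last_taken: bool   # assumed one-bit history code
    target: int        # address to fetch if predicted taken


class BranchTargetBuffer:
    def __init__(self, num_entries=16, instr_size=4):
        self.entries = [None] * num_entries
        self.instr_size = instr_size

    def _index(self, address):
        # Assumed direct-mapped indexing on the low-order address bits.
        return (address // self.instr_size) % len(self.entries)

    def predict(self, address):
        """Return (predicted_taken, next_fetch_address) for a fetched branch."""
        fall_through = address + self.instr_size
        entry = self.entries[self._index(address)]
        if entry is not None and entry.tag == address:
            # Hit: follow the prediction code stored for this instruction.
            if entry.last_taken:
                return True, entry.target
            return False, fall_through
        # Miss: no history yet, so fall back to an assumed static policy.
        return False, fall_through

    def update(self, address, taken, target):
        """Create or modify the entry once the branch has executed."""
        self.entries[self._index(address)] = BTBEntry(address, taken, target)


btb = BranchTargetBuffer()
first = btb.predict(0x100)        # newly encountered: static prediction
btb.update(0x100, True, 0x80)     # actual result recorded after execution
second = btb.predict(0x100)       # now predicted from stored history
```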
Various conventional prediction algorithms, which predict branches based either upon the most recently executed branches or upon the branching history of the same instruction, are known in the art. A well-known simple prediction algorithm follows a four-state state machine model, and uses the two most recent branch events to predict whether the next occurrence will be taken or not taken. The four states are referred to as "strongly taken", "weakly taken", "weakly not taken", and "strongly not taken". A "strongly" state corresponds to the last two branches (either generally or for the particular instruction, depending upon the implementation) having been taken or not taken, as the case may be. A "weakly" state corresponds to the last two branches having differing results, with the next branch result either changing the prediction to the other result, or maintaining the prediction but in a "strongly" state.
A recent advance in branch prediction algorithms uses not only branch history results, but also branch pattern information, in generating a prediction of branch behavior. For example, a certain branch instruction may close a loop of three passes, such that its branch history will repetitively follow a pattern of taken-taken-not taken. Use of a simple two-bit, or four-state, prediction mechanism will not correctly predict every occurrence of this instruction, even though its behavior is entirely predictable. The well-known two-level adaptive branch prediction mechanism, described in Yeh & Patt, "Two-Level Adaptive Branch Prediction", The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, (November 1991), pp. 51-61, uses both branch history and branch pattern information to predict the results of a branching instruction. Branch prediction using the Yeh & Patt approach has been applied to microprocessor architectures using BTBs, as described in U.K. Patent Application 2 285 526, published Jul. 12, 1995.
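The idea of combining history with pattern information can be sketched for a single branch as follows. This is a simplified illustration in the spirit of two-level prediction, not the scheme exactly as published: it assumes one history register of two bits that indexes a small table of two-bit counters, and all names are hypothetical.

```python
class TwoLevelPredictor:
    def __init__(self, history_bits=2):
        self.history_bits = history_bits
        self.history = 0                                 # first level: recent outcomes
        self.pattern_table = [2] * (1 << history_bits)   # second level: 2-bit counters

    def predict(self):
        # The counter selected by the current history makes the prediction.
        return self.pattern_table[self.history] >= 2

    def update(self, taken):
        # Train the counter that was selected, then shift in the outcome.
        counter = self.pattern_table[self.history]
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.pattern_table[self.history] = counter
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, outcomes):
    """Predict each outcome in order, then train on the actual result."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Ten trips through a three-pass loop: taken, taken, not taken.
loop_pattern = [True, True, False] * 10
misses = count_mispredictions(TwoLevelPredictor(), loop_pattern)
```

Because each two-outcome history in this pattern is followed by exactly one result, the table learns the loop exit after a brief warm-up; in this sketch the only miss is the first loop exit.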
A known limitation of branch prediction in deeply pipelined architectures arises from tight loops of program code, for example where the same branching instruction is encountered more than once within a range corresponding to the number of instructions in the pipeline. In this case, the actual branch history (and branch pattern information) that is modified upon execution and completion of the instruction is somewhat out of date, and may result in incorrect prediction of an otherwise predictable branch, because the updating of the results may not be "synchronized" with the most recent prediction of the branch result. This problem is described in International Publication No. WO 94/27210, published Nov. 24, 1994. As described in this publication, a known approach to avoid the problem of small program loops is to include "speculative branch history" information in the BTB entry for each branching instruction; the speculative branch history may be used to predict a branch if branches have been predicted for that instruction but have not yet been resolved. As described in WO 94/27210, separate fields may be provided in each BTB entry for the actual and speculative branch history information, with the actual branch history copied into the speculative branch history field upon the first speculative prediction. The use of speculative branch history in this manner permits the accurate prediction of branching patterns, even if occurring within small program loops.
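The effect of out-of-date history in a tight loop can be illustrated with a small trace-driven model. This is a rough sketch under several assumptions and does not reproduce the mechanism of WO 94/27210: a single branch is modeled, `delay` is the assumed number of older instances still unresolved when a prediction is made, a misprediction is assumed to drain all in-flight instances and restore the speculative history from the actual history, and all names are hypothetical.

```python
from collections import deque


def simulate(outcomes, delay, use_speculative, history_bits=2):
    """Count mispredictions of one branch whose results resolve late."""
    mask = (1 << history_bits) - 1
    table = [2] * (1 << history_bits)   # two-bit counters, start weakly taken
    actual_hist = 0                     # updated only when a branch resolves
    spec_hist = 0                       # updated as soon as a prediction is made
    in_flight = deque()                 # outcomes predicted but not yet resolved
    mispredictions = 0

    def resolve_oldest():
        nonlocal actual_hist
        taken = in_flight.popleft()
        counter = table[actual_hist]
        table[actual_hist] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        actual_hist = ((actual_hist << 1) | int(taken)) & mask

    for taken in outcomes:
        # Only results older than the assumed delay are known by now.
        while len(in_flight) > delay:
            resolve_oldest()
        hist = spec_hist if use_speculative else actual_hist
        predicted = table[hist] >= 2
        in_flight.append(taken)
        spec_hist = ((spec_hist << 1) | int(predicted)) & mask
        if predicted != taken:
            mispredictions += 1
            # Assumed recovery: everything in flight resolves, and the
            # speculative history restarts from the actual history.
            while in_flight:
                resolve_oldest()
            spec_hist = actual_hist
    return mispredictions


loop_pattern = [True, True, False] * 10
stale = simulate(loop_pattern, delay=2, use_speculative=False)
fresh = simulate(loop_pattern, delay=2, use_speculative=True)
```

In this model, with two unresolved instances in flight, prediction from the actual history alone misses every loop exit, whereas prediction from the speculative history misses only during warm-up.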
According to this approach, however, a redundant copy of the actual branch history is kept in each BTB entry. For purposes of efficiency and chip area, it would be desirable to have only a single copy of the actual branch history in each BTB entry, rather than requiring additional storage area for the redundant copy. However, according to conventional techniques, a microprocessor having a single combined actual and speculative branch history field cannot recover from a misprediction if more than one speculative branch prediction is stored in this combined field. Especially as pipelines become deeper in modern microprocessors, many programs will have multiple instances of conditional branches within the pipeline at a given time, requiring the storage and use of multiple speculative branch predictions in predicting the next instance of the instruction.