1. Field of the Invention
The present invention generally relates to processor architectures, and more particularly, the present invention relates to an apparatus for predicting outcomes of branch instructions prior to execution by a processor.
2. Description of the Related Art
Branch instructions include the subset of conditional goto-type commands contained among a processor's instruction set. Typically, multiple instructions are executed by loading selected instructions in an instruction cache or register, and by incrementing a program counter which addresses each of the loaded instructions in order. A branch instruction is a conditional instruction that changes the sequence otherwise dictated by the computer program by specifying a new address at which a new sequence is to begin. An executed branch instruction is said to be "taken" where the associated condition (e.g., flag "a" is set) is satisfied, resulting in program execution deviating from the instruction contained at the next address to the instruction contained at the address specified by the branch instruction.
Branch prediction refers to a technique in which the outcome of a branch instruction is predicted in advance of actual execution of the instruction. A successful prediction allows for an early loading of the instruction or instructions to be executed immediately after the branch instruction. In fact, in some architectures, the predicted instruction or instructions are speculatively executed in anticipation of the branch result behaving as predicted.
All branch prediction schemes take advantage of the fact that most branches do not behave randomly. Perhaps the simplest technique is so-called "bimodal" branch prediction, which distinguishes branches that are typically taken from those that are not. The usual implementation of this approach includes a counter which is incremented when a branch is taken, and decremented when the branch is not taken. Any branch that is repeatedly taken will be predicted as taken, even in the presence of an isolated not-taken event. Likewise, any branch that is repeatedly not taken is predicted as not taken.
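By way of illustration only, the counter-based bimodal scheme described above may be sketched as follows. The two-bit counter width, the saturating behavior, the initial counter value, and the names BimodalCounter and run_bimodal are assumptions made for this sketch and are not taken from any particular implementation.

```python
class BimodalCounter:
    """Saturating up/down counter; the 2-bit default width is an assumption."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        # Assumed initial state: the weakest "predict taken" value.
        self.value = (self.max_value + 1) // 2

    def predict(self):
        # The upper half of the counter range means "predict taken".
        return self.value > self.max_value // 2

    def update(self, taken):
        # Increment on a taken branch, decrement otherwise, with saturation.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


def run_bimodal(outcomes, bits=2):
    """Return the prediction made before each outcome in `outcomes`."""
    counter = BimodalCounter(bits)
    predictions = []
    for taken in outcomes:
        predictions.append(counter.predict())
        counter.update(taken)
    return predictions
```

In this sketch, a single not-taken event within a run of taken events lowers the counter by one step but does not move it out of the "predict taken" half of its range, which corresponds to the tolerance of isolated events noted above.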
Prediction accuracy may be further improved by a technique known as "local" branch prediction. Local branch prediction schemes attempt to identify repetitive patterns of a branch instruction. Repetitive patterns are particularly a characteristic of loop control branch instructions. These instructions, when taken, direct the program to a previously executed instruction in the program sequence to thus form an instruction loop in which a sequence of instructions is repeated. Eventually the program encounters the same loop control branch instruction, and if again taken, the loop is repeated. A repetitive pattern in the branch instruction results when the loop is repeated the same number of times during each pass. For example, if the loop is repeated four times during each pass, then the loop control branch instruction will exhibit a repetitive pattern of 111011101110 . . . , where 1 is "taken" and 0 is "not taken". Local branch prediction is commonly implemented by way of a history table that stores the history of the branch instruction and a counter that records the current behavior of the branch instruction.
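As a non-limiting sketch of the local scheme, the following keeps a per-branch history and a set of counters selected by that history. The three-bit history length, the two-bit saturating counters, the sharing of one counter array among all branches, and the names LocalPredictor and count_mispredictions are assumptions chosen for illustration.

```python
class LocalPredictor:
    """Per-branch history selects a 2-bit counter (widths are assumptions)."""

    def __init__(self, history_bits=3):
        self.mask = (1 << history_bits) - 1
        self.history = {}  # branch address -> recent outcomes of that branch
        self.counters = [2] * (1 << history_bits)  # start at weakly taken

    def predict(self, address):
        pattern = self.history.get(address, 0)
        return self.counters[pattern] >= 2

    def update(self, address, taken):
        pattern = self.history.get(address, 0)
        value = self.counters[pattern]
        # Train the counter selected by the branch's own recent pattern.
        self.counters[pattern] = min(value + 1, 3) if taken else max(value - 1, 0)
        # Shift the newest outcome into this branch's history.
        self.history[address] = ((pattern << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, address, outcomes):
    """Predict, compare, and train for each outcome; return the miss count."""
    misses = 0
    for bit in outcomes:
        taken = bool(bit)
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses
```

For the repeating pattern 1110 discussed above, each three-bit history in this sketch is followed by a single fixed outcome, so the loop exit becomes predictable once the counters have warmed up, for example when calling count_mispredictions(LocalPredictor(), 0x40, [1, 1, 1, 0] * 10), where the address 0x40 is arbitrary.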
Local branch prediction considers the pattern or patterns of a given branch only. A further refined technique, known as "global branch prediction", considers the behavior of branches other than the current branch for which a prediction is being made. That is, in some cases, the behavior of two or more branches will correlate to some degree. By taking note of the actions of previously executed branches, the behavior of a current branch is predicted.
Global branch prediction is typically implemented as shown in FIG. 1. A shift register 102 records the actions of the most recent h conditional branches. For example, a "1" bit may denote a branch "taken" and a "0" bit may denote a branch "not taken". The resolved branch outcomes of the shift register 102 are actually predicted branch outcomes in the sense that the outcomes are loaded into the shift register as they are predicted. As illustrated by the arrow in FIG. 1, the most significant bit (msb) of the shift register denotes the hth most recent branch behavior, while the least significant bit denotes the most recent branch behavior.
The contents of the shift register 102 are combined with the branch address of the current branch for which a prediction is being made, with the combined data forming a table address of a branch prediction table 104. The branch prediction table 104 contains previously generated branch history information. A "predict taken" or a "predict not taken" is output from the branch prediction table 104 as addressed by the table address obtained from the branch address and the output of the shift register 102.
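The arrangement of FIG. 1 may be sketched, under stated assumptions, as follows. The manner of combining the branch address with the register contents is not specified above; concatenation of low-order address bits with the history bits is assumed here, as are the two-bit table entries, the four-bit field widths, and the name GlobalPredictor. For simplicity the sketch shifts resolved outcomes, rather than predicted outcomes, into the register.

```python
class GlobalPredictor:
    """Global history register plus prediction table (sizes are assumptions)."""

    def __init__(self, history_bits=4, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0  # shift register; the LSB is the most recent outcome
        self.table = [2] * (1 << (history_bits + address_bits))

    def table_address(self, branch_address):
        # Assumed combination: low address bits concatenated with the history.
        low = branch_address & ((1 << self.address_bits) - 1)
        return (low << self.history_bits) | self.history

    def predict(self, branch_address):
        return self.table[self.table_address(branch_address)] >= 2

    def update(self, branch_address, taken):
        index = self.table_address(branch_address)
        value = self.table[index]
        self.table[index] = min(value + 1, 3) if taken else max(value - 1, 0)
        # Shift the outcome into the register and discard the oldest bit.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Because the outcome of the immediately preceding branch occupies the least significant bit of the table address, a branch whose behavior follows that of its predecessor selects a different table entry for each predecessor outcome, which is how correlation between branches is exploited in this sketch.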
With respect to global branch history based prediction schemes, studies have shown that significant amounts of global branch history are required to obtain low branch misprediction ratios. However, the branch prediction table size doubles with each additional bit of branch history, and thus, these schemes require large branch prediction tables to achieve high levels of performance. The large size of the tables results in the expenditure of multiple cycles to read the table entries. This in turn requires that older branch execution information be used to predict the outcome of a given branch instruction in order to avoid stalling the instruction pipeline in favor of the prediction process. This is explained below with reference to FIG. 2.
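The doubling of table size with each additional history bit may be illustrated by the following sketch, which assumes that the table address is formed from a fixed number of address bits together with the history bits and that each entry is a two-bit counter. The function names and the default counter width are hypothetical.

```python
def table_entries(history_bits, address_bits):
    """Number of table entries when the index has history + address bits."""
    return 1 << (history_bits + address_bits)


def table_bytes(history_bits, address_bits, counter_bits=2):
    """Storage in bytes, assuming `counter_bits` bits per entry."""
    total_bits = table_entries(history_bits, address_bits) * counter_bits
    return (total_bits + 7) // 8
```

Under these assumptions, moving from 10 to 11 history bits with 4 address bits doubles the entry count from 16,384 to 32,768.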
Suppose, for example, that the branch history register contains h bits. As mentioned above, these h bits, together with the address of the given branch for which a prediction is desired, form the BPT address used to access the branch prediction table. However, in the time it takes to address the table, multiple branches may have already been executed or resolved. In this case, it is not possible to wait until the branch outcome of the branch immediately preceding the given branch is obtained (and the result applied to the branch history register) before accessing the branch prediction table. This is because the given branch will have executed long before the prediction is completed. In other words, reading of the branch prediction table for the given branch will have started while the read for the previous branch or branches has not yet been completed. For this reason, as shown in the bottom half of FIG. 2, older branch information is used. That is, suppose that H branches are resolved in the time it takes to address the branch prediction table in connection with a given branch. This means that the address of the given branch must be combined (at time t_ADDR) with the contents of the register well in advance of actual execution of the given branch to obtain a sufficiently early prediction result (at time t_PREDICT). This in turn prevents usage of the most recent H branch outcomes in the prediction process. As branch outcome correlations are often greatest amongst neighboring branches, the prediction performance suffers as a result of the inability to use the most recent branch outcomes in the branch prediction.
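The effect of omitting the most recent H outcomes may be illustrated with a small simulation, offered as a sketch under stated assumptions rather than as a model of any actual pipeline. In this hypothetical workload a first branch has a pseudo-random outcome and a second branch always repeats it; only the second branch is predicted, and the address portion of the table index is omitted because only one branch is predicted. The parameter lag plays the role of H, and the function name, the two-bit counters, and the seeded generator are assumptions.

```python
import random


def mispredictions_with_stale_history(lag, n_pairs=1000, history_bits=4, seed=0):
    """Count misses on a branch that copies its predecessor's outcome when
    the `lag` most recent outcomes are unavailable to the predictor."""
    rng = random.Random(seed)
    outcomes = []  # every resolved outcome, oldest first
    table = {}     # visible history -> 2-bit counter (assumed width)
    misses = 0
    for _ in range(n_pairs):
        a = rng.getrandbits(1)
        outcomes.append(a)  # the first branch resolves
        # History visible at prediction time excludes the newest `lag` outcomes.
        end = max(0, len(outcomes) - lag)
        visible = outcomes[max(0, end - history_bits):end]
        history = 0
        for bit in visible:
            history = (history << 1) | bit
        counter = table.get(history, 2)
        if (counter >= 2) != bool(a):
            misses += 1
        table[history] = min(counter + 1, 3) if a else max(counter - 1, 0)
        outcomes.append(a)  # the second branch resolves with the same outcome
    return misses
```

With lag set to 0 the outcome of the neighboring branch is part of the table index, and the second branch is mispredicted at most a handful of times while the counters warm up. With lag set to 1 that outcome is no longer visible, and the count returned for this workload is far higher, since the remaining history carries no information about the pseudo-random outcome.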