A basic pipeline microarchitecture of a microprocessor processes one instruction at a time. The basic dataflow for an instruction follows the steps of: instruction fetch, decode, address computation, data read, execute, and write back. Each stage within a pipeline (also referred to hereinafter as a pipe) occurs in order; hence a given stage cannot progress unless the stage in front of it is progressing. To achieve the highest performance for the given base, one instruction enters the pipeline every cycle. Whenever the pipeline has to be delayed or cleared, latency is added, which in turn can be observed in the performance of the microprocessor as it carries out a task. While many complexities can be added for performance gains, this sets the groundwork for branch prediction theory.
There are many dependencies between instructions which prevent the optimal case of a new instruction entering the pipe every cycle. These dependencies add latency to the pipe. One category of latency contribution deals with branches. A branch is an instruction which can either fall through to the next sequential instruction, in which case it is not taken, or branch off to another instruction address, in which case it is taken and execution proceeds along a different sequential series of code. At decode time, the branch is detected, and it must wait to be resolved in order to know the proper direction in which the instruction stream is to proceed. Waiting for the branch to resolve, potentially for multiple pipeline stages, adds latency into the pipeline. To overcome the latency of waiting for the branch to resolve, the direction of the branch can be predicted such that the pipe begins decoding either down the taken path or down the not taken path. At branch resolution time, the guessed direction is compared to the actual direction the branch was to take. If the actual direction and the guessed direction are the same, then the latency of waiting for the branch to resolve has been removed from the pipeline. If the actual and predicted directions miscompare, then decoding has proceeded down the improper path; all instructions in this path, that is, those behind the improperly guessed branch, must be flushed out of the pipe, and the pipe must be restarted at the correct instruction address to begin decoding the actual path of the given branch. Because of the controls involved with flushing the pipe and beginning over, there is a penalty associated with the improper guess, and the latency added into the pipe exceeds that of simply waiting for the branch to resolve before decoding further.
When the proportion of correctly guessed paths is sufficiently high, the latency removed from the pipe by guessing the correct direction outweighs the latency added to the pipe by guessing the direction incorrectly.
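The trade-off described above can be sketched with a simple cost model. This is only an illustration under stated assumptions: a branch that waits for resolution always costs stall_cycles, a correct guess costs nothing, and an incorrect guess costs mispredict_penalty cycles (the wait plus the flush and restart overhead). The function and parameter names are hypothetical and do not come from any particular design.

```python
def expected_branch_latency(accuracy, mispredict_penalty):
    """Average cycles lost per branch when the direction is guessed.

    Assumes a correct guess costs 0 cycles and an incorrect guess
    costs mispredict_penalty cycles (illustrative model only).
    """
    return (1.0 - accuracy) * mispredict_penalty


def break_even_accuracy(stall_cycles, mispredict_penalty):
    """Accuracy above which guessing beats waiting for resolution.

    Guessing wins when (1 - accuracy) * mispredict_penalty < stall_cycles.
    """
    return 1.0 - stall_cycles / mispredict_penalty


def prediction_pays_off(accuracy, stall_cycles, mispredict_penalty):
    """True if guessing loses fewer cycles on average than always waiting."""
    return expected_branch_latency(accuracy, mispredict_penalty) < stall_cycles
```

Under this model, the larger the flush overhead is relative to the stall it replaces, the higher the accuracy the predictor needs before guessing is worthwhile.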
In order to improve the accuracy of branch direction guesses, a Branch History Table (BHT) can be implemented, which allows the direction of a branch to be guessed based on the direction in which the branch went previously. If the branch is always taken, as is the case of a subroutine return, then the branch will always be guessed as taken. IF/THEN/ELSE structures are more complex in their behavior. A branch may be always taken, sometimes taken and sometimes not taken, or always not taken. The implementation of a dynamic branch predictor determines how well the BHT, or some other mechanism, predicts the direction of the branch.
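A BHT of this kind can be sketched as follows. The text does not specify how past behavior is recorded, so this sketch assumes one common choice: a two-bit saturating counter per entry, indexed by the low bits of the branch address. The class name, table size, and initial counter value are hypothetical.

```python
class BranchHistoryTable:
    """Illustrative BHT: one 2-bit saturating counter per entry.

    Counter values 0-1 predict not taken, 2-3 predict taken.
    The 2-bit encoding and the modulo indexing are assumptions.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, branch_address):
        # Select an entry from the branch address (aliasing is possible).
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True if the branch is guessed taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Record the resolved direction of the branch."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With this encoding, a branch that is dominantly taken keeps its taken prediction after a single not taken occurrence, which matches the majority-direction behavior described for the BHT.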
A BHT is generally good at predicting dominantly taken or dominantly not taken branches. Its prediction is based on the location of a given branch and the direction that the branch has taken in the majority of its past occurrences. Other schemes of branch prediction are based on the paths leading up to the given branch. By basing the prediction on the path that was taken to reach the given branch, the guessed direction is no longer based on the general behavior of the branch, but rather on a path of taken and not taken branches. Such paths can be global paths, where the path of the last X branches is used to determine the guess for the current branch. Likewise, at a higher cost in the area required for the branch direction predictors, prediction schemes have been developed where the last X branches are tracked for sets of branches. Going to the extreme, histories can be acquired such that the direction of the given branch is tracked based on the different paths of taken and not taken branches that led to its given occurrence. These path-based schemes are pattern based, and their histories can be said to be stored in a Pattern History Table (PHT). A BHT is good at predicting the direction of branches which are dominantly taken or not taken, and a PHT has the strong point of predicting non-dominant branches. Because of these individual strengths, hybrid schemes have been developed where, for every entry in the BHT, there is another array of equal size which keeps track of the BHT accuracy over the last few occurrences compared to that of the PHT. Every time the BHT is correct and the PHT is incorrect, the hybrid selector moves a counter towards the BHT. When the inverse occurs, the counter moves towards the PHT. When both are correct, or both are incorrect, the counter is stationary. Such a scheme combines the strengths of the individual predictors to create an even better predictor.
It turns out that, a very high percentage of the time, both predictors predict the same direction. Because the predictors agree most of the time, the overhead of creating such a hybrid scheme is large with respect to the performance advantages that are gained.
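The hybrid scheme described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes two-bit saturating counters for the BHT, the PHT, and the selector, a PHT indexed only by a global history of the last few branch directions, and hypothetical names and sizes throughout. It also counts how often the two predictors agree, so that the overhead argument can be examined on a given branch trace.

```python
def bump(counter, up, max_value=3):
    """Move a saturating counter one step up or down within [0, max_value]."""
    return min(max_value, counter + 1) if up else max(0, counter - 1)


class HybridPredictor:
    """Illustrative BHT + PHT hybrid with a per-entry selector counter."""

    def __init__(self, entries=1024, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                      # directions of the last X branches
        self.bht = [1] * entries              # indexed by branch address
        self.pht = [1] * (1 << history_bits)  # indexed by global path history
        self.selector = [1] * entries         # 0-1 favors BHT, 2-3 favors PHT
        self.predictions = 0
        self.agreements = 0

    def predict(self, address):
        """Return (BHT guess, PHT guess, guess chosen by the selector)."""
        i = address % self.entries
        bht_guess = self.bht[i] >= 2
        pht_guess = self.pht[self.history] >= 2
        chosen = pht_guess if self.selector[i] >= 2 else bht_guess
        return bht_guess, pht_guess, chosen

    def update(self, address, taken):
        """Train on a resolved branch; return True if the chosen guess was right."""
        i = address % self.entries
        bht_guess, pht_guess, chosen = self.predict(address)
        self.predictions += 1
        self.agreements += int(bht_guess == pht_guess)
        bht_right = bht_guess == taken
        pht_right = pht_guess == taken
        # The selector moves only when exactly one predictor was correct.
        if bht_right != pht_right:
            self.selector[i] = bump(self.selector[i], up=pht_right)
        self.bht[i] = bump(self.bht[i], taken)
        self.pht[self.history] = bump(self.pht[self.history], taken)
        # Shift the resolved direction into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
        return chosen == taken
```

For a branch that strictly alternates between taken and not taken, the BHT counter in this sketch never settles, while the PHT learns the pattern and the selector drifts towards it. The ratio of agreements to predictions on a trace gives a rough sense of how often the selector array does useful work.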
Single branch prediction schemes have existed in many formats and they have been combined. The combined predictors are in general referred to as hybrid predictors and may consist of two or more predictors. In general, these predictors are highly accurate; however, their accuracy improvements are small compared to the growth in area required for them. Thus, a need exists to provide a way to generate hybrid predictors with high area savings.
A further need exists for a hybrid predictor where the majority of the overhead of such a hybrid predictor is removed while the advantages of a PHT based scheme are largely maintained. There is a further need for a simple path to pull in a third hybrid predictor while keeping the overall cost and complexity of such a scheme low and realistic to design in hardware.