Pipelining is a well-known technique whereby the execution of several instructions is overlapped. Today, most microprocessors rely upon pipelining for high-speed performance. A major side effect of pipelining, however, is that it introduces data and control hazards, which can cause significant performance losses. For example, the ideal speedup from pipelining can be reduced by half due to pipeline stalls and other delays caused by branch penalties.
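The effect of branch penalties on speedup can be illustrated with a common textbook approximation, in which the speedup equals the pipeline depth divided by one plus the average number of stall cycles per instruction. The following is a minimal sketch only; the function name, the ideal CPI of 1, and the example figures (a 10-stage pipeline, 20% branch frequency, 5-cycle penalty) are assumptions chosen for illustration and do not come from any particular processor.

```python
def pipeline_speedup(depth, branch_freq, branch_penalty):
    """Approximate speedup over an unpipelined machine.

    Assumes an ideal CPI of 1, perfectly balanced stages, and that
    branch penalties are the only source of stall cycles.
    """
    stall_cycles_per_instr = branch_freq * branch_penalty
    return depth / (1.0 + stall_cycles_per_instr)


# Hypothetical figures: every branch pays the full penalty (no prediction).
ideal = pipeline_speedup(depth=10, branch_freq=0.2, branch_penalty=0)
actual = pipeline_speedup(depth=10, branch_freq=0.2, branch_penalty=5)
print(f"ideal speedup: {ideal:.1f}, with branch stalls: {actual:.1f}")
```

Under these assumed figures, one stall cycle per instruction on average is enough to halve the ideal speedup.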
Branch instructions can be either unconditional, meaning that the branch is taken every time the instruction is encountered in the program, or conditional, meaning that the branch is either taken or not taken, depending upon a condition. Most often, the instructions to be executed following a conditional branch are not known with certainty until the condition upon which the branch depends has been resolved. Branches of this type can significantly reduce the performance of a pipelined processor, since they may interrupt the steady supply of instructions to the execution hardware. Branch predictors attempt to predict the outcome of a conditional branch instruction before that instruction is executed. If a branch is mispredicted, all of the speculative work beyond the point in the program where the branch is encountered must be discarded. Therefore, a highly accurate branch prediction mechanism is vital to a high-performance, pipelined processor.
The prior art is replete with different branch prediction schemes. A general overview of the problems associated with branch prediction, together with a number of solutions, is provided in an article by J. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", IEEE Computer (January 1984). An article authored by James E. Smith, entitled "A Study of Branch Prediction Strategies", IEEE (1981), discusses a variety of branch prediction techniques in terms of accuracy, cost and flexibility of use. A typical method of branch prediction utilizes a memory to store branch history information associated with the branch instruction. An example of this approach to branch prediction is found in U.S. Pat. No. 5,142,634.
Many early implementations of branch predictors used simple history bits and counter-based schemes that provided branch prediction accuracy of about 85-90%. Attempts to improve upon the accuracy of simple 2-bit counter schemes have included predictors that relate the sub-history information of a branch to the most recently executed branches via a shift register. An example of this approach is disclosed in the article entitled "Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation", by Shien-Tai Pan, et al.
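By way of illustration, a simple 2-bit counter scheme of the kind referred to above may be sketched as follows. This is a minimal model and not the design of any cited reference; the class name, the table size, the modulo indexing by branch address, and the synthetic loop-branch trace are all assumptions made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict not taken; values 2-3 predict taken.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # assumed start: weakly not taken

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs in the trace predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: one loop branch taken 9 times, then not taken,
# with the whole loop executed 10 times.
loop_trace = [(0x40, t) for _ in range(10) for t in [True] * 9 + [False]]
loop_accuracy = accuracy(TwoBitPredictor(), loop_trace)
print(f"accuracy on loop trace: {loop_accuracy:.2f}")
```

On this assumed trace the predictor misses each loop exit and the very first iteration, which shows why a counter alone cannot capture patterns that depend on recent branch outcomes.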
As the complexity of the branch prediction problem has increased, so has the sophistication of branch predictors. By way of example, the article "Branch Classification: A New Mechanism for Improving Branch Predictor Performance", by Po-Yung Chang, et al., Proceedings from Micro-27 (December 1994) describes a hybrid predictor in which each component branch predictor predicts only those branches for which it is best suited. Other sophisticated approaches employ complicated branch prediction algorithms that try to predict whether or not a branch will be taken based upon a large amount of history information. This category of branch predictors is exemplified by mechanisms disclosed in several papers by Tse-Yu Yeh and Yale N. Patt entitled, "A Comparison of Dynamic Branch Predictors that Use Two Levels of Branch History" IEEE (1993); "Two-Level Adaptive Training Branch Prediction"; and "Alternative Implementations of Two-Level Adaptive Branch Prediction" ACM (1992).
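The general idea of using two levels of branch history may be sketched as follows: a shift register records the outcomes of the most recent branches, and its contents select one of a table of 2-bit counters. This is a deliberately simplified, single-global-register arrangement written for illustration; it is not a reproduction of the mechanisms in the cited papers, and the class name, the 4-bit history length, and the alternating test trace are assumptions.

```python
class TwoLevelPredictor:
    """Global history shift register selecting a 2-bit counter."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0                         # first level: recent outcomes
        self.pattern_table = [1] * (1 << history_bits)  # second level

    def predict(self):
        return self.pattern_table[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history pattern.
        c = self.pattern_table[self.history]
        self.pattern_table[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run(predictor, outcomes):
    """Return the fraction of outcomes predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical trace: a branch that strictly alternates taken / not taken.
alternating = [i % 2 == 0 for i in range(200)]
two_level_accuracy = run(TwoLevelPredictor(), alternating)
print(f"two-level accuracy on alternating trace: {two_level_accuracy:.3f}")
```

A single 2-bit counter gets at most about half of a strictly alternating sequence right, whereas the history-indexed table learns the pattern after a brief warm-up. The cost is the additional storage for the history register and the larger counter table.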
One of the problems with sophisticated branch predictors is the large amount of silicon area required to implement the branch prediction hardware. This has presented microprocessor designers with a dilemma: either utilize a simple branch predictor (with limited accuracy) that occupies a small amount of area, or employ a sophisticated branch predictor (with higher accuracy) that takes up a relatively large amount of silicon area.
Thus, there exists an unsatisfied need for a way to optimize branch prediction.