Modern microprocessor pipelines use branch prediction to mitigate the performance loss caused by control-dependency stalls. Many schemes in which the branch predictor learns entirely in hardware, or in combination with software, have been proposed and used in several generations of microprocessors. All of these schemes attempt to predict a branch outcome from a history context derived from prior computation state that is correlated with the branch outcome. Global-history-based branch predictors such as gshare and bgG use a context derived from the global branch history, which has been shown to contain enough correlated information to predict branches correctly in many cases. However, in certain cases, such as when the branch outcome depends on input data with no predictable pattern, the global history may contain no correlated information, and such branches are hard to predict using predictors such as gshare.
A compiler, however, might know the outcome of such hard-to-predict data-dependent branches and be able to convey that information to the hardware. One of the most attractive solutions is a branch hint instruction, which can be placed ahead of a branch in the code stream to indicate which way the branch will go. Such hints may be static or dynamic. Static hints are simple but are outperformed by modern branch predictors. Dynamic branch hints are more complex to implement, and although they have been proposed many times, they have met resistance for several reasons, one being the significant additional complexity they add to the microarchitecture.
FIG. 1A is pseudocode representing a typical program loop with a conditional branch. In this example the loop is executed an unpredictable number of times, which means that existing branch predictors such as gshare or loop prediction do not work perfectly. Typically they will mispredict the non-taken branch (e.g., instruction 101) at the end of the loop and will speculate that the loop will continue to run. FIG. 1B illustrates a global branch history recorded for code as shown in FIG. 1A, when predicting loop exit while running the loop over and over with random loop counts. Note that the distance between 0's in the global history (prior loop exits) will be a random distribution when the loop count is random. Therefore, the global history does not contain information that helps in correctly predicting loop exit.
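The effect described above can be reproduced with a small software model. The following Python sketch is illustrative only: the `Gshare` class, the 12-bit history length, the branch address `0x40`, and the trip-count range are assumptions chosen for the example and are not taken from the figures. It models a single loop-closing branch that is taken on every iteration except the last, and counts how often the loop exit is mispredicted for a fixed trip count versus a random one.

```python
import random


class Gshare:
    """Minimal gshare model: 2-bit counters indexed by PC XOR global history."""

    def __init__(self, history_bits=12):
        self.mask = (1 << history_bits) - 1
        self.history = 0                            # global branch history register
        self.counters = [2] * (1 << history_bits)   # start at "weakly taken"

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the global history (1 = taken).
        self.history = ((self.history << 1) | int(taken)) & self.mask


def simulate(trip_counts, pc=0x40, history_bits=12):
    """Run the loop once per trip count; return (exits, mispredicted_exits)."""
    bp = Gshare(history_bits)
    exits = exit_misses = 0
    for n in trip_counts:
        for i in range(n):
            taken = i < n - 1          # loop-back branch falls through on the last pass
            predicted = bp.predict(pc)
            if not taken:
                exits += 1
                exit_misses += int(predicted != taken)
            bp.update(pc, taken)
    return exits, exit_misses


rng = random.Random(0)                               # fixed seed for repeatability
random_counts = [rng.randint(1, 64) for _ in range(2000)]

fixed_exits, fixed_misses = simulate([8] * 2000)
rand_exits, rand_misses = simulate(random_counts)

print(f"fixed trip count : {fixed_misses}/{fixed_exits} exits mispredicted")
print(f"random trip count: {rand_misses}/{rand_exits} exits mispredicted")
```

With a fixed trip count, the zeros in the global history recur at a constant distance, so the model can learn the exit. With random trip counts the history preceding an exit looks the same as the history preceding a continuing iteration, and most exits are mispredicted.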
Mispredicting the loop exit leads to wasted fetch, decode, and execution of many wrong-path instructions, which must be discarded when the branch is resolved. Mispredictions and the subsequent wrong-path execution reduce performance by consuming cycles that could have been used to execute correct instructions; the machine also wastes energy performing speculative computations that are then thrown away. The performance and energy cost of mispredictions is exacerbated in a longer pipeline.
In an attempt to reduce mispredictions, a standard dynamic branch hint instruction could be deployed, with new instruction 102 and label 103 added as shown in FIG. 1C. The hypothetical instruction 102 checks the flags with a “greater than” test. If the test passes, it tells the branch predictor that the instruction at the address given by its argument 103 will branch the same way. This instruction has no architectural effect, as it does not change program state; it simply updates, or hints, the branch prediction hardware. Note the trick of using a “greater than” test at the top of the loop but a “greater or equal” test at the bottom, and of reusing the flags set by instruction DEC at the bottom of the loop to feed the hint at the top. Alternatively, instruction DEC could be moved to the top of the loop and both tests changed to use a “greater or equal” test, but frequently the body of the loop will want to use the loop counter as an input.
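The relationship between the two tests can be checked with a short model. The following Python sketch is a hypothetical reading of the loop described above, not a transcription of FIG. 1C: the function name `run_loop`, the variable `flags_value`, and in particular the assumption that the flags already reflect the counter when the loop is first entered are introduced here for illustration. It shows that a “greater than” test on the flags left by the previous DEC yields the same result as the “greater or equal” test performed after the next DEC.

```python
def run_loop(count):
    """Model one run of the hinted loop; return (hints, outcomes) per iteration."""
    hints, outcomes = [], []
    counter = count
    flags_value = counter      # assumption: flags reflect the counter at loop entry
    while True:
        # Top of loop: hint instruction applies a "greater than" test to the
        # flags left by the previous DEC (or by loop entry on the first pass).
        hints.append(flags_value > 0)

        # ... loop body would read `counter` here ...

        counter -= 1           # DEC updates the flags
        flags_value = counter

        # Bottom of loop: the conditional branch uses "greater or equal".
        taken = flags_value >= 0
        outcomes.append(taken)
        if not taken:
            break
    return hints, outcomes


hints, outcomes = run_loop(3)
print(hints == outcomes)       # the hint matches the later branch on every iteration
```

For integer counters, "c > 0 before DEC" and "c - 1 >= 0 after DEC" are the same condition, which is why the hint at the top of the loop can announce the outcome of the branch at the bottom one loop body in advance.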
Ideally, the branch hint instruction is placed a number of clock cycles ahead of the branch itself in the pipeline, so that it is executed and its condition resolved well before the hard-to-predict branch that the hint targets enters the start of the pipeline. The optimal distance therefore depends on the details of the microarchitecture and on the pipeline length. The obvious problem with adding branch hint instructions is that they are a new set of instructions and thus raise both forward and backward compatibility concerns. Furthermore, after instruction 102 of FIG. 1C is computed, its value has to be communicated to the predictor before the hard-to-predict branch that the hint targets is predicted. This means that the value computed by the branch hint instruction must somehow be dynamically matched with, and communicated to, the dynamic branch that the hint targets before that branch's prediction time. Potential implementations of this communication mechanism in the microarchitecture are very complex and may not be reliable.