Processing systems may employ instructions which cause a change in control flow, such as conditional branch instructions. The direction of a conditional branch instruction is based on how a condition evaluates, but the evaluation may only be known deep down an instruction pipeline of a processor. To avoid stalling the pipeline until the evaluation is known, the processor may employ branch prediction mechanisms to predict the direction of the conditional branch instruction early in the pipeline. Based on the prediction, the processor can speculatively fetch and execute instructions from a predicted address in one of two paths—a “taken” path which starts at the branch target address, with a corresponding direction referred to as the “taken direction”; or a “not-taken” path which starts at the next sequential address after the conditional branch instruction, with a corresponding direction referred to as the “not-taken direction”.
When the condition is evaluated and the actual branch direction is determined, if the branch was mispredicted (i.e., execution followed a wrong path), the speculatively fetched instructions may be flushed from the pipeline, and new instructions in the correct path may be fetched from the correct next address. Accordingly, improving the accuracy of branch prediction for conditional branch instructions mitigates penalties associated with mispredictions and execution of wrong-path instructions, and correspondingly improves performance and energy utilization of a processing system.
Conventional branch prediction mechanisms may include one or more state machines which may be trained with a history of evaluation of past and current branch instructions. For example, a bimodal branch predictor uses two bits per branch instruction to represent four prediction states for the branch instruction: strongly taken, weakly taken, weakly not-taken, and strongly not-taken. The two-bit entries may be indexed using a program counter (PC) of the branch instruction, and also using functions of the history of the branch instruction as well as a global history involving the histories of other branch instructions. While such branch prediction mechanisms are relatively inexpensive and have a small footprint (in terms of area, power consumption, latency, etc.), their prediction accuracies are also seen to be low.
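By way of illustration only, the four-state behavior described above may be sketched as a table of two-bit saturating counters. The class name, the table size, the initial state, and the use of low-order PC bits as the sole index are assumptions made for this sketch and are not taken from any particular implementation.

```python
class BimodalPredictor:
    """Illustrative table of 2-bit saturating counters indexed by PC bits."""

    # Counter value -> prediction state (names follow the text above).
    STATE_NAMES = ("strongly not-taken", "weakly not-taken",
                   "weakly taken", "strongly taken")

    def __init__(self, index_bits=10):
        # index_bits is an assumed parameter: the table has 2**index_bits entries.
        self.mask = (1 << index_bits) - 1
        # Assumed initial state: every entry starts at "weakly not-taken" (1).
        self.table = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: drop the two low PC bits (4-byte instructions), then mask.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move one state toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


if __name__ == "__main__":
    predictor = BimodalPredictor()
    pc = 0x400
    for outcome in (True, True, False, True):
        guess = predictor.predict(pc)
        predictor.update(pc, outcome)
        state = BimodalPredictor.STATE_NAMES[predictor.table[predictor._index(pc)]]
        print(f"predicted taken={guess}, actual taken={outcome}, new state={state}")
```

In this sketch a single contrary outcome moves a strongly biased entry only to the corresponding weak state, so the prediction flips only after two consecutive contrary outcomes.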
More complex branch prediction mechanisms are emerging in the art for improving prediction accuracies. Among these, so-called neural branch predictors (e.g., Perceptron, Fast Path branch predictors, Piecewise Linear branch predictors, etc.) utilize bias weights and weight vectors derived from individual branch histories and/or global branch histories in making branch predictions. However, these complex branch prediction mechanisms may also incur added costs in terms of area, power, and latency. The energy and resources expended in training the neural branch predictors to obtain the bias weights, weight vectors, etc., as well as in utilizing the complex branch prediction mechanisms, are seen to be particularly wasteful when mispredictions occur, even though such mispredictions occur at a lower rate than those which may result from the use of simpler branch prediction mechanisms such as the bimodal branch predictor.
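For illustration, the use of a bias weight and a weight vector in a perceptron-style predictor may be sketched as follows. This is a simplified sketch under stated assumptions rather than a description of any specific predictor: the class name, table size, history length, training threshold, and PC-based row selection are hypothetical choices, and the weights are left unbounded whereas a hardware implementation would typically saturate them.

```python
class PerceptronPredictor:
    """Illustrative perceptron-style predictor using a global history."""

    def __init__(self, history_len=8, table_size=64, threshold=30):
        # All three parameters are assumed values chosen for this sketch.
        self.table_size = table_size
        self.threshold = threshold
        # Each row holds a bias weight (index 0) followed by one weight
        # per global-history position.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history encoded as +1 (taken) or -1 (not-taken).
        self.history = [-1] * history_len

    def _row(self, pc):
        # Assumption: select a weight row from the PC of the branch.
        return self.weights[(pc >> 2) % self.table_size]

    def output(self, pc):
        # Bias weight plus the dot product of the weights with the history.
        w = self._row(pc)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        # A non-negative output predicts taken.
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output magnitude is small.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self._row(pc)
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


if __name__ == "__main__":
    predictor = PerceptronPredictor()
    pc = 0x400
    mispredictions = 0
    for n in range(200):
        outcome = (n % 2 == 0)  # strictly alternating taken / not-taken
        if predictor.predict(pc) != outcome:
            mispredictions += 1
        predictor.update(pc, outcome)
    print("mispredictions over 200 alternating branches:", mispredictions)
```

Even in this reduced form, every prediction requires a multiply-accumulate over the full history and every training step may touch every weight in a row, which gives a sense of the area, power, and latency costs noted above relative to a two-bit state update.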
Furthermore, it is also observed that the benefits of neural branch predictors, e.g., measured in terms of branch prediction accuracy, are not uniform for all branch instructions. Rather, a subset of branch instructions (e.g., globally dependent branch instructions, branch instructions used in state-based workloads) is seen to gain the most significant benefits from neural branch prediction, whereas the remaining branch instructions are observed not to have a significant improvement in their prediction accuracy. Moreover, this subset of branch instructions which benefit from the neural branch predictors is also observed to account for only a very small fraction of the overall set of branch instructions in a given application or workload.
However, conventional approaches which utilize neural branch predictors do not take into account the disproportionate benefit of the neural branch predictors across the set of branch instructions for which predictions are obtained. In other words, the neural branch predictors are used in obtaining branch predictions for all branch instructions without regard to potential benefits of utilizing such expensive mechanisms in each individual case. This leads to over-utilization of neural branch predictors and associated area, power, and latency costs in approaches wherein neural branch predictors are employed.
On the other hand, some approaches may avoid neural branch predictors altogether due to their high costs in terms of area, power, and latency in conventional implementations wherein all branch instructions are predicted using the neural branch predictors. Thus, the benefits of neural branch predictors are lost in these cases for all branch instructions.
Thus, there is observed to be a potential opportunity for improving the deployment of neural branch predictors in a manner which avoids wasteful utilization while also improving the benefits of neural branch predictors in suitable cases.