The present invention relates generally to the field of processors and in particular to a power efficient method of prefetching processor instructions.
Portable electronic devices provide a wide variety of organizational, computational, communications and entertainment services. These devices continue to increase in both popularity and sophistication. Two relentless trends in portable electronic device evolution are increased functionality and decreased size. Increased functionality demands increased computing power—in particular, ever faster and more powerful processors.
As well as providing advanced features and functionality that require faster processors, portable electronic devices themselves continue to shrink in size and weight. A major impact of this trend is the decreasing size of batteries used to power the processor and other electronics in the device. While increases in battery technology partially offset the problem, the decreasing size of batteries still imposes a strict power budget on all portable electronic device electronics, and in particular on their embedded processors.
Hence, processor improvements that increase performance and/or decrease power consumption are desirable for many applications, such as most portable electronic devices. Most modern processors employ a pipelined architecture, where sequential instructions, each having multiple execution steps, are overlapped in execution. For maximum performance, the instructions should flow continuously through the pipeline. Any situation that causes instructions to be flushed from the pipeline, and subsequently restarted, detrimentally impacts both performance and power consumption.
All real-world programs include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. Most modern processors employ some form of branch prediction, whereby the branching behavior of conditional branch instructions is predicted early in the pipeline, and the processor speculatively fetches (prefetches) and executes instructions, based on the branch prediction. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, and new instructions fetched from the correct next address. Prefeteching instructions in response to an erroneous branch prediction adversely impacts processor performance and power consumption.
Known branch prediction techniques include both static and dynamic predictions. The likely behavior of some branch instructions can be statically predicted by a programmer and/or compiler. One example is an error checking routine. Most code executes properly, and errors are rare. Hence, the branch instruction implementing a “branch on error” function will evaluate “not taken” a very high percentage of the time. Such an instruction may include a static branch prediction bit in the opcode, set by a programmer or compiler with knowledge of the most likely outcome of the branch condition. Other branch instructions may be statically predicted based on their run-time attributes. For example, branches with a negative displacement (i.e., those that branch “backwards” in code), such as loop exit evaluations, are usually taken, while branches with a positive displacement (that branch “forward” in code) are rarely taken. Hence, the former may be statically predicted “taken,” and the latter, “not taken.”
Dynamic prediction is generally based on the branch evaluation history (and in some cases the branch prediction accuracy history) of the branch instruction being predicted and/or other branch instructions in the same code. Extensive analysis of actual code indicates that recent past branch evaluation patterns may be a good indicator of the evaluation of future branch instructions. As one example of a simple branch-history branch predictor, a plurality of one-bit flags may be maintained, each indexed by address bits of a conditional branch instruction. Each flag is set when the branch evaluates “taken,” and reset when it evaluates “not taken.” The branch prediction may then simply be the value of the associated flag. For some branch instructions, this predictor may yield accurate predictions.
A design goal closely related to maximizing branch prediction accuracy is minimizing the adverse impact of erroneous branch predictions. Consider the “branch on error” condition described above, with a one-bit flag as a dynamic branch predictor. Normally, the branch is not taken, and the associated flag remains a zero, predicting “not taken” for future executions of the instruction. When an error does occur, the branch is mispredicted and the wrong instructions are prefetched into the pipeline. The processor recovers from the erroneous branch prediction (sacrificing performance and wasting power), according to known branch misprediction recovery methods, and the flag is set to reflect the “taken” branch. However, the next execution of the branch instruction will still most likely be “not taken.” In this case, the single-bit branch evaluation history causes two mispredictions for each anomalous branch evaluation—one for the anomaly and another for the next subsequent execution of the branch instruction.
One known technique for minimizing the deleterious effect of a mispredicted branch evaluation is to introduce the concept of a strong or weak prediction—that is, a prediction (i.e., taken or not taken) weighted by a confidence factor (e.g., strongly or weakly predicted). A simple example of this is a bimodal branch predictor comprising a table of two-bit saturating counters, indexed by memory access instruction addresses. Each counter assumes one of four states, each assigned a weighted prediction value, such as:
11Strongly predicted taken10Weakly predicted taken01Weakly predicted not taken00Strongly predicted not taken
The counter increments each time a corresponding branch instruction evaluates “taken” and decrements each time the instruction evaluates “not taken.” This incrementing/decrementing is “saturating,” as incrementing stops at Ob11, and decrementing stops at Ob00. Thus, the branch prediction includes not only an outcome (taken or not) but also a weighting factor indicative of the strength or confidence of the prediction.
A branch instruction such as the “branch on error” considered above will only mispredict once with a saturation counter, rather than twice as with a single-bit flag predictor. The first branch prediction will move the predictor from “strongly not taken” to “weakly not taken.” The actual prediction is bimodal, and is represented by the MSB. Hence, the next occurrence of the branch instruction will still be predicted “not taken,” which is likely correct.
A bimodal saturation counter may be of arbitrary size. For example, a three-bit counter may be assigned prediction confidence strengths as follows:
111Very strongly predicted taken110Strongly predicted taken101Predicted taken100Moderately predicted taken011Moderately predicted not taken010Predicted not taken001Strongly predicted not taken000Very strongly predicted not taken
Of course, the labels are terms of reference only; the binary value of the counter determines the strength of the branch prediction confidence, with greater confidence at either end of the range, and lower confidence towards the middle of the range.
Saturation counters may track prediction accuracy as well as branch evaluation, as known in the art. The output of a saturation counter may be a weighted value of “agree” or “disagree,” and the output combined with a static prediction to arrive at a weighted prediction. In general, a broad array of branch prediction methods is known in the art, including those wherein a predictor is used not to predict the branch at all, but to select a prediction from between two or more other, independent predictors. See, for example, Scott McFarling's 1993 paper, “Combining Branch Predictors,” Digital Western Research Laboratory Technical Note TN-36, incorporated herein by reference in its entirety.
While the introduction of a measure of confidence in a prediction improves branch prediction accuracy by tracking actual branch behavior over time, the actual prediction is bimodal, represented by the MSB. In the prior art, the branch is either predicted “taken” or “not taken, and prefetching proceeds from a predicted next address, which is either a branch target address or the next sequential address to the branch instruction. That is, the weighting of the prediction, or its strength, is not considered.