Processors may fetch instructions from various sources. A source of instructions may be a traditional cache, a trace cache, an instruction buffer, or even just system memory. One recent form of cache is the trace cache. Rather than storing macro-instructions as is done in other caches, the trace cache contains sequences of previously-decoded micro-operations (micro-ops) of macro-instructions. The sequence of micro-ops may be stored in a sequence of set and way locations in the trace cache called a trace, where the micro-ops at a given set and way location may be called a traceline or trace element. Then, on subsequent executions of the particular macro-instruction, decoding is not necessary and the sequence of micro-ops may be accessed from the corresponding trace in the trace cache.
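The decode-once, reuse-later behavior described above may be illustrated with a minimal software sketch. It is a toy model only: the names `TraceCache`, `fetch`, and `toy_decode` are hypothetical, and the sketch keys traces by macro-instruction address rather than modeling the set and way organization of a real trace cache.

```python
from typing import Callable, Dict, List


class TraceCache:
    """Toy model: maps a macro-instruction address to its decoded micro-ops."""

    def __init__(self, decode: Callable[[int], List[str]]) -> None:
        self._decode = decode                  # hypothetical decoder supplied by the caller
        self._traces: Dict[int, List[str]] = {}
        self.decode_count = 0                  # number of times the decoder actually ran

    def fetch(self, address: int) -> List[str]:
        trace = self._traces.get(address)
        if trace is None:                      # miss: decode once and store the trace
            self.decode_count += 1
            trace = self._decode(address)
            self._traces[address] = trace
        return trace                           # hit: decoding is skipped


def toy_decode(address: int) -> List[str]:
    # Hypothetical decoder: pretends every macro-instruction yields two micro-ops.
    return [f"uop_{address:#x}_0", f"uop_{address:#x}_1"]
```

Fetching the same address twice through `TraceCache(toy_decode)` runs the decoder only once; the second fetch returns the stored micro-op sequence.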
Along with the use of caches, processors may further enhance processing throughput by using branch predictors. Often the direction to be taken following a branch instruction is not known until the instruction reaches the execution stage at the very end of the pipeline. Without prediction, the pipeline would have to be stalled until after the branch instruction executes, which would severely impact performance. For this reason, processor designers may use one or more branch predictors that predict, during the early stages of a pipeline, which direction the branch is likely to take. The pipeline may then be kept full, and the predicted direction may be compared with the actual direction at execution time. Only if a misprediction occurs does the pipeline need to be flushed and execution restarted along the correct direction.
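The performance argument above may be made concrete with a first-order cycle count. The sketch below is an illustrative model only, not a description of any particular pipeline: it assumes one instruction completes per cycle in the ideal case, and the function names and all numeric values are hypothetical.

```python
def cycles_with_stalls(instructions: int, branches: int, resolve_delay: int) -> int:
    """Every branch stalls the pipeline until it resolves at the execution stage."""
    return instructions + branches * resolve_delay


def cycles_with_prediction(instructions: int, mispredictions: int, flush_penalty: int) -> int:
    """Only mispredicted branches pay the cost of a pipeline flush."""
    return instructions + mispredictions * flush_penalty
```

Under these assumptions, 1000 instructions containing 200 branches with a 20-cycle resolve delay would take 5000 cycles when stalling on every branch, but 1200 cycles if only 10 of those branches are mispredicted.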
Several varieties of branch predictors may be used. In some cases several predictors may operate together, with a selection mechanism choosing one prediction from among those proffered. A bimodal predictor may make a prediction based upon recent history of a particular branch's execution, and give a prediction of usually taken or usually not-taken. A global predictor may make a prediction based upon recent history of all the branches' execution, not just the particular branch of interest. In some cases the global predictor may hash together recent history (taken or not taken) along with a portion of the address (linear instruction pointer) of the branches involved to form what may be called a “stew”. Using the current stew values for prediction may give good results with branches that are dependent on the direction of previous branches.
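The two predictor types may be sketched in software as follows. This is a minimal sketch under stated assumptions: the 2-bit saturating counters, the table sizes, and in particular the shift-and-XOR hash used to form the stew are illustrative choices, not details given in the text, and the class names are hypothetical.

```python
class BimodalPredictor:
    """Per-branch 2-bit saturating counters, indexed by low address bits (assumed scheme)."""

    def __init__(self, index_bits: int = 10) -> None:
        self._mask = (1 << index_bits) - 1
        self._counters = [1] * (1 << index_bits)   # values 0..3, start weakly not-taken

    def predict(self, address: int) -> bool:
        return self._counters[address & self._mask] >= 2

    def update(self, address: int, taken: bool) -> None:
        i = address & self._mask
        c = self._counters[i]
        self._counters[i] = min(3, c + 1) if taken else max(0, c - 1)


class GlobalPredictor:
    """Counters indexed by a stew that mixes all branches' outcomes and addresses."""

    def __init__(self, stew_bits: int = 12) -> None:
        self._mask = (1 << stew_bits) - 1
        self._stew = 0
        self._counters = [1] * (1 << stew_bits)

    def predict(self) -> bool:
        # The prediction uses only the current stew value.
        return self._counters[self._stew] >= 2

    def update(self, address: int, taken: bool) -> None:
        c = self._counters[self._stew]
        self._counters[self._stew] = min(3, c + 1) if taken else max(0, c - 1)
        # Assumed hash: shift in the outcome, then XOR with address bits.
        self._stew = (((self._stew << 1) | int(taken)) ^ address) & self._mask
```

Because the stew shifts one position per branch, a stew of `stew_bits` bits reflects only the most recent `stew_bits` branch outcomes; older history is shifted out.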
Neither the bimodal nor the global predictor performs well with branches used in loops. Note that a loop may be coded so that the loop direction may be either the taken or the not-taken direction of the branch. Therefore the present disclosure uses the terminology “loop direction” and “not loop direction” to denote whichever of the branch taken or not-taken directions applies, depending upon the coding of the loop. The bimodal predictor may simply predict loop direction and mispredict the end of the loop (fall-through, i.e. the not loop direction). The global predictor, using a stew value, may also mispredict the end of the loop. With a long enough loop, the hashing used to form the stew may end up giving a constant or constantly-repeating value for the stew, so that the final iteration cannot be distinguished from the iterations before it. Using more stew bits may extend the usefulness of the global predictor, but at a substantial cost in terms of circuit complexity and also in the time required to initially train the predictor.
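The loop behavior described above may be reproduced with a small simulation. The sketch assumes the loop direction is the taken direction, uses 2-bit saturating counters, and forms the stew with an illustrative shift-and-XOR hash; the function names, the branch address, and the parameter values are all hypothetical.

```python
from typing import Tuple


def saturate(counter: int, taken: bool) -> int:
    """Update a 2-bit saturating counter (0..3)."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


def run_loop(trip_count: int, executions: int, stew_bits: int,
             address: int = 0x4A3) -> Tuple[int, int]:
    """Return (bimodal_misses, global_misses) for a single loop branch."""
    mask = (1 << stew_bits) - 1
    bimodal = 1                            # single counter for the one loop branch
    table = [1] * (1 << stew_bits)         # global predictor counters, indexed by stew
    stew = 0
    bimodal_misses = 0
    global_misses = 0
    for _ in range(executions):
        for i in range(trip_count):
            taken = i < trip_count - 1     # the final iteration falls through
            bimodal_misses += int((bimodal >= 2) != taken)
            global_misses += int((table[stew] >= 2) != taken)
            bimodal = saturate(bimodal, taken)
            table[stew] = saturate(table[stew], taken)
            # Assumed stew hash: shift in the outcome, XOR with address bits.
            stew = (((stew << 1) | int(taken)) ^ address) & mask
    return bimodal_misses, global_misses
```

In this model, `run_loop(40, 100, 8)` has both predictors missing the loop exit on every one of the 100 executions, because an 8-bit stew settles to a constant value well before the 40th iteration. With a short loop such as `run_loop(5, 100, 8)`, the stew still captures the whole loop pattern, so the global predictor's misses are confined to a brief warm-up while the bimodal predictor continues to miss every exit.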