In computer architecture, a branch predictor is a component or portion of a processor that determines whether a conditional branch in a program's instruction flow is likely to be taken or not taken. This is called branch prediction. Branch predictors are important for achieving high performance in modern superscalar processors, because they allow a processor to fetch and execute instructions without waiting for a branch to be resolved. Most pipelined processors perform some type of branch prediction, guessing the address of the next instruction to fetch before the current instruction has been executed.
Branch predictors may be local or global, and may be separate devices and/or part of processors and/or cores. Local branch predictors generally maintain two tables. The first is the local branch history table, which is indexed by the low-order bits of each branch instruction's address and records the taken/not-taken history of the n most recent executions of the branch. The second is the pattern history table, which contains bimodal counters (typically two-bit) and whose index may be generated from the branch history in the first table. To predict a branch, the branch history is looked up, and that history is then used to look up a bimodal counter, which makes the prediction.
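The two-table lookup described above can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any particular processor: the class name `LocalPredictor`, the table sizes, the initial counter value, and the use of the raw low-order address bits as the index are all assumptions made for the example.

```python
class LocalPredictor:
    """Illustrative two-level local predictor (names and sizes are assumed)."""

    def __init__(self, index_bits=4, history_bits=3):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        # Local branch history table: one n-bit history per address index.
        self.history_table = [0] * (1 << index_bits)
        # Pattern history table: two-bit saturating counters (0..3),
        # initialised to 1 ("weakly not taken") as an arbitrary choice.
        self.pattern_table = [1] * (1 << history_bits)

    def predict(self, pc):
        # Step 1: look up the branch's own recent history.
        history = self.history_table[pc & self.index_mask]
        # Step 2: that history selects a counter; 2 or 3 means "taken".
        return self.pattern_table[history] >= 2

    def update(self, pc, taken):
        slot = pc & self.index_mask
        history = self.history_table[slot]
        counter = self.pattern_table[history]
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        self.pattern_table[history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the outcome into this branch's history, keeping n bits.
        self.history_table[slot] = ((history << 1) | int(taken)) & self.history_mask
```

With these settings, a branch that strictly alternates between taken and not taken is mispredicted only during a short warm-up, because each recurring history pattern ends up selecting a counter that has learned the outcome following that pattern.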
Global branch predictors exploit the fact that the behavior of many branches is strongly correlated with the history of other recently executed branches. For example, a single shift register can be updated with the recent history of every branch executed, and its value may be used to index into a table of bimodal counters. In general, global branch prediction may be less accurate than local prediction.
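The shift-register arrangement can be sketched as follows. The class name `GlobalPredictor`, the history length, and the initial counter value are assumptions for illustration only; the sketch indexes the counter table by the global history alone, as in the example above, and ignores the branch address.

```python
class GlobalPredictor:
    """Illustrative global predictor (names and sizes are assumed)."""

    def __init__(self, history_bits=4):
        self.history_mask = (1 << history_bits) - 1
        # One shift register shared by every branch that executes.
        self.history = 0
        # Two-bit saturating counters, one per possible history value.
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        # The global history alone selects the counter; 2 or 3 means "taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        counter = self.counters[self.history]
        # Train the selected counter toward the actual outcome.
        self.counters[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the outcome into the shared history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

Because every branch shifts its outcome into the same register, a branch whose direction follows from the branches executed just before it can be predicted from that shared history, even if its own past outcomes look irregular.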
Conventional branch predictors may consist of multiple distinct types of predictors, in particular some combination of local and global predictors. Under a conventional architecture, however, each distinct predictor generally makes a prediction for every branch, and the aggregate predictor then selects from among the various predictions.
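One way such a selection might be organised is sketched below. The text above does not say how the aggregate predictor chooses, so the table of two-bit chooser counters used here is an assumption. The component classes `Bimodal` and `AlwaysTaken` are hypothetical stand-ins, kept deliberately simple so that the example is self-contained.

```python
class Bimodal:
    """Stand-in component: two-bit counters indexed by low-order address bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


class AlwaysTaken:
    """Stand-in component: a static predictor that never learns."""

    def predict(self, pc):
        return True

    def update(self, pc, taken):
        pass


class Tournament:
    """Aggregate predictor: every component predicts, a chooser picks one."""

    def __init__(self, first, second, index_bits=4):
        self.first, self.second = first, second
        self.mask = (1 << index_bits) - 1
        # Chooser counters: 0-1 favour `first`, 2-3 favour `second`.
        self.chooser = [1] * (1 << index_bits)

    def predict(self, pc):
        # Both components make a prediction for every branch.
        p1, p2 = self.first.predict(pc), self.second.predict(pc)
        return p2 if self.chooser[pc & self.mask] >= 2 else p1

    def update(self, pc, taken):
        p1, p2 = self.first.predict(pc), self.second.predict(pc)
        i = pc & self.mask
        # Only when the components disagree does the outcome say which is better.
        if p1 != p2:
            c = self.chooser[i]
            self.chooser[i] = min(c + 1, 3) if p2 == taken else max(c - 1, 0)
        self.first.update(pc, taken)
        self.second.update(pc, taken)
```

For a branch that is never taken, `Tournament(AlwaysTaken(), Bimodal())` initially follows the static component and mispredicts, then shifts to the bimodal component once the chooser observes that it is the one agreeing with the outcome. Note that both components are consulted and trained on every branch, which is the cost the paragraph above points out.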
In distributed architectures that are expected to be developed, a variable number of processors may collaborate to accelerate a single program. One problem that may then need to be addressed is how predictions are made so as to keep many instructions in flight across all of the participating processors. These processors may at some times cooperate to accelerate one program and at other times execute separate, distinct programs. In the latter mode, it may be important for each processor to have its own predictor for the independent job it is executing.
One possible solution to the above-described problem, which has been the subject of current research, is to designate one of the participating processors as the “master processor”, responsible for making all of the predictions. In that case, the branch predictors of all the other participating processors would be unused. This approach leaves two unappealing alternatives. In the first, the predictor is made large enough to drive predictions for the large configuration, in which many processors participate and many instructions are in flight; the predictor is then too large (and therefore potentially slow) when the processors run in “independent” mode, each with its own software task. In the second, the predictor is tuned for independent mode and is therefore smaller, but it is then undersized for “collaborative” mode.