Branch prediction mechanisms can be loosely divided into static branch prediction and dynamic branch prediction mechanisms.
Static branch prediction is implemented by including a prediction in the branch instruction itself, i.e. a bit that indicates to the processor executing the conditional branch instruction whether the branch is likely to be taken or not. The compiler sets this bit based either on heuristics, e.g. that a conditional branch out of a loop is most often not taken, or on feedback from program execution. The feedback is collected by having a program insert instructions around each conditional branch that record whether the branch is taken or not. The program is then executed and statistics are collected. Thereupon, the program is compiled once again and the collected branch statistics are used to set the branch prediction.
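The feedback-based procedure can be illustrated with a minimal sketch. It assumes that the inserted instructions emit one (branch, outcome) record per executed conditional branch and that the compiler sets the prediction bit by simple majority; the names `collect_branch_profile`, `choose_static_hints` and the branch labels are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def collect_branch_profile(trace):
    """Count not-taken / taken outcomes per branch.

    `trace` is an iterable of (branch_id, taken) pairs, standing in for the
    records that the inserted instrumentation would produce during a run.
    """
    counts = defaultdict(lambda: [0, 0])  # branch_id -> [not_taken, taken]
    for branch_id, taken in trace:
        counts[branch_id][1 if taken else 0] += 1
    return counts


def choose_static_hints(counts):
    """Assumed policy: predict 'taken' only when taken outcomes are the majority."""
    return {
        branch_id: taken > not_taken
        for branch_id, (not_taken, taken) in counts.items()
    }


# Hypothetical profiling run: a loop-exit branch taken once in ten executions
# and a fast-path branch that is always taken.
trace = (
    [("loop_exit", False)] * 9
    + [("loop_exit", True)]
    + [("fast_path", True)] * 10
)
hints = choose_static_hints(collect_branch_profile(trace))
print(hints)
```

In this sketch the hint for `loop_exit` comes out as not taken, which agrees with the loop heuristic mentioned above, but only because the profiling run happened to behave that way.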
Dynamic branch prediction collects branch statistics in separate data structures in the processor, for example in branch history tables, BHTs, or in separate bits in the processor instruction cache or memory. Usually one or two bits in an instruction cache line are used.
The disadvantages of these methods are:
Setting static branch prediction based on heuristics does not give optimal performance.
Setting static branch prediction based on feedback adds a number of extra steps to program generation and works well only as long as the collected branch statistics are similar to those of real execution in systems that use varying and different data sets.
Dynamic branch prediction adds cost for additional data structures within the CPU. Due to physical limitations, as well as cost, these structures cannot hold data for all conditional branch instructions in the program, and several branches have to share entries within a BHT. The performance of dynamic branch prediction then depends on the statistical behaviour of the program. For example, if the lower bits of the address of the conditional branch instruction are used to select an entry in the BHT, the performance can depend on whether or not the program has been loaded at addresses that make more than one frequently executed branch share the same BHT entry.
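The dependence on load addresses can be made concrete with a small sketch. It assumes a 16-entry BHT indexed by the low-order bits of the instruction address and fixed 4-byte instructions; the table size, the addresses and the names `bht_index` and `collides` are illustrative assumptions, not properties of any particular processor.

```python
BHT_BITS = 4                     # assumed table size: 2**4 = 16 entries
BHT_MASK = (1 << BHT_BITS) - 1
INSTR_BYTES = 4                  # assumed fixed instruction width


def bht_index(branch_address):
    """Select a BHT entry from the low-order bits of the branch address.

    The byte offset within an instruction is dropped first, because it is
    the same for every branch under the fixed-width assumption.
    """
    return (branch_address // INSTR_BYTES) & BHT_MASK


def collides(addr_a, addr_b):
    """True if both branches map to the same BHT entry."""
    return bht_index(addr_a) == bht_index(addr_b)


# Branch A sits at offset 0x10 in a module loaded at 0x1000.
branch_a = 0x1000 + 0x10

# Branch B sits at offset 0x10 in a second module; only its load base varies.
for base_b in (0x2000, 0x2020):
    branch_b = base_b + 0x10
    print(hex(base_b), bht_index(branch_a), bht_index(branch_b),
          collides(branch_a, branch_b))
```

With the first load base both branches select entry 4 and therefore overwrite each other's history; with the second, branch B moves to entry 12 and the conflict disappears, although the program itself is unchanged.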
In telecommunication applications, programs are loaded into the system and are then used continuously for a long time, usually for weeks at least, until the system is reloaded with a new revision of the program. The execution can in most cases be expected to have the same statistics during that time.
Furthermore, U.S. Pat. No. 5,367,703 describes a branch prediction mechanism in a superscalar processor system. The mechanism uses branch history tables which include a separate branch history for each fetch position within a multi-instruction access. A prediction field consisting of two bits is used for determining whether a particular branch is to be taken or not. The value of the two bits is incremented or decremented in response to a branch being taken or not.
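A two-bit prediction field of this kind is commonly realised as a saturating counter. The sketch below assumes that the counter saturates at 0 and 3 and that a value of 2 or 3 means "predict taken"; these details, the initial state and the names `TwoBitCounter` and `count_mispredictions` are assumptions made for illustration and are not taken from the cited patent.

```python
class TwoBitCounter:
    """Two-bit prediction field, modelled as a saturating counter (0..3)."""

    def __init__(self, state=1):
        self.state = state       # assumed initial state: weakly not taken

    def predict(self):
        """Assumed rule: predict taken when the counter is 2 or 3."""
        return self.state >= 2

    def update(self, taken):
        """Increment on a taken branch, decrement otherwise, with saturation."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, counter):
    """Run the counter over a sequence of outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Hypothetical branch that is taken four times and then not taken once,
# with the pattern repeated three times.
outcomes = ([True] * 4 + [False]) * 3
counter = TwoBitCounter()
misses = count_mispredictions(outcomes, counter)
print(misses, "mispredictions out of", len(outcomes))
```

Because the counter has to be wrong twice in a row before its prediction flips, a single not-taken outcome at the end of each repetition costs one misprediction but does not disturb the prediction for the following taken outcomes.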
U.S. Pat. No. 5,423,011 discloses an apparatus consisting of an associated memory in which branch prediction bits are stored, cache lines and comparison means for matching stored prediction bits with their corresponding cache lines.
Patent application GB 2 283 595 describes branch prediction circuitry that can operate in one of two user-selectable modes.