A basic pipeline microarchitecture of a microprocessor processes one instruction at a time. The basic dataflow for an instruction is: instruction fetch, decode, cache access, execute, and result write back. The stages within the pipeline must occur in order, and hence a given stage cannot progress unless the stage in front of it is progressing. To achieve the highest performance, one instruction enters the pipeline every cycle. Whenever the pipeline has to be delayed or cleared, latency is added, which in turn can be observed in the time it takes a microprocessor to carry out a given task.
There are many dependencies between instructions which prevent the optimal case of a new instruction entering the pipe every cycle. These dependencies add latency to the pipe. One category of latency contribution deals with branches. A branch is an instruction which can either fall through to the next sequential instruction (not taken) or branch off to another instruction address (taken) and carry out execution of a different series of code. At decode time, the branch is detected, and it must wait to be resolved in order to know the proper direction in which the instruction stream is to proceed. Waiting for potentially multiple pipeline stages for the branch to resolve adds latency into the pipeline. To overcome the latency of waiting for the branch to resolve, the direction of the branch can be predicted such that the pipe begins decoding down either the taken or the not taken path. At branch resolution time, the guessed direction is compared to the actual direction the branch was to take. If the actual direction and the guessed direction are the same, then the latency of waiting for the branch to resolve has been removed from the pipeline in this scenario. If the actual and predicted directions are not the same, then decoding proceeded down the improper path; all instructions in this path behind the improperly guessed branch must be flushed out of the pipe, and the pipe must be restarted at the correct instruction address to begin decoding the actual path of the given branch. Because of the controls involved with flushing the pipe and beginning over, there is a penalty associated with the improper guess, and more latency is added into the pipe than would be added by simply waiting for the branch to resolve before decoding further. With a high rate of correctly guessed paths, the latency removed from the pipe by guessing the correct direction outweighs the latency added to the pipe by guessing the direction incorrectly.
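The trade-off described above can be illustrated with a simple expected-cost model. The following is a minimal sketch only: the function names and the parameters resolve_cycles and flush_penalty are illustrative assumptions and do not describe any particular processor.

```python
def expected_branch_cost(accuracy, resolve_cycles, flush_penalty):
    """Average stall cycles per branch when the direction is predicted.

    accuracy       -- fraction of branches guessed correctly (0.0 to 1.0)
    resolve_cycles -- cycles the pipe would stall if it simply waited (assumed)
    flush_penalty  -- extra cycles to flush and restart after a wrong guess (assumed)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # A correct guess costs nothing; a wrong guess costs the wait for
    # resolution plus the additional flush-and-restart penalty.
    return (1.0 - accuracy) * (resolve_cycles + flush_penalty)


def break_even_accuracy(resolve_cycles, flush_penalty):
    """Accuracy at which predicting costs the same as always waiting."""
    # Solve (1 - a) * (resolve + flush) == resolve for a.
    return flush_penalty / (resolve_cycles + flush_penalty)
```

Under these assumptions, prediction removes latency on average whenever the accuracy exceeds the value returned by break_even_accuracy, which is why a high rate of correct guesses is required for prediction to pay off.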
To improve the accuracy of the direction guessed for a branch, a Branch History Table (BHT) can be implemented, which allows the direction of a branch to be guessed based on the direction the branch went in the past. If the branch is always taken, as is the case for a subroutine return, then the branch will always be guessed taken. IF/THEN/ELSE structures are more complex in their behavior: a branch may be always taken, sometimes taken and sometimes not taken, or always not taken. How the BHT predicts the direction of such a branch depends on the implementation of dynamic branch prediction.
There are many concepts that can be applied to create algorithms for dynamically predicting the direction that a branch is to take. One of the most common to date is keeping a 2-bit saturating counter for each of a range of entries. Each counter has 4 states, which are typically encoded as 00: strongly not taken, 01: weakly not taken, 10: weakly taken, 11: strongly taken. When a branch is guessed either strongly not taken or weakly not taken and the branch is resolved not taken, the state becomes strongly not taken. The inverse also applies: if the branch is guessed weakly or strongly taken and the branch is resolved taken, the state becomes strongly taken. If the guessed direction is different from the direction in which the branch is resolved, the state moves one step in the opposite direction. Specifically, if the branch is guessed strongly not taken, the state becomes weakly not taken; if the branch is guessed weakly not taken, the state becomes weakly taken; if the guess is strongly taken, the state becomes weakly taken; and if the guess was weakly taken, the state becomes weakly not taken. In general, a table of these counters is created, and the way the table is indexed can follow many schemes, with profound differences in prediction accuracy. For branches which close off loops, the prediction will be correct (X−1)/X of the time, where X is the number of times the loop is processed. Indexing the table with the branch instruction address works very well in such a situation. For IF/THEN/ELSE structures, where the direction depends on a higher level of conditional information, using the instruction address alone as the entry point into the table does not achieve the same level of accuracy. In such circumstances, XORing the address at which the branch occurs with the pattern of directions of the last ‘N’ branches provides a higher level of accuracy.
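The counter behavior and the two indexing schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, table size, initial counter state, and the branch address used in the loop example are all hypothetical choices, not part of any specific design.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters.

    States: 0 strongly not taken, 1 weakly not taken,
            2 weakly taken,       3 strongly taken.
    """

    def __init__(self, index_bits=10, history_bits=0, initial_state=1):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # directions of the last N branches, newest in bit 0
        self.counters = [initial_state] * (1 << index_bits)

    def _index(self, address):
        # With history_bits == 0 the history stays 0, so this is plain
        # address indexing; otherwise the address is XORed with the pattern.
        return (address ^ self.history) & self.index_mask

    def predict(self, address):
        """Return True when the branch is guessed taken."""
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        """Move the counter one step toward the resolved direction."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved direction into the history pattern.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def loop_accuracy(iterations, passes):
    """Fraction of correct guesses for a single loop-closing branch."""
    bht = BranchHistoryTable()
    address = 0x400  # hypothetical address of the loop-closing branch
    correct = total = 0
    for _ in range(passes):
        for i in range(iterations):
            taken = i < iterations - 1  # taken on every pass except loop exit
            correct += bht.predict(address) == taken
            bht.update(address, taken)
            total += 1
    return correct / total
```

In this sketch, a loop of 10 iterations mispredicts only the loop exit once the counter has warmed up, so the measured accuracy approaches (X−1)/X = 0.9 as the number of passes grows. Constructing the table with a nonzero history_bits switches the same structure to the XOR-based indexing.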
Other schemes include having multiple branch prediction tables for sets of branches based on the type of branch, for example Branch and Link, Branch on Count, or Branch on Condition.
In order to take advantage of the increased accuracy that multiple prediction tables can provide, there needs to be a method for selecting which of the tables will be used for predicting the direction of a given branch. The standard method of selection involves having one additional array with as many entries as are stored in the branch history tables and pattern history tables. The number of additional selection arrays goes up as additional direction-prediction arrays are added.
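The standard selection method can be sketched as below. This is an illustration under assumptions rather than a description of any particular implementation: the class names are hypothetical, and the rule of training the selector only when the two tables disagree is one common choice that the text above does not specify.

```python
class CounterTable:
    """Array of 2-bit saturating counters (states 0..3)."""

    def __init__(self, entries, initial_state=1):
        self.state = [initial_state] * entries

    def high(self, i):
        return self.state[i] >= 2

    def train(self, i, up):
        if up:
            self.state[i] = min(3, self.state[i] + 1)
        else:
            self.state[i] = max(0, self.state[i] - 1)


class SelectedPredictor:
    """Two direction tables plus a selector array of equal size."""

    def __init__(self, index_bits=10):
        entries = 1 << index_bits
        self.mask = entries - 1
        self.history = 0  # directions of recent branches, newest in bit 0
        self.by_address = CounterTable(entries)  # branch history table
        self.by_pattern = CounterTable(entries)  # pattern history table
        self.selector = CounterTable(entries)    # high means use by_pattern

    def _indices(self, address):
        a = address & self.mask
        p = (address ^ self.history) & self.mask
        return a, p

    def predict(self, address):
        a, p = self._indices(address)
        if self.selector.high(a):
            return self.by_pattern.high(p)
        return self.by_address.high(a)

    def update(self, address, taken):
        a, p = self._indices(address)
        addr_ok = self.by_address.high(a) == taken
        patt_ok = self.by_pattern.high(p) == taken
        # Assumed rule: move the selector only when exactly one table was right.
        if addr_ok != patt_ok:
            self.selector.train(a, patt_ok)
        self.by_address.train(a, taken)
        self.by_pattern.train(p, taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

The sketch makes the area cost visible: the selector is a third array as large as each prediction table, and every further prediction table would need additional selection state of the same kind.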
There have been many methods to improve branch prediction accuracy, including those of the patents discussed below; however, while they deal with prediction accuracy, they do not additionally account for area, power, and context switches. U.S. Pat. No. 5,935,241, “Multiple Global Pattern History Tables for Branch Prediction in a Microprocessor,” targets selecting between multiple history predictors based on microprocessor state, including user privilege (supervisor state versus user state). U.S. Pat. No. 6,272,623, “Methods and Apparatus for Branch Prediction using Hybrid History with Index Sharing,” targets using global and local history to select a branch prediction from a single prediction table.