1. Field of the Invention
This invention relates to processor pipelines, branch prediction, and branch prediction latency, and particularly to a system and method for reducing branch prediction latency using a branch target buffer with most recently used column prediction.
2. Description of Background
Branch prediction is a performance-critical component of a pipelined, high-frequency microprocessor. It is used to predict the direction (taken or not taken) and the target address of branch instructions. This is beneficial because it allows processing to continue along a branch's predicted path rather than waiting for the outcome of the branch to be determined. An additional penalty is incurred only if a branch is mispredicted.
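As an illustration of direction prediction and of the point that a penalty is paid only on a misprediction, the following is a minimal sketch using the classic two-bit saturating counter. It is a textbook scheme chosen for illustration, not the mechanism of this invention; the class name, the initial counter state, and the 10-cycle penalty are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class TwoBitCounter:
    # Hypothetical direction predictor: states 0-1 predict not taken,
    # states 2-3 predict taken. Starting "weakly not taken" is an assumption.
    state: int = 1

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Saturate at 0 and 3 so a single anomalous outcome does not
        # flip a strongly held prediction.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def run(outcomes, mispredict_penalty=10):
    """Return (mispredictions, penalty_cycles) for one branch's outcome sequence.

    The penalty value is hypothetical; correctly predicted branches add no cycles,
    because fetch simply continued down the predicted path.
    """
    counter = TwoBitCounter()
    mispredictions = 0
    for taken in outcomes:
        if counter.predict() != taken:
            mispredictions += 1
        counter.update(taken)
    return mispredictions, mispredictions * mispredict_penalty


# A loop-closing branch taken nine times and then not taken, executed twice.
print(run(([True] * 9 + [False]) * 2))
```

For the loop branch above the sketch reports three mispredictions out of twenty executions: one while the counter warms up and one at each loop exit. The other seventeen executions proceed along the predicted path without penalty.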
A Branch Target Buffer (BTB) is a structure that stores branch and target information. Other structures such as a Branch History Table (BHT) and Pattern History Table (PHT) can be included to store information used for branch direction prediction.
The BTB can be searched in parallel with and independently of instruction fetching to find upcoming branches, in which case it is called lookahead branch prediction. Alternatively, the BTB can be accessed simultaneously with or after fetching instructions and determining instruction boundaries in order to provide a prediction for each encountered branch instruction. In either case, the performance benefit of the BTB is a function of the accuracy of the predictions it provides and the latency required to access it. A large BTB can often provide better prediction accuracy than a small one because it can store information about more branch instructions; however, it has a longer access latency than a smaller BTB.
Current solutions to reduce or mitigate branch prediction delay include hierarchical predictors, which are implemented solely in hardware, and cooperative predictors, which rely on hardware support for compiler optimizations based on profiling. Regardless of whether structures such as hierarchical predictors and cooperative predictors are employed, techniques to minimize the latency of a set-associative BTB are needed.
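To make the structure of a set-associative BTB concrete, the following is a minimal software model, offered as a sketch under stated assumptions rather than as the design of this invention. The set count, associativity, index/tag split, number of alignment bits, and least-recently-used replacement policy are all hypothetical choices, and the model does not implement any form of column prediction. The comment in `lookup` marks where a hardware implementation must compare every way's tag and then select the hitting way, which is the step that lengthens access latency.

```python
from typing import List, Optional, Tuple


class SetAssociativeBTB:
    """Toy BTB: num_sets sets of `ways` entries each, LRU replacement per set.

    All sizes and the index/tag split are illustrative assumptions.
    """

    def __init__(self, num_sets: int = 64, ways: int = 4, offset_bits: int = 2):
        self.num_sets = num_sets
        self.ways = ways
        self.offset_bits = offset_bits  # hypothetical instruction-alignment bits
        # Each set holds (tag, target) pairs ordered from most to least recently used.
        self.sets: List[List[Tuple[int, int]]] = [[] for _ in range(num_sets)]

    def _index_and_tag(self, branch_addr: int) -> Tuple[int, int]:
        word_addr = branch_addr >> self.offset_bits
        return word_addr % self.num_sets, word_addr // self.num_sets

    def lookup(self, branch_addr: int) -> Optional[int]:
        """Return the predicted target address, or None on a BTB miss."""
        index, tag = self._index_and_tag(branch_addr)
        entries = self.sets[index]
        # In hardware, all ways' tags are compared in parallel and the hitting
        # way's target is then selected; that compare-then-select step is on
        # the access path and contributes to BTB latency.
        for way, (entry_tag, target) in enumerate(entries):
            if entry_tag == tag:
                entries.insert(0, entries.pop(way))  # mark as most recently used
                return target
        return None

    def install(self, branch_addr: int, target: int) -> None:
        """Write or update an entry, evicting the least recently used way if full."""
        index, tag = self._index_and_tag(branch_addr)
        entries = self.sets[index]
        for way, (entry_tag, _) in enumerate(entries):
            if entry_tag == tag:
                entries.pop(way)  # replace the existing entry for this branch
                break
        else:
            if len(entries) == self.ways:
                entries.pop()  # evict the least recently used way
        entries.insert(0, (tag, target))


btb = SetAssociativeBTB(num_sets=4, ways=2)
btb.install(0x100, 0x200)
print(hex(btb.lookup(0x100)))  # predicted target for the branch at 0x100
```

Increasing `ways` or `num_sets` lets the model hold more branches, mirroring the accuracy benefit of a larger BTB, while in hardware each added way widens the tag comparison and the final selection, mirroring the latency cost described above.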