1. Field of the Invention
This invention relates to processor pipelines, branch prediction and branch prediction latency, and particularly to a system and method for mitigating lookahead branch prediction latency with branch presence prediction at the time of instruction fetching.
2. Description of Background
Branch prediction is a performance-critical component of a pipelined, high-frequency microprocessor and is used to predict the direction (taken vs. not taken) and the target address of branch instructions. Branch prediction is beneficial because it allows processing to continue along a branch's predicted path rather than having to wait for the outcome of the branch to be determined. An additional penalty is incurred only if a branch is mispredicted.
A Branch Target Buffer (BTB) is a structure that stores branch and target information. Other structures such as a Branch History Table (BHT) and Pattern History Table (PHT) can be included to store information used for branch direction prediction.
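By way of illustration only, the relationship between a BTB entry and its direction state can be sketched as follows. This is a minimal software model, not a description of any particular hardware. The direct-mapped organization, the 2-bit saturating counter standing in for a BHT, the policy of installing only taken branches, and all names (BTBEntry, BranchTargetBuffer, num_entries) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BTBEntry:
    tag: int          # upper address bits identifying the branch
    target: int       # most recently observed taken target
    counter: int = 2  # assumed 2-bit counter: 0-1 not taken, 2-3 taken


class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB with a BHT-style counter per entry."""

    def __init__(self, num_entries: int = 1024) -> None:
        self.num_entries = num_entries
        self.entries = [None] * num_entries

    def _index_and_tag(self, address: int) -> Tuple[int, int]:
        # Low bits select the row; the remaining bits form the tag.
        return address % self.num_entries, address // self.num_entries

    def predict(self, address: int) -> Optional[Tuple[bool, int]]:
        """Return (predicted_taken, target) on a hit, or None on a miss."""
        index, tag = self._index_and_tag(address)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            return None
        return entry.counter >= 2, entry.target

    def update(self, address: int, taken: bool, target: int) -> None:
        """Install or train an entry once the branch outcome is known."""
        index, tag = self._index_and_tag(address)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            # Assumed policy: only taken branches are installed.
            if taken:
                self.entries[index] = BTBEntry(tag, target)
            return
        if taken:
            entry.target = target
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
```

In this sketch a miss (None) corresponds to the case where no branch is known at the searched address, while a hit supplies both the direction guess and the target address.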
The BTB can be searched in parallel with and independently of instruction fetching to find upcoming branches, in which case it is called lookahead branch prediction. Lookahead branch prediction can be implemented in such a way that branch prediction is usually ahead of instruction fetching and decode. In such a configuration, branch predictions steer instruction fetching. Lookahead branch prediction is also an effective instruction pre-fetch mechanism, particularly if the BTB footprint is bigger than that of the first level instruction cache. There are times, however, when the BTB search falls behind. This most frequently happens after restart conditions, when there is a race between the BTB trying to predict the first upcoming branch instruction and the instruction fetch logic trying to fetch and deliver the new instruction stream. It is also possible for the BTB to fall behind if its throughput cannot keep up with the number of branches in the instruction stream. When the BTB falls behind and is not able to provide branch prediction information for branch instructions, such branches are predicted using a less accurate predictor. If such branches are guessed taken, instruction fetching is restarted once the target address of the branch is computed.
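The race after a restart can be illustrated with a toy cycle-count model. The sketch below is an assumption-laden simplification: it supposes one BTB search and one fetch per cycle, and every parameter (btb_bytes_per_search, btb_pipeline_latency, fetch_bytes_per_cycle, fetch_to_decode_latency) is a hypothetical value chosen for the example rather than a figure from any actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartRaceModel:
    """Toy model of the BTB-versus-fetch race after a pipeline restart."""
    btb_bytes_per_search: int = 32    # assumed bytes covered per BTB search
    btb_pipeline_latency: int = 4     # assumed cycles from search to prediction
    fetch_bytes_per_cycle: int = 16   # assumed fetch bandwidth
    fetch_to_decode_latency: int = 3  # assumed cycles from fetch to decode

    def btb_ready_cycle(self, branch_offset: int) -> int:
        # Cycle at which the BTB prediction for the branch becomes available.
        searches_before = branch_offset // self.btb_bytes_per_search
        return searches_before + self.btb_pipeline_latency

    def decode_cycle(self, branch_offset: int) -> int:
        # Cycle at which the branch instruction itself reaches decode.
        fetches_before = branch_offset // self.fetch_bytes_per_cycle
        return fetches_before + self.fetch_to_decode_latency

    def prediction_source(self, branch_offset: int) -> str:
        # The BTB wins the race only if its prediction is ready by decode.
        if self.btb_ready_cycle(branch_offset) <= self.decode_cycle(branch_offset):
            return "btb"
        return "fallback"


model = RestartRaceModel()
for offset in (0, 8, 16, 64):
    print(offset, model.prediction_source(offset))
```

Under these assumed parameters, a branch located within the first few bytes after the restart address reaches decode before the BTB prediction is ready and so falls to the less accurate predictor, whereas branches farther along the stream are covered by the BTB because each search spans more bytes than each fetch.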
Current solutions to reduce or mitigate branch prediction delay include hierarchical predictors, which are implemented solely in hardware, and cooperative predictors, which rely on hardware support for compiler optimizations based on profiling. Hierarchical predictors include two-level caching, overriding predictors, and cascading. All of these approaches involve combining small-and-fast predictors with large-and-slow predictors. In a lookahead predictor as described above, two-level caching can help reduce prediction latency, but typically does not eliminate cases where the BTB falls behind. An overriding predictor combines a small and fast first level predictor with a larger and slower predictor that can override it. Typically, an overriding predictor adds complexity and may not eliminate latency problems in a lookahead design. An overriding predictor could reduce latency problems if the first level BTB predictor were smaller than it otherwise would be, but doing so would also decrease the pre-fetching benefit and prediction accuracy provided by the BTB. Cascading accesses predictors of different latencies in parallel and uses the most accurate predictor available in time for the branch. As with the previous two approaches, this approach may not solve the latency problem in a lookahead predictor. Implementing a cascading structure in a lookahead predictor is not straightforward because the prediction needs to be used immediately to redirect the BTB search, and it is difficult to decide whether to redirect the prediction search with the quickest prediction or to wait for the slowest prediction.
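The difference between overriding and cascading can be sketched in software as follows. This is an illustrative model only. The names (OverrideResult, TimedPredictor, overriding_predict, cascading_predict), the override_penalty value, and the assumption that a slower predictor is always the more accurate one are all hypothetical and introduced solely for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A direction predictor maps a branch address to taken (True) or not taken.
DirectionPredictor = Callable[[int], bool]


@dataclass(frozen=True)
class OverrideResult:
    taken: bool          # final direction after any override
    overridden: bool     # True if the slow predictor disagreed
    penalty_cycles: int  # cycles lost redirecting after an override


def overriding_predict(fast: DirectionPredictor, slow: DirectionPredictor,
                       address: int, override_penalty: int = 2) -> OverrideResult:
    """Act on the fast prediction first; the slow one may override it later."""
    quick = fast(address)
    late = slow(address)
    if late == quick:
        return OverrideResult(quick, False, 0)
    # Disagreement: the slow predictor wins, at an assumed redirect cost.
    return OverrideResult(late, True, override_penalty)


@dataclass(frozen=True)
class TimedPredictor:
    latency: int                 # cycles until this prediction is available
    predict: DirectionPredictor


def cascading_predict(predictors: List[TimedPredictor], address: int,
                      cycles_available: int) -> Optional[bool]:
    """Use the slowest (assumed most accurate) predictor that is ready in time."""
    ready = [p for p in predictors if p.latency <= cycles_available]
    if not ready:
        return None
    return max(ready, key=lambda p: p.latency).predict(address)
```

In the cascading function the caller must supply cycles_available, which is exactly the quantity that is hard to know in a lookahead design: the prediction is needed immediately to redirect the search, so there is no natural deadline against which to select a predictor.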