This invention relates to processor pipelines, and more particularly to mitigating instruction prediction latency with independently filtered presence prediction at the time of instruction fetching.
Instruction prediction, such as branch prediction, is a performance-critical component of a pipelined, high-frequency microprocessor. Branch prediction is used to predict the direction (taken vs. not taken) and the target address of branch instructions. It is beneficial because it allows processing to continue along a branch's predicted path rather than waiting for the outcome of the branch to be determined. A penalty is incurred only if a branch is mispredicted.
A branch target buffer (BTB) is a structure that stores branch and target information. Other structures such as a branch history table (BHT) and pattern history table (PHT) can be included to store information used for branch direction prediction.
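By way of illustration only, the relationship between a BTB and a BHT can be sketched as follows. The organization shown is an assumption made for the example and is not taken from any particular design: the BTB is direct-mapped, each entry holds a tag and a target address, and the BHT holds one 2-bit saturating counter per BTB entry. The names `BTBEntry` and `BranchPredictor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int     # upper address bits identifying the branch
    target: int  # most recently observed target address


class BranchPredictor:
    """Toy direct-mapped BTB with a BHT of 2-bit saturating counters."""

    def __init__(self, entries=16):
        self.entries = entries
        self.btb = [None] * entries
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        self.bht = [1] * entries

    def _index_tag(self, addr):
        return addr % self.entries, addr // self.entries

    def predict(self, addr):
        """Return (taken, target) on a BTB hit, or None on a miss."""
        idx, tag = self._index_tag(addr)
        entry = self.btb[idx]
        if entry is None or entry.tag != tag:
            return None
        return self.bht[idx] >= 2, entry.target

    def update(self, addr, taken, target):
        """Train the structures with the resolved outcome of a branch."""
        idx, tag = self._index_tag(addr)
        entry = self.btb[idx]
        if entry is None or entry.tag != tag:
            # Install a new entry and start the counter in a weak state.
            self.btb[idx] = BTBEntry(tag, target)
            self.bht[idx] = 2 if taken else 1
        elif taken:
            entry.target = target
            self.bht[idx] = min(3, self.bht[idx] + 1)
        else:
            self.bht[idx] = max(0, self.bht[idx] - 1)
```

In this sketch a BTB miss returns no prediction at all, which corresponds to the case in which a branch is not known to the predictor and must be handled by some other mechanism.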
The BTB can be searched independently from instruction fetching to find upcoming branches, in which case it is called lookahead, or asynchronous, branch prediction. Lookahead branch prediction can be implemented in such a way that branch prediction is usually ahead of instruction fetching and decode. In such a configuration, branch predictions steer instruction fetching. This is an effective instruction pre-fetch mechanism, particularly if the BTB instruction footprint is bigger than that of the first level instruction cache. There are times, however, when the BTB search falls behind. This most frequently happens after restart conditions, when there is a race between the BTB trying to predict the first upcoming branch instruction and the instruction fetch logic trying to fetch and deliver the new instruction stream. It is also possible for the BTB to fall behind if its throughput cannot keep up with the number of branches in the instruction stream. When the BTB falls behind and is not able to provide branch prediction information for a branch instruction, the branch is predicted using a less accurate predictor. If such a branch is guessed as taken, instruction fetching is restarted once the target address of the branch is computed.
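By way of illustration only, the race between the lookahead BTB search and instruction delivery can be modeled as a comparison of cycle numbers. The sketch below assumes a simplified timing model that is not part of the description above: each event is reduced to a single cycle number, the less accurate predictor is represented by a single boolean guess, and the names `Outcome` and `predict_at_decode` are hypothetical.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    source: str                    # "btb" or "fallback"
    taken: bool
    redirect_cycle: Optional[int]  # cycle at which fetch is redirected, if taken


def predict_at_decode(decode_cycle, btb_ready_cycle, btb_taken,
                      fallback_taken, target_compute_latency):
    """Choose the prediction for a branch that reaches decode.

    btb_ready_cycle is None when the BTB never produces a prediction
    for this branch.
    """
    if btb_ready_cycle is not None and btb_ready_cycle <= decode_cycle:
        # The BTB was ahead: fetch was already steered when its
        # prediction became available.
        return Outcome("btb", btb_taken,
                       btb_ready_cycle if btb_taken else None)
    # The BTB fell behind: use the less accurate guess. A taken guess
    # can only redirect fetch once the target address is computed.
    if fallback_taken:
        return Outcome("fallback", True,
                       decode_cycle + target_compute_latency)
    return Outcome("fallback", False, None)
```

Under these assumptions, a branch decoded in cycle 10 with a BTB prediction ready in cycle 7 redirects fetch in cycle 7, whereas the same branch with a BTB prediction ready in cycle 12 and a taken fallback guess redirects fetch only in cycle 10 plus the target computation latency.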
One current solution to reduce or mitigate branch prediction delay is the use of hierarchical predictors. Hierarchical approaches include multi-level caching, overriding, and cascading predictors. All of these approaches involve combining small-and-fast predictors with large-and-slow predictors. In a lookahead predictor as described above, two-level caching can help reduce prediction latency, but typically does not eliminate cases where the BTB falls behind. An overriding predictor combines a small and fast first level predictor with a larger and slower predictor that can override it. An overriding predictor typically adds complexity and may not eliminate latency problems in a lookahead design. An overriding predictor could reduce latency problems if the first level BTB predictor were smaller than it otherwise would be, but doing so would also decrease the pre-fetching benefit and prediction accuracy provided by the first level BTB. A cascading predictor accesses predictors of different latencies in parallel and uses the most accurate predictor available in time for the branch. As with the previous two approaches, this approach may not solve the latency problem in a lookahead predictor. Implementing a cascading structure in a lookahead predictor is not straightforward because the prediction needs to be used immediately to redirect the BTB search, and it is difficult to decide whether to redirect the search with the quickest prediction or to wait for the slowest prediction.
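By way of illustration only, the selection rule of a cascading predictor, namely using the most accurate predictor whose result is available in time, can be sketched as follows. The sketch assumes that each predictor has a fixed latency in cycles and a fixed accuracy ranking, and that a deadline for the branch is known. The names `PredictorResult` and `cascade_select` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictorResult:
    name: str
    latency: int        # cycles until this predictor's result is available
    accuracy_rank: int  # higher means assumed more accurate
    taken: bool


def cascade_select(results, deadline):
    """Return the most accurate result available by the deadline.

    Returns None when no predictor responds in time.
    """
    in_time = [r for r in results if r.latency <= deadline]
    if not in_time:
        return None
    return max(in_time, key=lambda r: r.accuracy_rank)


fast = PredictorResult("small-fast", latency=1, accuracy_rank=1, taken=True)
slow = PredictorResult("large-slow", latency=3, accuracy_rank=2, taken=False)
```

With these example values, a deadline of two cycles selects the small-and-fast result and a deadline of three cycles selects the large-and-slow result. The sketch presumes the deadline is known in advance; in a lookahead predictor no such deadline is given, which is the difficulty noted above.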