1. Field of the Invention
The present invention relates to the field of branch prediction, and in particular, to systems and methods for accessing prediction information related to branch instructions.
2. Background Art
Advanced processors employ pipelining techniques to execute instructions at very high speeds. On such processors, the overall machine is organized as a pipeline consisting of several cascaded stages of hardware. Instruction processing is divided into a sequence of operations, and each operation is performed by hardware in a corresponding pipeline stage ("pipe stage"). Independent operations from several instructions may be processed simultaneously by different pipe stages, increasing the instruction throughput of the pipeline. Where a pipelined processor includes multiple execution resources in each pipe stage, the throughput of the processor can exceed one instruction per clock cycle. Contemporary superscalar, deeply pipelined processors may have anywhere from 5 to 15 pipe stages and may execute operations from as many as 4 to 8 instruction simultaneously in each pipe stage. In order to make full use of a processor's instruction execution capability, the execution resources of the processor must be provided with sufficient instructions from the correct execution path. This keeps the pipeline filled with instruction that need to be executed.
The presence of branch instructions poses major challenges to keeping the pipeline filled with instructions from the correct execution path. When a branch instruction is executed and the branch condition met, control flow of the processor is resteered to a new code sequence and the pipeline is refilled with instructions from the new code sequence. Since branch execution occurs in the back end of the pipeline, and instructions are fetched at the front end of the pipeline, several pipeline stages worth of instructions may be fetched from the wrong execution path by the time the branch is resolved. These instructions need to be flushed from the pipeline, causing bubbles (idle stages) in the pipeline. The processor must then begin fetching instructions at the target address indicated by the branch instruction, and the intervening stages of the pipeline remain empty until they are filled by instructions from the new execution path.
To reduce the number of pipeline bubbles, processors incorporate branch prediction modules at the front ends of their pipelines. When a branch instruction enters the front end of the pipeline, the branch prediction module forecasts whether the branch instruction will be taken when it is executed at the back end of the pipeline. If the branch is predicted taken, the branch prediction module communicates a target address for a new code sequence to the fetch module at the front end of the pipeline. The fetch module resteers the pipeline to begin fetching instructions at the target address.
Conventional branch prediction modules employ branch prediction tables (BPTs) that track the history (taken/not taken) of branch instructions and use this information to predict whether a branch will be taken. Looking up an instruction in the BPT, determining whether the branch is taken, and resteering the fetch module to the predicted target address consume clock cycles. This delay allows instructions from the wrong execution path to enter the pipeline. Since these instructions do not add to forward progress on the predicted execution path, they create "bubbles" in the pipeline for as many clock cycles as it takes to resteer the front end of the pipeline.
Thus, currently available branch prediction techniques reduce but do not eliminate pipeline bubbles. When these bubbles occur in selected branch instructions, such as tight loops, the performance degradation can be significant. For example, if a bubble of one cycle is introduced in a loop that executes in four clock cycles, execution of the loop may be degraded by 25%.