1. Field of the Invention
Embodiments of the present invention relate to techniques for handling branches during execution of a computer program. More specifically, embodiments of the present invention relate to techniques for recovering from a branch misprediction.
2. Related Art
Conditional branch instructions cause a stream of execution to conditionally jump from one location to another location in a computer program. For example, when executing a conditional branch instruction, a processor typically resolves a logical condition (e.g., an “if” condition) to determine whether the branch is “taken” or “not taken.” If the branch is “not taken,” the processor increments a program counter (PC) to the next instruction and continues to fetch instructions following the branch instruction. Otherwise, if the branch is “taken,” the processor sets the PC to a “target PC,” which specifies a target location in the program, and begins fetching instructions from the target location. Because branch instructions can disrupt the sequential execution of program code, they require specialized handling by the processor. Two techniques for handling such branch instructions are described below.
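The selection between the sequential PC and the target PC described above can be illustrated with a minimal sketch. The function name next_pc and the fixed 4-byte instruction width are assumptions made for illustration only and are not taken from any particular processor.

```python
# Assumption: fixed-width 4-byte instructions (illustrative only).
INSTRUCTION_BYTES = 4


def next_pc(pc: int, target_pc: int, taken: bool) -> int:
    """Return the address from which to fetch after a conditional branch.

    pc        -- address of the branch instruction
    target_pc -- address of the branch target
    taken     -- resolved condition of the branch
    """
    if taken:
        # Branch "taken": redirect fetch to the target location.
        return target_pc
    # Branch "not taken": continue with the instruction after the branch.
    return pc + INSTRUCTION_BYTES
```

For a branch at address 0x100 with target 0x200, this sketch yields 0x200 when the branch is taken and 0x104 otherwise.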
Multiple Circuits for Computing the Branch Target
When a processor decodes a conditional branch instruction, the resolution (“taken” or “not taken”) of the branch instruction is unknown. Hence, the processor cannot determine whether to fetch subsequent instructions directly following the branch instruction (i.e., using the PC) or from another location (i.e., using the target PC). Consequently, the processor may be forced to stall. In order to avoid such stalls, virtually all modern processors include a branch-prediction unit, which predicts whether the branch is “taken” or “not taken” based on prior resolutions of the branch instruction.
A branch-prediction unit is generally used as follows. Upon decoding a branch instruction, the processor computes the target PC and obtains a predicted resolution of the branch instruction from the branch-prediction unit. Next, while commencing execution of the branch instruction, the processor begins to fetch subsequent instructions based on the predicted resolution.
Upon completing the branch instruction, the processor determines whether the actual resolution matches the predicted resolution. If so, the processor continues to fetch instructions along the predicted branch path. Otherwise, if the branch is mispredicted, the processor computes the PC for the correct branch path, flushes the incorrectly fetched instructions from the pipeline, and uses the computed PC to resume fetching instructions along the correct branch path. Note that the processor must compute the branch target twice: once while making a branch prediction (early in the pipeline) and again upon determining that a branch has been mispredicted (at a later stage in the pipeline). Consequently, the processor includes two sets of circuits for determining the PC and the target PC (i.e., one in the branch-prediction unit and one in the branch execution unit).
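The predict, verify, and recover sequence described above can be sketched as follows. This is a simplified software model under stated assumptions, not a description of any actual hardware: the LastOutcomePredictor class (which predicts that a branch resolves as it did last time, and "not taken" when it has no history), the helper select_pc, the 4-byte instruction width, and the list used to stand in for the pipeline are all hypothetical. The two calls to select_pc stand in for the two separate circuits that compute the fetch address early and late in the pipeline.

```python
# Assumption: fixed-width 4-byte instructions (illustrative only).
INSTRUCTION_BYTES = 4


class LastOutcomePredictor:
    """Hypothetical predictor: repeats the previous resolution of each branch."""

    def __init__(self):
        self.history = {}  # branch address -> last resolution (True = taken)

    def predict(self, pc):
        # With no prior resolution recorded, default to "not taken".
        return self.history.get(pc, False)

    def update(self, pc, taken):
        self.history[pc] = taken


def select_pc(pc, target_pc, taken):
    """Choose between the target PC and the sequential PC."""
    return target_pc if taken else pc + INSTRUCTION_BYTES


def execute_branch(predictor, pc, target_pc, actual_taken, pipeline):
    """Model one branch; return (fetch address to continue from, mispredicted)."""
    predicted = predictor.predict(pc)

    # Early in the pipeline: compute a fetch address from the prediction
    # and begin fetching speculatively along the predicted path.
    fetch_pc = select_pc(pc, target_pc, predicted)
    pipeline.append(fetch_pc)

    # Later in the pipeline: the branch completes and is checked.
    mispredicted = predicted != actual_taken
    if mispredicted:
        pipeline.clear()  # flush the incorrectly fetched instructions
        # Second computation of the fetch address, for the correct path.
        fetch_pc = select_pc(pc, target_pc, actual_taken)

    predictor.update(pc, actual_taken)
    return fetch_pc, mispredicted
```

In this model, the first encounter with a taken branch is mispredicted and causes a flush, while a second encounter with the same branch is predicted correctly and the speculatively fetched address remains in the pipeline.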
The Delay Slot
In some systems, at least one instruction directly following a branch instruction is guaranteed to execute. For example, in some SPARC™ systems (defined by SPARC International of Campbell, Calif., USA), a single instruction following the branch instruction (called the “delay slot”) is automatically executed. This delay slot was added when pipelines were only a few stages long, and the overhead of managing the delay slot was balanced by the useful work the processor could perform while the fetch unit was redirected to fetch instructions from the target PC. However, as pipelines have grown to include more stages, the benefits of automatically executing an instruction in the delay slot have been negated by the overhead of handling the delay slot.
Further complicating the issue of the delay slot is the “annulling branch” instruction. In some systems, this variant of the branch instruction permits the processor to annul the instruction in the delay slot when a branch is predicted “not taken.” In these systems, the instruction in the delay slot proceeds through the pipeline, but is prevented from affecting the architectural state of the processor. However, if a “not taken” prediction proves to be incorrect, the delay slot must be restored. Restoring the delay slot involves locating the instruction from the delay slot in the pipeline and enabling that instruction to finish executing. These operations incur significant overhead.
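The annul-and-restore behavior described above can be modeled with a short sketch. The DelaySlot record and the functions issue_delay_slot and restore_if_mispredicted are hypothetical names introduced for illustration; the model tracks only whether the delay-slot instruction is currently prevented from affecting architectural state, and omits the cost of locating the instruction in the pipeline.

```python
from dataclasses import dataclass


@dataclass
class DelaySlot:
    """Hypothetical record for the instruction in the delay slot."""
    annulled: bool = False  # True: must not affect architectural state


def issue_delay_slot(annulling: bool, predicted_taken: bool) -> DelaySlot:
    """Issue the delay-slot instruction behind a branch.

    The slot is annulled only for an annulling branch that is
    predicted "not taken"; otherwise it executes normally.
    """
    return DelaySlot(annulled=annulling and not predicted_taken)


def restore_if_mispredicted(slot: DelaySlot, predicted_taken: bool,
                            actual_taken: bool) -> bool:
    """Restore an annulled slot when a "not taken" prediction was wrong.

    Returns True if the slot had to be restored.
    """
    if slot.annulled and not predicted_taken and actual_taken:
        slot.annulled = False  # allow the instruction to finish executing
        return True
    return False
```

Under these assumptions, only the combination of an annulling branch, a “not taken” prediction, and a “taken” resolution triggers the restore step.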
Hence, what is needed is a branch mechanism without the above-described problems.