1. Technical Field
This invention relates to microprocessors, and in particular to systems for processing branch instructions.
2. Background Art
Advanced processors employ pipelining techniques to execute instructions at very high speeds. In a pipelined processor, the overall machine is organized as a pipeline consisting of several cascaded stages of hardware. Instruction processing is divided into a sequence of operations, and each operation is executed by hardware resident in a corresponding pipeline stage ("pipe stage") in a single cycle of the processor clock. Independent operations from several instructions may be processed simultaneously by different pipe stages, increasing the instruction throughput of the pipeline. Where a processor pipeline includes multiple execution resources in each pipe stage, the throughput of the processor can exceed one instruction per clock cycle. Contemporary superscalar, deeply pipelined processors may have anywhere from 5 to 15 pipe stages and may execute operations from as many as 4 to 8 instructions simultaneously in each pipe stage.
In order to make full use of a processor's instruction execution capability, the processor must be provided with sufficient instructions from the correct execution path. As long as the correct execution path can be identified, instructions from this execution path can be loaded into the processor pipeline to keep the execution resources busy. Where program instructions are processed sequentially, it is a relatively simple matter to determine the correct execution path. Branch instructions can disrupt sequential execution by transferring control of the processor to a non-sequential target address when an associated branch condition is met. Many programs have branches every five or six instructions. As a result, a deeply pipelined processor may have two or three branch instructions in its pipeline at a given time, making determination of the correct execution path difficult. Moreover, branch conditions are typically not resolved until the back end of the processor pipeline, so the pipeline may begin processing instructions from incorrect execution paths before the error is discovered.
Processors typically include branch prediction systems at the front end of their pipelines to anticipate changes in the control flow due to taken branch instructions. Branch prediction systems use a variety of methods to predict whether a branch instruction entering the front end of the pipeline is likely to be taken when it is executed at the back end of the pipeline, i.e. whether the branch condition is likely to be met. For branch instructions that are predicted taken, instructions beginning at the associated target address may be loaded into the pipeline behind the branch instruction. As long as the branch is resolved taken when it is executed at the back end of the pipeline, the predicted instruction sequence that follows the branch instruction is from the correct execution path, and there is no disruption of the pipeline's operation. If the prediction is incorrect, the predicted instructions are not from the correct execution path. They must be flushed from the pipeline, and instructions from the correct execution path must be loaded in their place.
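By way of illustration only, one widely known prediction method is a table of two-bit saturating counters indexed by the branch address. The sketch below is a software model of that general technique and is not taken from the system described here; the class name `TwoBitPredictor`, the table size, and the modulo indexing are assumptions chosen for the example.

```python
class TwoBitPredictor:
    """Illustrative two-bit saturating-counter predictor (hypothetical model).

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        # Assumption: every counter starts at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, branch_address):
        # Assumption: a simple modulo of the branch address selects the counter.
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter toward the resolved outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In this model, a branch that has resolved taken twice in a row is still predicted taken after a single not-taken outcome, which is the hysteresis that two-bit counters are generally used to provide.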
Instructions from a predicted branch path must thus be checked at the back end of the pipeline and either validated or corrected. Typically, this is done by comparing the target address and branch condition from the executed branch instruction with the predicted target address and branch condition. When the comparisons match, no action need be taken since the instructions in the pipeline following the branch instruction represent the correct control flow. When the comparisons do not match, the pipeline must be flushed and reloaded with instructions from the correct execution path.
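The comparison described above may be sketched in software as follows. This is a minimal model under stated assumptions, not a description of any particular hardware: the names `BranchInfo` and `validate_prediction` are hypothetical, and the sketch assumes that the target address need only match when the branch is actually resolved taken.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchInfo:
    """Branch condition (taken or not) and target address; hypothetical type."""
    taken: bool
    target: int


def validate_prediction(predicted: BranchInfo,
                        resolved: BranchInfo,
                        fall_through: int) -> Optional[int]:
    """Compare predicted and resolved branch information.

    Returns None when the prediction is validated (no action needed), or the
    address from which the pipeline should be reloaded after a flush.
    """
    condition_matches = predicted.taken == resolved.taken
    # Assumption: the target is only relevant if the branch is actually taken.
    target_matches = (not resolved.taken) or predicted.target == resolved.target
    if condition_matches and target_matches:
        return None
    # Misprediction: resume at the resolved target, or at the next sequential
    # instruction if the branch fell through.
    return resolved.target if resolved.taken else fall_through
```

For example, a branch predicted taken to address 0x400 that resolves not taken yields the fall-through address as the restart point, while a branch predicted and resolved taken to the same target yields None.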
Validating branch predictions can consume additional clock cycles. For example, the branch information from the executed branch instruction is resolved in one stage of the pipeline, and typically compared with the predicted branch information no earlier than the next stage of the pipeline. In processors that support predication, branch conditions are frequently represented by predicates, and predicate evaluation is a critical path in the processor. Delays in validating predicted predicates can lengthen a critical timing path in the processor pipeline.
This problem is exacerbated in processors that execute code compiled by trace scheduling, superblock scheduling, and hyperblock scheduling. These methods cause fall-through, i.e. not taken, branches to cluster at the end of a scheduled code block. The clustered branch instructions are generally executed and validated in sequence. Fall-through branches do not affect the control flow of the processor, yet each one that is executed delays the pipeline by an additional clock cycle. Further, the delays due to validating each fall-through branch are compounded as well. The present invention addresses these and other problems associated with executing and validating branch instructions.