The present invention relates generally to branch prediction in pipelined processors, and in particular to a system and method of independently flushing two segments of the same pipeline at different times to maximize performance and efficiently correct mispredicted branch instructions.
Most modern processors employ a pipelined architecture, where sequential instructions, each having multiple execution steps, are overlapped in execution. For maximum performance, the instructions should flow continuously through the pipeline. However, instructions often become stalled in the pipeline for a variety of reasons, such as data dependencies between instructions, delays associated with memory accesses, an inability to allocate sufficient pipeline resources to instructions, and the like. Minimizing pipeline stalls and resolving them efficiently are important factors in achieving improved processor performance.
Most real-world programs include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. Most modern processors employ various forms of branch prediction, whereby the branching behavior of conditional branch instructions is predicted early in the pipeline, and the processor fetches and speculatively executes instructions, speculatively allocating pipeline resources to them, based on the branch prediction. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, speculatively allocated resources must be un-allocated and returned to their state prior to the branch prediction, and new instructions must be fetched from the correct branch target address.
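By way of illustration only, the recovery sequence described above (flush the speculatively fetched instructions, un-allocate the speculatively allocated resources, and refetch from the correct target) may be sketched as a toy software model. The class `ToyFrontEnd`, its free-register list, and the checkpoint mechanism are hypothetical names and simplifications introduced for this sketch; they do not describe any particular processor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrontEnd:
    """Hypothetical model: fetch along a predicted path, recover on a misprediction."""
    free_regs: list = field(default_factory=lambda: [0, 1, 2, 3])
    in_flight: list = field(default_factory=list)  # speculatively fetched instructions
    checkpoint: list = None                        # resource state saved at the prediction

    def predict_and_fetch(self, predicted_path):
        # Save the resource state as it was before the branch prediction.
        self.checkpoint = list(self.free_regs)
        for name in predicted_path:
            reg = self.free_regs.pop(0)            # speculative resource allocation
            self.in_flight.append((name, reg))

    def resolve(self, prediction_correct, correct_path):
        if prediction_correct:
            self.checkpoint = None                 # speculation confirmed; nothing to undo
            return
        self.in_flight.clear()                     # flush the speculatively fetched instructions
        self.free_regs = self.checkpoint           # un-allocate: restore pre-prediction state
        self.checkpoint = None
        for name in correct_path:                  # refetch from the correct branch target
            reg = self.free_regs.pop(0)
            self.in_flight.append((name, reg))


front_end = ToyFrontEnd()
front_end.predict_and_fetch(["taken_1", "taken_2"])      # predicted path
front_end.resolve(prediction_correct=False, correct_path=["not_taken_1"])
print(front_end.in_flight, front_end.free_regs)
```

In this sketch a single saved copy of the free-register list suffices because only one prediction is outstanding at a time; supporting several unresolved branches would require one saved copy per branch.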
On one hand, the pipeline should ideally be flushed immediately upon detecting the misprediction, so that the correct instructions may be fetched and launched into the pipeline, minimizing the delay caused by the branch misprediction. On the other hand, instructions fetched from the erroneous branch path may be in various stages of speculative execution, and may have been speculatively allocated various processor resources. Immediately “unwinding” these allocations to restore the resources to their pre-branch-prediction state is difficult, and may consume numerous processor cycles and/or require numerous duplicate copies of the speculatively-allocated resources. The penalties incurred by immediately flushing the pipeline are further exacerbated in processors that support out-of-order instruction execution, such as many superscalar processors, due to the additional challenge of tracking relative instruction age. This instruction age tracking is necessary to ensure that only those instructions that follow the mispredicted branch instruction in program order are flushed, and that all instructions that precede the branch instruction in program order are executed, even though they may be behind the branch instruction in the pipeline.
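The role of instruction age in a selective flush may be illustrated by the following minimal sketch. It assumes, purely for illustration, that each instruction carries a program-order sequence number (the hypothetical field `age`, where a smaller value means older); the names `Instr` and `flush_younger` are likewise introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    age: int  # hypothetical program-order sequence number; smaller means older


def flush_younger(pipeline, branch_age):
    """Keep the branch and everything older; drop everything fetched after it."""
    return [instr for instr in pipeline if instr.age <= branch_age]


# Pipeline contents listed from the deepest stage to the shallowest. Because of
# out-of-order execution, the older "load" (age 1) sits behind the branch (age 2).
pipeline = [
    Instr("branch", 2),
    Instr("wrong_path_add", 3),
    Instr("load", 1),
    Instr("wrong_path_mul", 4),
]

survivors = flush_younger(pipeline, branch_age=2)
print([instr.name for instr in survivors])  # prints ['branch', 'load']
```

The comparison is made on program-order age rather than on pipeline position: flushing by position alone would incorrectly discard the older load, which must still be executed.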