1. Field of the Invention
This invention is related to the field of processors and, more particularly, to branch instruction processing in processors.
2. Description of the Related Art
Over time, processors have been produced with increased operating frequencies. The operating frequencies of modern processors have been reached, at least in part, by using deeper instruction processing pipelines. Generally, a pipeline is designed by dividing the work to be performed (e.g. instruction processing, in this case) into multiple portions, each of which is assigned to a given pipeline stage. Instructions pass from stage to stage in the pipeline. Pipeline stages can operate on different instructions in parallel, thus overlapping the processing of instructions in time. Deep pipelines have many such stages. For example, pipelines having 10 pipeline stages, 15 pipeline stages, or even more are often found in modern processors. A pipeline having 10 stages can have up to 10 different instructions at various stages of processing. Some stages process more than one instruction at once (e.g. in superscalar designs, or in fetch stages that fetch multiple instructions at once), so even more instructions can be in the pipeline at a given time. Additionally, buffers may be included at various points in the pipeline to hold instructions, which further increases the number of instructions that can be outstanding in the pipeline.
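By way of illustration only, the overlap of instructions in a pipeline can be sketched as follows. The sketch assumes an idealized pipeline in which one instruction enters per cycle and advances one stage per cycle, with no stalls, buffers, or superscalar issue. The function names `pipeline_occupancy` and `max_in_flight` are hypothetical and are not taken from any particular design.

```python
def pipeline_occupancy(num_stages, num_instructions):
    """Return, for each cycle, a list giving the instruction index held by
    each stage (None if the stage is empty).

    Assumption: instruction i enters stage 0 at cycle i and advances exactly
    one stage per cycle (idealized, single-issue, no stalls).
    """
    total_cycles = num_stages + num_instructions - 1
    timeline = []
    for cycle in range(total_cycles):
        stages = []
        for stage in range(num_stages):
            instr = cycle - stage  # instruction occupying this stage, if any
            stages.append(instr if 0 <= instr < num_instructions else None)
        timeline.append(stages)
    return timeline


def max_in_flight(timeline):
    """Largest number of instructions in the pipeline during any one cycle."""
    return max(sum(slot is not None for slot in stages) for stages in timeline)


if __name__ == "__main__":
    timeline = pipeline_occupancy(num_stages=10, num_instructions=25)
    print("cycles needed:", len(timeline))
    print("max instructions in flight:", max_in_flight(timeline))
```

Under these assumptions, a 10-stage pipeline holds at most 10 instructions at once. Modeling multiple instructions per stage or inter-stage buffers would raise that bound.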
While the high operating frequencies of deeply pipelined processors can improve overall performance, deep pipelines also present various challenges to a successful design. Branch instructions are one such challenge. A branch instruction determines which instructions will be executed next in a given sequence: either those sequential to the branch instruction in memory or those stored at an address specified by the branch instruction or by its operands. When the branch instruction is executed, the target (sequential or non-sequential) is identified, and the instructions at the target can then be fetched for execution. The branch is referred to as “not taken” if the sequential instructions are the target and “taken” if the non-sequential instructions are the target. Some branch instructions are conditional: the taken/not taken result depends on one or more operands. Other branch instructions are unconditional and always select the non-sequential address.
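The taken/not taken terminology above can be illustrated with a minimal sketch. It assumes a fixed instruction size and a branch whose non-sequential target address is already known. The `Branch` type, its field names, and the `resolve` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Branch:
    pc: int            # address of the branch instruction itself
    size: int          # instruction size in bytes (assumed fixed)
    target: int        # non-sequential address specified by the branch
    conditional: bool  # True if the outcome depends on operands


def resolve(branch: Branch, condition_met: Optional[bool] = None) -> Tuple[bool, int]:
    """Return (taken, next_fetch_address) for an executed branch."""
    if branch.conditional:
        if condition_met is None:
            raise ValueError("a conditional branch needs its condition result")
        taken = condition_met
    else:
        # Unconditional branches always select the non-sequential address.
        taken = True
    # Not taken: fetch continues at the instruction sequential to the branch.
    next_address = branch.target if taken else branch.pc + branch.size
    return taken, next_address


if __name__ == "__main__":
    b = Branch(pc=0x100, size=4, target=0x200, conditional=True)
    print(resolve(b, condition_met=False))
    print(resolve(b, condition_met=True))
```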
The execution stages are generally at or near the end of the pipeline, and thus the branch result is not known until late in the pipeline. To fill earlier pipeline stages with instructions, the branch result is typically predicted. If the prediction is correct, the target instructions are already being processed when the branch result is known. If the prediction is incorrect, the wrong instructions are in the pipeline and must be discarded. The instructions from the correct target can then be fetched, but performance has been lost in the interim. For deep pipelines, this performance loss can be especially severe. Accordingly, deep pipeline processors often focus a great deal of design effort and/or hardware on accurately predicting branches. Unfortunately, no prediction mechanism is perfect and, with deep pipelines, even small mispredict rates can result in substantial losses.
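The sensitivity of deep pipelines to mispredictions can be approximated with a simple first-order model, sketched below. The model and all of the numbers used with it are illustrative assumptions rather than measurements: it charges a fixed penalty, in cycles, for every mispredicted branch and ignores all other stall sources. The function name `effective_cpi` and its parameters are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order estimate of cycles per instruction with mispredictions.

    base_cpi        -- cycles per instruction with perfect prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches that are mispredicted
    penalty_cycles  -- cycles lost per misprediction (assumed to grow with
                       the number of stages between fetch and execute)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Illustrative values only: same predictor accuracy, different depths.
    for penalty in (5, 20):
        cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                            mispredict_rate=0.05, penalty_cycles=penalty)
        print(f"penalty={penalty} cycles -> CPI ~ {cpi:.2f}")
```

With these assumed values, the same 5% mispredict rate adds about 0.05 cycles per instruction when the penalty is 5 cycles and about 0.20 when it is 20 cycles, which is consistent with the observation that small mispredict rates become more costly as pipelines deepen.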