1. Field of the Invention
This invention relates to computer processor design.
2. Related Art
One way to achieve higher performance in computer processors employing pipelined architecture, is to keep each element of the pipeline busy. Usually, the next instruction to enter the computer pipeline is the next sequentially available instruction in program store. However, this is not the case when a change in a sequential program flow occurs (for example by execution of a control transfer instruction). In order to avoid flushing and restarting the pipeline due to changes in sequential program flow, it is desirable to select a path on which instruction execution is more likely to proceed, and to attempt to process instructions on that more likely path. This technique is known as branch prediction. If the predicted path is correct, the processor need not be unduly delayed by processing of the control transfer instruction. However, if the predicted path is not correct, the processor will have to discard the results of instructions executed on incorrect path, flush its pipeline, and restart execution on correct path.
One known prediction method is to cache, for each control transfer instruction, some history as to whether the branch was taken and the target. Each such instruction is allocated a location in a branch target buffer, each location of which includes the relevant information. While this known method generally achieves the purpose of predicting the flow of execution, it is subject to several drawbacks. First, for superscalar processors, it is desirable for instructions to be fetched in batches, such as 2 or more instructions at once, and so the branch target buffer has added complexity for having to determine the first control transfer instruction in the batch, rather than merely whether there is history for any such control transfer instruction. Second, for computers with a variable-length instruction set, instruction boundaries are not known until instructions are decoded, and so the branch target buffer would need to be coupled to the decode stage of the pipeline and this would cause pipeline flushing for each predicted taken instruction.
Accordingly, it would be advantageous to provide an improved technique for branch prediction in a processor, in which the branch target buffer is coupled to an early pipeline stage of the computer processor, and in which batches of instructions can be fetched at once without presenting unnecessary timing delays that would negatively impact the performance.