1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to branch prediction mechanisms within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
An important feature of a superscalar microprocessor (and a superpipelined microprocessor as well) is its branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not-taken) for a branch instruction, allowing subsequent instruction fetching to continue within the predicted instruction stream indicated by the branch prediction. A branch instruction is an instruction which causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; and a target address identifying an instruction stream beginning at an arbitrary location in memory. Unconditional branch instructions always branch to the target address, while conditional branch instructions may select either the sequential or the target address based on the outcome of a prior instruction. Instructions from the predicted instruction stream may be speculatively executed prior to execution of the branch instruction, and in any case are placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e. one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the number of instructions executed per clock cycle is decreased.
In order to be effective, the branch prediction mechanism must be highly accurate such that the predicted instruction stream is correct as often as possible. Typically, increasing the accuracy of the branch prediction mechanism is achieved by increasing the complexity of the branch prediction mechanism. For example, a cache-line based branch prediction scheme may be employed in which branch predictions are stored with a particular cache line of instruction bytes in an instruction cache. A cache line is a number of contiguous bytes which are treated as a unit for allocation and deallocation of storage space within the instruction cache. When the cache line is fetched, the corresponding branch predictions are also fetched. Furthermore, when the particular cache line is discarded, the corresponding branch predictions are discarded as well. The cache line is aligned in memory. A cache-line based branch prediction scheme may be made more accurate by storing a larger number of branch predictions for each cache line. A given cache line may include multiple branch instructions, each of which is represented by a different branch prediction. Therefore, more branch predictions allocated to a cache line allows for more branch instructions to be represented and predicted by the branch prediction mechanism. A branch instruction which cannot be represented within the branch prediction mechanism is not predicted, and subsequently a "misprediction" may be detected if the branch is found to be taken. However, complexity of the branch prediction mechanism is increased by the need to select between additional branch predictions. As used herein, a "branch prediction" is a value which may be interpreted by the branch prediction mechanism as a prediction of whether or not a branch instruction is taken or not taken. Furthermore, a branch prediction may include the target address. For cache-line based branch prediction mechanisms, a prediction of a sequential line to the cache line being fetched is a branch prediction when no branch instructions are within the instructions being fetched from the cache line.
A problem related to increasing the complexity of the branch prediction mechanism is that the increased complexity generally requires an increased amount of time to form the branch prediction. For example, selecting among multiple branch predictions may require a substantial amount of time. The offset of the fetch address identifies the first byte being fetched within the cache line: a branch prediction for a branch instruction prior to the offset should not be selected. The offset of the fetch address within the cache line may need to be compared to the offset of the branch instructions represented by the branch predictions stored for the cache line in order to determine which branch prediction to use. The branch prediction corresponding to a branch instruction subsequent to the fetch address offset and nearer to the fetch address offset than other branch instructions which are subsequent to the fetch address offset should be selected. As the number of branch predictions is increased, the complexity (and time required) for the selection logic increases. When the amount of time needed to form a branch prediction for a fetch address exceeds the clock cycle time of the microprocessor, performance of the microprocessor may be decreased. Because the branch prediction cannot be formed in a single clock cycle, "bubbles" are introduced into the instruction processing pipeline during clock cycles that instructions cannot be fetched due to a lack of a branch prediction corresponding to a previous fetch address. The bubble occupies various stages in the instruction processing pipeline during subsequent clock cycles, and no work occurs at the stage including the bubble because no instructions are included in the bubble. Performance of the microprocessor may thereby be decreased.