1. Technical Field
The present invention relates in general to a method and system for data processing and in particular to a method and system for executing instructions within a processor. Still more particularly, the present invention relates to a method and system for executing instructions within a processor such that the branch misprediction penalty incurred when a branch is incorrectly predicted as taken is minimized.
2. Description of the Related Art
A conventional high-performance processor includes an instruction cache for storing instructions, an instruction buffer for temporarily storing instructions fetched from the instruction cache for execution, a number of execution units for executing sequential instructions, a branch processing unit for executing branch instructions, a dispatch unit for dispatching sequential instructions from the instruction buffer to particular ones of the execution units, and a completion buffer for temporarily storing instructions that have finished execution but have not been completed.
As is well known in the art, sequential instructions fetched from the instruction cache are stored within the instruction buffer pending dispatch to the execution units. In contrast, branch instructions fetched from the instruction cache are typically forwarded directly to the branch processing unit for execution. In some cases, the condition register value upon which a conditional branch depends can be ascertained prior to executing the branch instruction; that is, the branch can be resolved prior to execution. If a branch is resolved as taken prior to execution, instructions at the target address of the branch instruction are fetched and executed by the processor, and any sequential instructions following the branch that have been prefetched are discarded. However, the outcome of a branch instruction often cannot be determined prior to executing the branch instruction because the branch depends on a condition register value that has not yet been generated. When a branch instruction remains unresolved at execution, the branch processing unit utilizes a prediction mechanism, such as a branch history table, to predict which execution path should be taken. In conventional processors, the dispatch of sequential instructions following a branch predicted as taken is halted, and instructions within the speculative target instruction stream are fetched during the next processor cycle. If the branch that was predicted as taken is later resolved as mispredicted, the processor incurs a mispredict penalty due to the cycles required to restore the sequential execution stream following the branch instruction.
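The passage names a branch history table as one prediction mechanism without specifying its organization. Purely as an illustration, the following Python sketch assumes one common design, a table of 2-bit saturating counters indexed by low-order branch-address bits. The class name, table size, and indexing scheme are hypothetical and are not taken from the present description.

```python
class BranchHistoryTable:
    """Hypothetical branch history table of 2-bit saturating counters."""

    def __init__(self, entries: int = 1024) -> None:
        # Assumed power-of-two size so the index is simply the low-order address bits.
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.mask = entries - 1
        # Counter values 0-1 predict not taken, 2-3 predict taken; start weakly not taken.
        self.counters = [1] * entries

    def _index(self, branch_addr: int) -> int:
        # Assumes 4-byte aligned instructions, so the two lowest address bits are dropped.
        return (branch_addr >> 2) & self.mask

    def predict_taken(self, branch_addr: int) -> bool:
        """Predict the outcome of an unresolved branch at this address."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        """Train the counter with the resolved outcome, saturating at 0 and 3."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable(entries=16)
bht.update(0x100, taken=True)
bht.update(0x100, taken=True)
bht.update(0x100, taken=False)
print(bht.predict_taken(0x100))  # one not-taken outcome does not flip a strongly taken counter
```

Because of this hysteresis, a branch that has usually been taken continues to be predicted as taken. When such a branch is then resolved as not taken, the processor incurs exactly the kind of mispredict penalty described above.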
Referring now to FIGS. 4a-4b, there is depicted an example illustrating the mispredict penalty incurred when a branch instruction is incorrectly predicted as taken. In FIG. 4a, an instruction sequence is illustrated which includes a conditional branch instruction (BC) that branches to a target instruction (T0) based upon a condition register value generated by a compare instruction (CMP). The instruction sequence depicted in FIG. 4a also includes 4 sequential instructions S0-S3. FIG. 4b is a timing diagram depicting the execution of the instruction sequence within a conventional processor having a fetch bandwidth of 4 instructions per cycle and a dispatch bandwidth of 2 instructions per cycle.
In cycle 1 of FIG. 4b, instructions S0, CMP, S1, and BC are fetched from the instruction cache and stored within the instruction buffer. During cycle 2, the 4 subsequent sequential instructions (S2, S3, S4, and S5) are fetched while instructions S0 and CMP are dispatched to the execution units for execution. In addition, the conditional branch BC is predicted as taken in cycle 2. Consequently, target instructions T0 and T1 are fetched in cycle 3. During cycle 3, the branch instruction is also resolved as mispredicted, since CMP finishes execution during that cycle. Because BC was predicted as taken in cycle 2, only the sequential instructions preceding BC are dispatched in cycle 3. Since the correct current fetch address is not restored until cycle 4, the correct sequential instructions cannot be executed by the execution units until cycle 6. Thus, as illustrated in FIG. 4b, the processor incurs a mispredict penalty between the execution of sequential instruction S1 and the execution of sequential instructions S2 and S3. The mispredict penalty, which is defined as the number of cycles that the execution units are idle or executing instructions within the mispredicted path, delays the execution of S2 by two cycles and the execution of S3 by one cycle, resulting in an average mispredict penalty of (2 + 1) / 2 = 1.5 cycles. A half-cycle penalty is incurred during cycle 4 because only one of the two instructions that could be executed during that cycle is executed.
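The cycle accounting of FIG. 4b can be reproduced with a small model. The following Python sketch is hypothetical and rests on stated assumptions: in-order dispatch of up to 2 instructions per cycle, execution in the cycle immediately after dispatch, BC handled by the branch processing unit without consuming a dispatch slot, and refetched S2 and S3 first becoming dispatchable in cycle 5 after the current fetch address is restored in cycle 4. These earliest-dispatch cycles are inferred from the narrative, not read from the figure, and the function and constant names are illustrative only.

```python
from fractions import Fraction

DISPATCH_WIDTH = 2  # dispatch bandwidth of the processor in the FIG. 4b example


def execute_cycles(ready: list[tuple[str, int]], width: int = DISPATCH_WIDTH) -> dict[str, int]:
    """Return the execution cycle of each instruction under in-order dispatch.

    `ready` lists (name, earliest dispatch cycle) in program order. Up to
    `width` instructions are dispatched per cycle, and each is assumed to
    execute in the cycle after it is dispatched.
    """
    exec_cycle: dict[str, int] = {}
    cycle = ready[0][1] if ready else 0
    i = 0
    while i < len(ready):
        issued = 0
        # In-order dispatch stops at the first instruction that is not yet available.
        while i < len(ready) and issued < width and ready[i][1] <= cycle:
            exec_cycle[ready[i][0]] = cycle + 1
            i += 1
            issued += 1
        cycle += 1
    return exec_cycle


# Assumed earliest-dispatch cycles: an instruction fetched in cycle N is dispatchable in N + 1.
# BC is omitted because it is forwarded to the branch processing unit.
NO_MISPREDICT = [("S0", 2), ("CMP", 2), ("S1", 2), ("S2", 3), ("S3", 3), ("S4", 3)]
# With the taken misprediction, S2 and S3 are refetched in cycle 4 and dispatchable in cycle 5.
MISPREDICTED = [("S0", 2), ("CMP", 2), ("S1", 2), ("S2", 5), ("S3", 5)]

ideal = execute_cycles(NO_MISPREDICT)
actual = execute_cycles(MISPREDICTED)

# Average delay over the sequential instructions pushed back by the misprediction.
delays = [actual[name] - ideal[name] for name in ("S2", "S3")]
average_penalty = Fraction(sum(delays), len(delays))
print(delays, float(average_penalty))
```

Under these assumptions the model gives delays of 2 and 1 cycles for S2 and S3 and an average of 1.5 cycles, matching the figures stated above. In the mispredicted schedule only S1 executes in cycle 4, which corresponds to the half-cycle penalty noted in the text.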
Because of the performance penalty associated with the misprediction of an unresolved branch as taken, it would be desirable to provide an improved method and system for executing instructions that minimizes the branch misprediction penalty incurred when a branch is incorrectly predicted as taken.