This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2000-053820, filed Feb. 29, 2000, the entire contents of which are incorporated herein by reference.
The present invention relates to a processor having a branch prediction capability of a conditional branch instruction and a branch prediction method of the same processor.
During program execution on a pipelined processor, a conditional branch instruction is executed when a given condition for this instruction becomes true. There may be the case where an instruction is previously inserted into a pipeline and is not executed yet. In this case, while a new instruction at the branch destination is inserted into the pipeline, the processor cannot perform effective operations, degrading throughput. This phenomenon is called xe2x80x9cbranch penaltiesxe2x80x9d.
The following describes how a branch penalty occurs using an example of a RISC processor having a 5-stage pipeline.
The 5-stage pipeline for an ordinary RISC processor comprises:
F stage (fetching instructions)
D stage (decoding instructions)
E stage (executing instructions)
M stage (accessing memory)
W stage (writing to a register file)
FIG. 6 shows an instruction sequence executed on this 5-stage pipeline. In this case, the pipeline presents states as shown in FIG. 7.
With respect to the instruction sequence in FIG. 6, an instruction xe2x80x9cBREQ R2, R4, LABEL1xe2x80x9d is executed in the E stage at a given cycle N It is judged whether there is an agreement (true) or not (false) between an R2 register value and an R4 register value. When the branch condition is true, a branch is successful, while the branch condition is false, a branch is unsuccessful.
In this cycle N, the above BREQ instruction is followed by an instruction xe2x80x9cLW R8, (R9)xe2x80x9d at the D stage and an instruction xe2x80x9cAND R11, R8, R7xe2x80x9d at the F stage.
When a BREQ result is false in the previous cycle N, the pipeline processing may continue in the cycle N+1. When a BREQ result is true as shown in FIG. 7, however, the processor cancels the instruction xe2x80x9cLW R8, (R9)xe2x80x9d at the D stage and the instruction xe2x80x9cAND R11, R8, R7xe2x80x9d at the F stage in the previous cycle N. The processor must begin with a fetch operation anew from the instruction xe2x80x9cAND R11, R8, R7xe2x80x9d at the branch destination. Namely, in this example, no instructions are executed during two cycles after the BREQ instruction, wasting cycles.
While the above example uses the 5-stage pipeline, improvement of processor frequencies requires more pipeline stages. As the number of pipeline stages increases, branch penalties also tend to increase.
Dynamic branch prediction may be used for decreasing branch penalties. This method is used for predicting branch condition values in the future according to true or false values for the conditions in the past. Generally, when a condition was often true in past conditional branches, that condition is predicted to be true. When a condition was often false in past conditional branches, that condition is predicted to be false. Basically, a saturation counter implements a system for reflecting a tendency for past conditional branches on the branch prediction. For example, the system using a 2-bit saturation counter implements a state transition as shown in FIG. 8.
In the state transition diagram of FIG. 8, an arrow marked with TRUE shows state transition when a conditional branch becomes true. An arrow marked with FALSE shows state transition when a conditional branch becomes false. Each time a conditional branch instruction is executed to provide a true or false value, this value controls state transition among four states: SN (Strongly Not Taken), WN (Weakly Not Taken), WT (Weakly Taken), and ST (Strongly Taken). For predicting whether the conditional branch instruction becomes true or false, the conditional branch is predicted to be true when the saturation counter state is WT or ST. It is predicted to be false when the saturation counter state is SN or WN.
In this case, branch prediction is performed according to a true or false result of the conditional branch. This is called a Taken mode.
Ideally, there would be provided saturation counter hardware for each of all conditional branch instructions. However, this is not practical from the viewpoint of costs. Generally, there is provided a specified number of saturation counters in table formats. Each conditional branch instruction is associated with a table entry by using, say, a hash function which uses a conditional branch instruction address as an input. This method can limit the number of hardware resources. However, there is the problem that a conflict occurs between two or more conditional branch instructions that use the same table entry, namely, the same saturation counter. In this case, a conflict means that the same saturation counter is assigned with a conditional branch instruction with a great possibility of being true and another with a great possibility of being false.
To solve this conflict, there is provided an Agree mode which uses a compiler for conditional branch prediction during state transition of saturation counters.
In the Agree mode, the compiler predicts a true or false condition value for each conditional branch instruction during program compilation. According to a prediction result, the compiler sets a tag value to be added to the corresponding instruction. Alternatively, the compiler selectively uses two types of instructions: one predicting the conditional branch to be true and the other predicting the conditional branch to be false. By doing so, the compiler notifies the processor of a prediction result concerning the true or false condition value for each conditional branch instruction. The Agree mode differs from the above-mentioned mode which provides state transition based on a true or false result of the conditional branch. The saturation counter in the Agree mode provides state transition based on whether the conditional branch (true or false) agrees with the compiler prediction (true or false). For example, a 2-bit saturation counter provides state transition as shown in FIG. 9.
In the state transition diagram of FIG. 9, an arrow marked with AGREE indicates state transition when the result agrees with the compiler prediction. An arrow marked with DISAGREE indicates state transition when the result does not agree with the compiler prediction. The conditional branch instruction is executed to determine the true or false value. Each time the compiler prediction is agreed or disagreed, the agreed or disagreed result controls state transition among four states: SD (Strongly Disagree), WD (weakly Disagree), WA (Weakly Agree), and SA (Strongly Agree). The conditional branch instruction is predicted to be true or false as follows. When the saturation counter state is WA or SA, the conditional branch agrees with the compiler prediction. Namely, when the compiler predicts a result to be true, the true result is predicted. When the compiler predicts a result to be false, the false result is predicted. When the saturation counter state is SD or WD, the conditional branch does not agree with the compiler prediction. Namely, when the compiler predicts a result to be true, the false result is predicted. When the compiler predicts a result to be false, the true result is predicted.
The use of this Agree mode can solve the problem of conflicts and provide the same effect as practically increasing the number of entries. As mentioned above, a conflict occurs when the same saturation counter is assigned with a conditional branch instruction with a great possibility of being true and another with a great possibility of being false. In this case, the saturation counter can be the same entry in a table constituting the saturation counter.
Apart from the problem of branch conflicts, however, the Agree mode still needs to improve branch prediction accuracy.
It is an object of the present invention to provide a processor which can eliminate problems such as saturation counter contamination, improve prediction accuracy for conditional branch instructions, and thus improve processor throughput.
According to one aspect of the present invention, there is provided a processor comprising: an instruction cache configured to fetch a first instruction from a program stored in a memory; a decoding section, connected to the instruction cache, configured to decode the first instruction fetched in the cache; an executing section, connected to the decoding section, configured to execute the decoded first instruction; and an address computing section, connected between the instruction cache and the execute section, configured to compute an address of a second instruction to be fetched into the instruction cache when the decoded first instruction is a branch instruction, the address computing section including: a section configured to detect a result of a former branch prediction of the first instruction; a section configured to compute the address according to the result of the former branch prediction unless the result represents an impossibility of prediction; a section configured to confirm a successful prediction by comparing the result of the former branch prediction with a branch condition result ascertained from an execution of the first instruction; and a register in which the computed address is set when the successful prediction is confirmed.
According to another aspect of the present invention, there is provided a branch prediction method of predicting an address of a second instruction to be fetched into an instruction cache when a first instruction previously fetched into the cache is a branch instruction, the method comprising: detecting a result of a former branch prediction of the first instruction; computing the address according to the result of the former branch prediction unless the result represents an impossibility of prediction; confirming a successful prediction by comparing the result of the former branch prediction with a branch condition result ascertained from an execution of the first instruction; and outputting the computed address to the cache when the successful prediction is confirmed.