1. Field of the Invention
The present invention is directed to a processor which can prefetch a plurality of instructions, and especially directed to a branch prediction method in a superscalar processor.
2. Background of the Invention
A processor with a pipelined structure makes a prefetch of instructions for effective use of the pipeline. More specifically, an instruction to be executed after an instruction in execution, referred to as the next instruction, is predicted and fetched before its address is finally determined.
Sometimes, however, misprediction of the next instruction may occur especially immediately after a branch instruction. In that case, the next instruction which has already been set in the pipeline is made invalid, which degrades efficiency in the pipeline processing. Such irregularity in the pipeline due to the branches is becoming a great disadvantage in improving processor performance.
In a general scalar pipeline processor, a branch prediction method is introduced to improve prediction accuracy so that irregularity in the pipeline can be suppressed. One example is a method using a prediction table achieved by hardware to store information necessary for prediction for the next instruction.
The method includes two types: a branch history table (BHT) method for storing information only as to a branch direction (taken or not taken) such as a branch execution history; and a branch target buffer (BTB) method for storing information as to a branch target for a taken branch, such as a branch target address or a branch target instruction, as well as the above information.
FIG. 26 is a schematic diagram for explaining a branch prediction by the BTB method. An instruction cache 35 is accessed by a program counter value 31 (hereinafter referred to as a PC value, which is an address of an instruction to be cached) at an instruction fetch stage in the pipeline; at the same time, a prediction table 32 is retrieved by the PC value. If no entry corresponding to the PC value is present in the prediction table 32 and the fetched instruction is not a branch instruction, the PC value is normally incremented, for example, by 1 and the instruction is executed by an execution unit 37. Whether the fetched instruction is a branch instruction or not is determined by a decoder 36.
If no entry corresponding to the PC value is present in the prediction table 32 and the fetched instruction is a branch instruction, the execution history such as the branch target address and the branch direction is newly registered in an entry in the prediction table 32 specified by the address of the branch instruction.
If an entry corresponding to the PC value corresponding to the fetched instruction is present in the prediction table 32, the fetched instruction is determined to be a branch instruction. Thus, the branch direction is predicted on the basis of a branch direction information 322 in this entry. A prediction judging unit 33 controls operation of a selector 34, and outputs a target address 323 when the branch is predicted taken so that the target address is set in the program counter 31. When the branch is predicted not-taken, the PC value is normally incremented, for example, by 1, and the instruction is executed by the execution unit 37.
After the branch instruction is executed by the execution unit 37, a prediction table registration control unit 38 checks the executed result with the branch prediction of the prediction judging unit 33. When the prediction is incorrect, the PC value of the program counter 31 where the branch target address has already been set is made invalid to fetch the next instruction within the correct instruction stream. The prediction table registration control unit 38 also updates information in the prediction table 32 on the basis of the checked result.
Since the superscalar processor improves its performance by simultaneously issuing and executing a plurality of instructions, prefetch of instructions in the superscalar processor, compared to the scalar processor, would lead to more irregularity in the pipeline due to the branches.
The reasons are as follows: First, since complexity of logic deepens the pipeline, branch penalty caused by misprediction of the branches increases in the superscalar processor; Secondly, since a plurality of instructions are executed at the same time, the ratio of the number of cycles which stop in the pipeline due to parting of the instruction stream to the number of all cycles to be executed becomes larger. Therefore, the superscalar processor requires higher accuracy in branch prediction in comparison with the scalar processor.