Processors are basically designed to execute instructions sequentially. A widely used form of processor is the pipelined architecture. In a basic pipelined processor, the pipeline includes stages such as fetch, decode, and execute. The instructions moving through the pipeline are processed in an overlapped relationship with each other, so that while one instruction is being executed, the following instructions are being decoded and fetched.
A pipelined processor, however, loses performance at branch instructions. This is because, when a branch instruction is fetched, the address of the instruction to be fetched in the next cycle is not yet known. The fetch stage is therefore stalled until the branch target address has been determined. Since a branch address is generally determined in the execute stage of preceding instructions that generate condition codes, the fetch of the next instruction is stalled while those preceding instructions are decoded and executed. Only once the branch instruction has been completely executed, that is, once it has been determined whether the condition of the conditional branch instruction is true, is the branch direction known, so that the instruction at the actual target address can be safely fetched. As a result, there is a delay of one or more cycles until the next program counter value, which is needed to fetch the next instruction, is determined.
In order to make use of these otherwise wasted cycles, “branch prediction” is adopted. Branch prediction predicts whether a conditional branch instruction will branch, and keeps the pipeline running on that prediction. Based on the predicted outcome of the branch condition, the processor speculatively sets the next address and proceeds from it. The following instructions are then executed consecutively as the address advances. If the branch prediction is a “hit”, the speculatively executed instructions are the correct ones and the pipeline does not stall. If the branch prediction is a “miss”, execution must be redirected to the correct address. Additional delay is then incurred to flush the incorrectly fetched sequence of instructions and to execute the correct sequence. When branch prediction fails, the cycles spent on the predicted path are wasted; this loss is referred to as the “branch misprediction penalty”.
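By way of illustration only, the cost of stalling and the cost of misprediction can be compared with a simplified cycle count. The sketch below assumes an ideal pipeline that otherwise completes one instruction per cycle; the function names, the stall length, and the penalty length are hypothetical and are not taken from any particular processor.

```python
def cycles_without_prediction(n_instructions, n_branches, stall_cycles):
    """Every branch stalls the fetch stage for stall_cycles cycles."""
    return n_instructions + n_branches * stall_cycles


def cycles_with_prediction(n_instructions, n_mispredicted, penalty_cycles):
    """Only mispredicted branches cost extra cycles (flush and re-fetch)."""
    return n_instructions + n_mispredicted * penalty_cycles


# Hypothetical workload: 1000 instructions, 200 of them branches.
stalled = cycles_without_prediction(1000, 200, stall_cycles=2)
# Same workload with 20 of the 200 branches mispredicted.
predicted = cycles_with_prediction(1000, 20, penalty_cycles=3)
print(stalled, predicted)
```

Under these assumed figures the predicted run is shorter even though each misprediction costs more than a single stall, because most branches incur no extra cycles at all.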
Static branch prediction and dynamic branch prediction have been developed as techniques for reducing the branch misprediction penalty. In static branch prediction, the “TAKEN” (branch to the branch target address) and “NOT-TAKEN” (proceed to the instruction following the branch instruction) qualifiers of a branch instruction are checked before execution, and the program code is rearranged accordingly. In dynamic branch prediction, the “TAKEN” and “NOT-TAKEN” qualifiers are determined from the branch history gathered during program execution. Generally, the hit ratio of dynamic branch prediction is higher than that of static branch prediction.
As specific methods for realizing dynamic branch prediction, the “per-address history” and “global history” schemes have been developed. The per-address history scheme has an excellent hit ratio for loop instructions (e.g., WHILE, FOR, DO, and LOOP) because it keeps a counter for each branch instruction address. The global history scheme has an excellent hit ratio for adjacent branch instructions (e.g., IF-THEN). In terms of hardware cost, the global history scheme is preferred to the per-address history scheme. A branch predictor based upon the global history scheme is disclosed in Scott McFarling, “Combining Branch Predictors”, Technical Note TN-36, Western Research Laboratory, June 1993.
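By way of illustration only, the two schemes can be sketched as follows. The sketch assumes 2-bit saturating counters and 16-entry tables, and the class and function names are hypothetical; none of these details is taken from the cited technical note.

```python
class PerAddressPredictor:
    """One 2-bit saturating counter per (low bits of the) branch address."""

    def __init__(self, index_bits=4):  # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly NOT-TAKEN

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2  # True means TAKEN

    def update(self, pc, taken):
        i = pc & self.mask
        step = 1 if taken else -1
        self.counters[i] = max(0, min(3, self.counters[i] + step))


class GlobalHistoryPredictor:
    """Counters selected by a shift register of recent branch outcomes."""

    def __init__(self, history_bits=4):  # history length is an assumption
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self, pc):
        # The branch address is ignored; only the shared history is used.
        return self.counters[self.history] >= 2

    def update(self, pc, taken):
        step = 1 if taken else -1
        i = self.history
        self.counters[i] = max(0, min(3, self.counters[i] + step))
        self.history = ((self.history << 1) | int(taken)) & self.mask


def hit_ratio(predictor, trace):
    """Fraction of (pc, taken) pairs in trace that were predicted correctly."""
    hits = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / len(trace)


# A loop branch at a hypothetical address: taken 9 times, then falls through.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
print(hit_ratio(PerAddressPredictor(), loop_trace))
```

In this sketch the per-address table mispredicts the loop branch only on loop exit once its counter has warmed up, while the global table needs no per-branch storage, since a single history register selects the counter for every branch.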
A conventional branch predictor is designed by considering only the branch behavior of a single process. That is, the process ID is not taken into account. Therefore, the hit ratio of dynamic branch prediction becomes low in a multi-processing environment where a plurality of processes are executed concurrently. When the hit ratio becomes low, the total branch misprediction penalty increases and program execution time is lengthened.
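By way of illustration only, the interference described above can be reproduced with a small simulation. The sketch assumes a table of 2-bit saturating counters indexed by branch address and shared by two processes, a branch at the same (hypothetical) address in both processes, and a time slice of four branches; all names and figures are illustrative assumptions.

```python
def run(trace, index_bits=4):
    """Hit ratio of one shared counter table over a (pc, taken) trace."""
    mask = (1 << index_bits) - 1
    counters = [1] * (1 << index_bits)  # start weakly NOT-TAKEN
    hits = 0
    for pc, taken in trace:
        i = pc & mask
        if (counters[i] >= 2) == taken:
            hits += 1
        step = 1 if taken else -1
        counters[i] = max(0, min(3, counters[i] + step))
    return hits / len(trace)


def interleave(trace_a, trace_b, slice_len):
    """Alternate between two processes every slice_len branches."""
    merged = []
    for start in range(0, max(len(trace_a), len(trace_b)), slice_len):
        merged.extend(trace_a[start:start + slice_len])
        merged.extend(trace_b[start:start + slice_len])
    return merged


# Two hypothetical processes with a branch at the same address:
# process A always takes it, process B never does.
proc_a = [(0x40, True)] * 100
proc_b = [(0x40, False)] * 100

alone_a = run(proc_a)
alone_b = run(proc_b)
shared = run(interleave(proc_a, proc_b, slice_len=4))
print(alone_a, alone_b, shared)
```

Each process is predicted almost perfectly when it runs alone, but when the two share the table without regard to process ID, each time slice retrains the counter that the other process relies on, and the hit ratio falls to roughly one half under these assumptions.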