This invention is generally related to a method and system for high performance implementation of a microprocessor, and more particularly to the processing of branch instructions in a pipeline process of a microprocessor.
In the field of computer technology, many excellent technologies have been developed during the past few decades.
In particular, microprocessors have progressed by a quantum leap compared with other fields of technology, achieving phenomenally high performance and high speed. One way to further improve the performance of a microprocessor is to reduce the number of cycles required to execute an instruction by overlapping the execution of multiple instructions.
Referring to FIG. 1, a pipeline process simultaneously executes overlapped multiple instructions. Consequently, the pipeline process is capable of starting an execution of the next instruction before the execution of one instruction finishes.
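As a rough illustration of the overlap described above, the following sketch counts the cycles needed for a sequence of instructions with and without pipelining. The function names, the one-cycle-per-stage timing, and the five-stage example are assumptions made for illustration only; they are not taken from FIG. 1.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline (no bubbles): one new instruction starts every cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds one cycle.
    return num_stages + (num_instructions - 1)


# Hypothetical example: 4 instructions on an assumed 5-stage pipeline.
sequential = cycles_unpipelined(4, 5)
overlapped = cycles_pipelined(4, 5)
```

Under these assumptions the overlapped case completes in fewer cycles because the next instruction starts before the previous one finishes.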
As examples of the above process, there are single-scalar and superscalar techniques, which are capable of carrying out overlapped multiple instructions using plural pipelines.
The above-noted superscalar technique is a high performance implementation technique for microprocessors that simultaneously executes overlapped portions of plural instructions within one clock of one machine cycle.
Technologies used to improve on the superscalar technique include register renaming, out-of-order execution, branch instructions, and speculative execution.
However, a disadvantage of these techniques is that a spoiled pipeline, called a pipeline bubble, can occur. Referring to FIG. 2, instruction pipelines of both the single-scalar and superscalar types typically contain branch instructions. The branch instructions usually delay the instruction pipeline for the following reasons.
The first reason is that the processor must decide the branch condition. However, the microprocessor cannot decide the branch condition until an instruction that determines the condition, such as an instruction that changes a zero flag, finishes. In particular, the decision regarding the condition can be extremely delayed in the superscalar technique because a superscalar processor issues many instructions simultaneously.
The second reason is that the processor must also calculate the effective destination of the branch and fetch the instruction. When a cache access requires an entire cycle, and the fetched branch instruction specifies the target address, performance of this fetch without delaying the pipeline for at least one pipe stage is impossible. Furthermore, conditional branches may cause further delays because they require the calculation of a condition, as well as the target address. Therefore, branch instructions are delayed and do not take effect until after one or more instructions immediately following the branch instructions have been executed.
The calculation of the branch target address finishes within about one cycle. Therefore, the first reason described above is a greater problem than the second reason. Speculative execution is particularly effective with respect to the decision of the branch condition, and therefore has great value in the superscalar technique.
A technology used for solving the above-described pipeline bubble is referred to as a branch prediction technique. When branch instructions exist, pipeline bubbles inevitably occur in the pipeline process for the above reasons.
Some methods of branch prediction based upon conditional branch instructions are disclosed in Japanese Laid-Open Patent Applications Nos. 63-147,230, 01-239,638 and 04-112,327. These methods predict whether to perform the conditional branch according to the last occurrence of the conditional branch instruction and, based on that prediction, either execute or do not execute the conditional branch instruction before it is judged whether the condition for the conditional branch instruction is satisfied. In detail, the microprocessor system has a branch history table that pairs the address of the conditional branch instruction with the branch target address from the last occurrence of that conditional branch instruction. When the conditional branch instruction is executed again, the microprocessor system carries out the branch using the target address stored in the branch history table, before the target address is calculated.
For both conditional and unconditional branch instructions, when the target address of the branch instruction is specified as a relative address, it is necessary to add that relative address to the value of a program counter in order to obtain the actual target address of the branch instruction.
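The branch history table described above can be sketched roughly as follows. This is a minimal illustration, not the structure used in the cited applications: the class name, the dictionary-based storage, and the method names are all hypothetical.

```python
class BranchHistoryTable:
    """Hypothetical table keyed by the address of a conditional branch instruction."""

    def __init__(self):
        # branch instruction address -> (taken at last occurrence, stored target address)
        self._entries = {}

    def record(self, branch_address, taken, target_address):
        """Store the outcome and target of the most recent occurrence of the branch."""
        self._entries[branch_address] = (taken, target_address)

    def predict(self, branch_address):
        """Return the stored target if the branch was taken last time, else None."""
        entry = self._entries.get(branch_address)
        if entry is None:
            return None  # no history yet, so no prediction is made
        taken, target_address = entry
        return target_address if taken else None
```

When the same branch address is seen again, `predict` supplies the stored target without recomputing it, which mirrors the re-execution case in the paragraph above.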
On the other hand, if the microprocessor system employs an absolute address, the above calculation is not necessary. However, the microprocessor system still needs the above branch history table in order to store the absolute address for destination of the branch instruction.
The above-described branch prediction is capable of high-speed operation if the branch prediction matches the actual behavior of the branch instruction, namely if the predicted branch is in fact taken.
However, the present inventor identified that if the predicted branch is not taken, the system needs to invalidate the instructions already executed once it identifies that the predicted branch instruction is not to be taken. This invalidating operation requires machine cycles and therefore reduces the efficiency of the microprocessor.
Furthermore, the present inventor also identified that the conventional branch prediction techniques have a branch history table which stores, for all conditional branch instructions, predicted values indicating the last occurrence of each conditional branch instruction together with the branch target addresses used when the conditional branch instruction is executed. Therefore, the hardware of the system becomes large in scale and expensive. On the other hand, when a microprocessor system does not execute branch prediction, the system does not become expensive, but its processing speed is low in comparison with a processor in which branch prediction is executed. This is because the microprocessor system cannot execute the next instructions until it becomes clear whether the condition for performing the conditional branch instruction has been satisfied.
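The cost of the invalidating operation can be illustrated with a simple cycle count. The sketch below assumes one cycle per correctly predicted branch and a fixed number of extra cycles whenever the prediction is wrong; the function name and the default penalty of three cycles are hypothetical values chosen only for illustration.

```python
def total_branch_cycles(predictions, outcomes, flush_penalty=3):
    """Count cycles spent on a sequence of branches under an assumed fixed penalty."""
    total = 0
    for predicted_taken, actually_taken in zip(predictions, outcomes):
        total += 1  # assumed base cost of the branch itself
        if predicted_taken != actually_taken:
            total += flush_penalty  # assumed cycles lost invalidating executed instructions
    return total


# Hypothetical trace: the second prediction is wrong.
cycles = total_branch_cycles([True, True, False], [True, False, False])
```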
The above-mentioned branch instructions are disclosed, for example, in “ADVANCED COMPUTER ARCHITECTURES, a design space approach, p272-p360, Deszo Sima et al, Addison Wesley”. The contents of this reference are incorporated herein by reference.
To solve the above and other problems, according to one aspect of the present invention, a method for processing branch instructions in a pipeline process of a microprocessor system has the steps of determining whether a conditional branch instruction code corresponds to branch prediction, and executing branch prediction if the conditional branch instruction code corresponds to branch prediction.
According to another aspect of the present invention, the method has a further step of suspending execution of successive instructions until a branch evaluation of the conditional branch instruction finishes, if said conditional branch instruction code does not correspond to branch prediction.
According to another aspect of the present invention, the method further comprises the steps of: assuming a branch address data in said conditional branch instruction to be an actual branch target address in the case that the conditional branch instruction code corresponds to branch prediction; and assuming an address which is the sum of the branch address data and a count value of a program counter to be an actual branch target address in the case that the conditional branch instruction code does not correspond to branch prediction.
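The address selection in this aspect can be sketched as follows. The function name, the boolean flag, and the 32-bit address width with wraparound are assumptions introduced for illustration; the source specifies only that the branch address data is used directly in one case and added to the program counter value in the other.

```python
def actual_branch_target(branch_address_data, program_counter,
                         corresponds_to_prediction, address_bits=32):
    """Select the actual branch target address as described in the text.

    address_bits and the wraparound mask are assumptions, not part of the source.
    """
    mask = (1 << address_bits) - 1
    if corresponds_to_prediction:
        # The branch address data itself is assumed to be the target address.
        return branch_address_data & mask
    # Otherwise the target is the sum of the branch address data and the program counter.
    return (program_counter + branch_address_data) & mask
```

One consequence of this arrangement is that the predicted case needs no addition before the target address is available.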
According to another aspect of the present invention, a method for processing branch instructions in a pipeline process of a microprocessor system comprises the steps of determining whether a conditional branch instruction code corresponds to branch prediction according to a prescribed bit in the conditional branch instruction code, which indicates whether branch prediction is effective, executing branch prediction if said prescribed bit corresponds to branch prediction, and suspending execution of successive instructions until a branch evaluation of the conditional branch instruction finishes if the conditional branch instruction code does not correspond to branch prediction.
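The decision made in this aspect can be sketched as a test of a single bit in the instruction code. The bit position (bit 31 of an assumed 32-bit instruction code), the constant name, and the returned action labels are all hypothetical; the source states only that a prescribed bit indicates whether branch prediction is effective.

```python
# Assumed position of the prescribed bit; the source does not specify it.
PREDICTION_EFFECTIVE_BIT = 1 << 31


def corresponds_to_branch_prediction(instruction_code):
    """Return True if the prescribed bit marks branch prediction as effective."""
    return bool(instruction_code & PREDICTION_EFFECTIVE_BIT)


def handle_conditional_branch(instruction_code):
    """Choose between predicting and suspending, following the steps in the text."""
    if corresponds_to_branch_prediction(instruction_code):
        return "predict"  # execute branch prediction
    return "suspend"  # hold successive instructions until branch evaluation finishes
```

For example, under these assumptions an instruction code of `0x80000010` would be predicted, while `0x00000010` would cause successive instructions to be suspended.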
According to another aspect of the present invention, a microprocessor system which processes branch instructions in a pipeline process, includes a branch prediction unit configured to detect a prescribed bit corresponding to effective branch prediction code in a conditional branch instruction code and determine whether the conditional branch instruction code corresponds to a branch prediction code according to the prescribed bit corresponding to the effective branch prediction code, and a branch prediction controller coupled to the branch prediction unit and configured to execute branch prediction if the conditional branch instruction code corresponds to branch prediction.
According to another aspect of the present invention, the branch prediction controller is configured to suspend successive instructions until a branch evaluation of the conditional branch instruction finishes, if the conditional branch instruction code does not correspond to branch prediction.
According to another aspect of the present invention, the branch prediction controller is configured to assume that branch address data in the conditional branch instruction code is an actual branch target address in the case that the conditional branch instruction code corresponds to branch prediction, and to assume that an address which is a sum of the branch address data and a count value of a program counter is an actual branch target address in the case that the conditional branch instruction code does not correspond to branch prediction.
According to another aspect of the present invention, the conditional branch instruction code includes a branch prediction effective bit which corresponds to whether a branch prediction unit should predict the branch target address.