1. Field of the Invention
The present invention relates to a method of predicting the target address of a branch instruction and a method of skipping certain branch instructions, for pipelines in microprocessors and digital signal processors that need more than one cycle to predict branch target addresses.
2. Description of Related Art
A multi-stage pipeline configuration is now commonly used in microprocessors and digital signal processors for executing instructions. The pipeline comprises the stages of fetching, decoding and executing. To improve execution efficiency, all stages in the pipeline operate simultaneously. In other words, instead of starting to process the second instruction only after the first instruction has left the pipeline, the third stage processes the first instruction while the second stage processes the second instruction and the first stage processes the third instruction. Otherwise, most of the stages would be idle and considerable resources would be wasted.
Such a pipeline design runs well when instructions are executed sequentially. However, a serious problem arises when the design is applied to branch instructions. Once a branch instruction is taken and the program counter jumps to another address, the results of several front stages in the pipeline must be flushed so that the instruction at the branch target address can be processed. In other words, several consecutive cycles are wasted in this case.
To reduce this waste, a technique for predicting the target address of the branch instruction has been developed, referred to as “branch prediction”. The object of branch prediction is to predict the target address and the direction of the branch instruction at the earliest possible stage, wherein the direction is either “taken” or “not taken”. Therefore, when the prediction result indicates that the branch is taken, namely that execution jumps elsewhere, the pipeline can fetch the instruction at the branch target address earlier. If a subsequent stage of the pipeline determines that the predicted target address is correct, the results of the previous stages are retained. Therefore, no cycles are wasted and multiple stages can be processed at the same time.
FIG. 1 is a schematic diagram illustrating a design of a conventional branch prediction technique, wherein each instruction has a fixed length of 4 bytes. In the pipeline fetch stage, the value of the program counter (PC) 101 is provided to an instruction cache 102 for fetching a current instruction, and to a prediction unit 104 for creating indices, which are then used to search for previously recorded related information. If the related information exists, which indicates that the current instruction is a branch instruction, the prediction unit 104 predicts a direction 106 and a target address 105 of the branch instruction and provides them to a multiplexer 107. Meanwhile, an adder 103 computes the address of the next sequential instruction, that is, the sum of the value of the program counter 101 and the fixed instruction length, and provides the sum to the multiplexer 107. If the direction 106 predicted by the prediction unit 104 is “not taken”, the multiplexer 107 feeds the next sequential instruction address output from the adder 103 into the program counter 101, where it is used as the address of the next instruction to be fetched, and execution continues sequentially. Otherwise, the multiplexer 107 feeds the target address 105 predicted by the prediction unit 104 into the program counter 101, where it is used as the address of the next instruction to be fetched; namely, the instruction to be executed after the current branch instruction is fetched in advance. Provided that the prediction is correct, the pipeline can fetch an instruction from a correct address every cycle, regardless of whether the branch is taken or not.
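The selection performed by the adder 103, the prediction unit 104 and the multiplexer 107 can be illustrated with a minimal sketch. It is not the circuit of FIG. 1: the function name `next_fetch_address`, the dictionary `prediction_table` and the example addresses are hypothetical, and the sketch assumes for simplicity that the recorded information is looked up directly by the program counter value rather than by indices derived from it.

```python
INSTRUCTION_LENGTH = 4  # bytes; fixed instruction length stated for FIG. 1


def next_fetch_address(pc, prediction_table):
    """Return the address to load into the program counter for the next fetch.

    prediction_table is a hypothetical stand-in for the prediction unit:
    it maps a branch address to a (taken, target) pair.
    """
    sequential = pc + INSTRUCTION_LENGTH   # role of the adder 103
    entry = prediction_table.get(pc)       # role of the prediction unit 104
    if entry is None:
        # No recorded information: the instruction is not treated as a branch.
        return sequential
    taken, target = entry
    # Role of the multiplexer 107: choose the target or the sequential address.
    return target if taken else sequential


# Hypothetical example contents: one taken and one not-taken branch.
example_table = {0x100: (True, 0x200), 0x104: (False, 0x300)}
```

With these example contents, a fetch at 0x100 is followed by a fetch at 0x200, while fetches at 0x104 and 0x108 are followed by the next sequential addresses.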
Advantages of branch prediction are described in detail hereinafter with reference to FIGS. 2 and 3. FIG. 2 schematically shows how a pipeline having no branch prediction capability executes instructions. It is assumed that the pipeline includes 5 stages from F1 to W5. At the 4th cycle, after the branch instruction BC4 is fetched by the pipeline, the front end of the pipeline is idle because the address of the instruction to be executed next cannot be predicted. The next instruction is not fetched until the 6th cycle, where the instruction BC4 has passed the execution stage E3 and the address of the next instruction T5 is available. As shown in the diagram, after the instruction BC4 is fetched, the pipeline idles for two cycles in this case.
In contrast to FIG. 2, FIG. 3 schematically shows how a pipeline having branch prediction capability executes instructions. It is assumed that the pipeline also includes 5 stages from F1 to W5. At the 4th cycle, while the branch instruction BC4 is fetched by the pipeline, the address of the next instruction T5 is predicted, so the instruction T5 is fetched directly by the pipeline at the 5th cycle. At the 6th cycle, when the instruction BC4 has passed the execution stage E3 and it is verified that the address of the next instruction was correctly predicted, the results of the front end of the pipeline are retained and subsequent instructions can continue their execution. Therefore, no idle stage exists in the pipeline, and maximum efficiency is achieved in this case.
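The difference between the two cases can be illustrated with a small model of the fetch stage. It is only a sketch under stated assumptions: the function `fetch_trace` and the instruction names i1 to i3 are hypothetical, the branch is assumed to be resolved once it has passed its third stage (E3), and the list positions are successive fetch slots that are not meant to reproduce the exact cycle numbering of the figures.

```python
def fetch_trace(instructions, target, predicted, resolve_stage=3):
    """Return the sequence of fetch slots; None marks an idle slot.

    instructions: sequentially fetched instructions, ending with the branch.
    target: the instruction at the branch target address.
    predicted: True if the target is correctly predicted during the fetch.
    resolve_stage: stage number at which the branch is resolved (assumed 3).
    """
    trace = list(instructions)
    if not predicted:
        # Fetch is stage 1, so the fetch stage waits resolve_stage - 1 slots
        # until the branch has been resolved.
        trace += [None] * (resolve_stage - 1)
    trace.append(target)
    return trace


without_prediction = fetch_trace(["i1", "i2", "i3", "BC4"], "T5", predicted=False)
with_prediction = fetch_trace(["i1", "i2", "i3", "BC4"], "T5", predicted=True)
```

Under these assumptions the trace without prediction contains two idle slots between BC4 and T5, matching the two idle cycles described for FIG. 2, while the trace with prediction contains none.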
The example shown in FIG. 3 assumes that only one cycle is required for branch prediction. However, as branch prediction mechanisms become more complicated, a new issue arises. As shown in FIG. 4, it is assumed that the pipeline includes 7 stages from F1 to W7, and that it takes two cycles to predict the branch target address. At the 4th cycle, when the branch instruction BC4 enters the F1 stage, the prediction unit starts to predict the target address of the branch instruction BC4. Since two cycles are required for the prediction, the pipeline is unable to obtain the predicted target address of BC4 at the beginning of the 5th cycle, and therefore the sequential instruction i5 is fetched. The pipeline does not learn that the target address of BC4 is T9 until the 5th cycle has completed, and T9 is fetched only when the 6th cycle begins. Therefore, the cycle spent fetching the instruction i5 is wasted.
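The effect of the prediction latency on the fetch sequence can be illustrated as follows. This is a simplified sketch, not the mechanism of FIG. 4: the function `fetch_with_latency` and the instruction name i6 are hypothetical, and the model assumes that the prediction starts in the cycle in which the branch is fetched and that the branch is predicted taken.

```python
def fetch_with_latency(branch_cycle, latency, branch, sequential, target):
    """Map each cycle to the instruction fetched in that cycle.

    branch_cycle: cycle in which the branch enters the fetch stage.
    latency: number of cycles the prediction unit needs (1 or more).
    sequential: instructions that follow the branch in program order.
    target: the instruction at the predicted branch target address.
    """
    fetched = {branch_cycle: branch}
    following = iter(sequential)
    for offset in range(1, latency):
        # The target is not yet known, so a sequential instruction is
        # fetched; it is discarded once the branch is predicted taken.
        fetched[branch_cycle + offset] = next(following)
    fetched[branch_cycle + latency] = target
    return fetched


two_cycle = fetch_with_latency(4, 2, "BC4", ["i5", "i6"], "T9")
one_cycle = fetch_with_latency(4, 1, "BC4", ["i5", "i6"], "T9")
```

With a two-cycle prediction, the model fetches BC4 at cycle 4, i5 at cycle 5 and T9 at cycle 6, so one fetch is wasted; with a one-cycle prediction, T9 follows BC4 immediately. In this model the number of wasted fetches per taken branch is the latency minus one.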
U.S. Pat. No. 6,622,240 discloses a method for resolving the problem mentioned above. In this method, the calculation required to obtain the target address of the branch instruction is first copied, and the copy of the calculation is provided to the pipeline as a pre-branch instruction, such that the pre-branch instruction can be processed by the pipeline in advance. By the time the pipeline has completed fetching the branch instruction, the calculation of the pre-branch instruction is also completed. The target address of the branch instruction is then available, and the instruction at the target address can be fetched as soon as the next cycle starts. In this method, there is no idle stage in the pipeline after the branch instruction is fetched. However, at least one cycle is required for the pipeline to process the derived pre-branch instruction before the branch instruction is fetched, so some useful resources are still wasted.
As is evident from the description above, a new method is needed to further reduce idle cycles and waste in a pipeline that needs more than one cycle to predict a branch target address.