The present invention relates to instruction branching in a processor in which instruction fetch and instruction execution are carried out in parallel by pipeline processing.
The branch operation in conventional pipeline processors to be described below is detailed in "Computer Architecture: A Quantitative Approach", David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., 1990, in particular, in chapter 6.
In a general pipeline processor, execution of an instruction and the fetch of the subsequent instruction are performed in parallel in order to speed up overall operation. However, when a branch instruction is executed and the branch is taken, the instruction that was fetched simultaneously, namely the one subsequent to the branch instruction, is not executed, so that fetch is wasted. Further, since the branch target instruction is determined only after the branch instruction has been executed, based on the result of that execution, it takes additional time until the branch target instruction is fetched. In a conventional pipeline processor, the inability to overlap a useful subsequent instruction fetch with the execution of a branch instruction has been a major factor limiting processor performance.
Therefore, conventionally, the one instruction subsequent to the branch instruction is executed so that the subsequent instruction fetch is put to effective use. This is called delayed branching, since the branch appears to take effect only after the instruction subsequent to the branch instruction has been executed. By improving the logic design of the processor, it is possible to calculate the branch target address during the subsequent instruction fetch. It thereby becomes possible to fetch the branch target instruction immediately after executing the subsequent instruction. Such an arrangement has allowed an efficient branch operation even in a pipeline processor. However, the following problems remain.
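The delayed branching described above can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions, not a model of any particular processor: the three-operation instruction set (`add`, `branch`, `halt`), the accumulator, and the function name `run_delayed` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Instruction = Tuple  # ('add', n) | ('branch', target_index) | ('halt',)


def run_delayed(program: List[Instruction], max_steps: int = 100):
    """Run a toy program in which every branch has one delay slot.

    Returns (accumulator, trace), where trace lists the indices of the
    instructions that were executed, in order.
    """
    acc = 0
    pc = 0
    pending_target = None  # branch target waiting for the delay slot to finish
    trace = []
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction is the delay slot: the branch takes effect after it.
            next_pc = pending_target
            pending_target = None
        if op[0] == "halt":
            break
        if op[0] == "add":
            acc += op[1]
        elif op[0] == "branch":
            pending_target = op[1]
        pc = next_pc
    return acc, trace


program = [
    ("branch", 3),  # 0: branch to index 3
    ("add", 1),     # 1: delay slot, executed even though the branch is taken
    ("add", 100),   # 2: skipped
    ("add", 10),    # 3: branch target
    ("halt",),      # 4
]
acc, trace = run_delayed(program)
```

In this sketch the instruction at index 1 is executed although the branch at index 0 is taken, which is the sense in which the already fetched subsequent instruction is put to use.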
In an operating system that performs process management, there are branch instructions that calculate the branch target address dynamically using values in registers or the like. When such a complicated branch target address calculation is performed, there arises a problem that the calculation may not be completed within the limited time during which a single subsequent instruction is fetched in parallel.
It is also a prerequisite of delayed branching that the subsequent instruction can be executed whether the branch is taken or not. Finding such an instruction is not always possible, particularly when the branch instruction is a conditional branch instruction.
There has been increasing use of processors that execute instructions in parallel at the level of individual instructions, called super-scalar processors or VLIW (Very Long Instruction Word) processors, in which a plurality of instruction execution pipelines are provided and a plurality of instructions are fetched simultaneously and executed in parallel. Since a branch instruction and the instruction subsequent thereto are often fetched simultaneously, it is impossible to hide the execution time of the branch instruction behind the fetch time of the subsequent instruction as in the delayed branch method. As a result, the period from the fetch that includes a branch instruction (a fetch of plural instructions) until the fetch of the branch target instruction to be executed next is markedly long, which causes a significant problem.
A conventional solution to such problems has been branch prediction. There are two types of branch prediction: static branch prediction and dynamic branch prediction. In static branch prediction, whether or not a branch will be taken is predicted based on static information such as the type of the branch instruction and whether the branch target is forward or backward. In dynamic branch prediction, a limited history of whether or not the branch instruction caused a branch in the past is recorded, and whether or not a branch will be taken is predicted from that history through a predetermined algorithm.
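The two kinds of prediction can be sketched as follows. The text does not fix a particular rule or algorithm, so both choices here are assumptions made for illustration: the static rule "backward branches are predicted taken, forward branches not taken", and a 2-bit saturating counter as the "predetermined algorithm" for the dynamic case. The names `static_predict` and `TwoBitCounter` are hypothetical.

```python
def static_predict(branch_addr: int, target_addr: int) -> bool:
    """Static prediction from direction only (assumed rule):
    a backward branch, typical of a loop, is predicted taken."""
    return target_addr < branch_addr


class TwoBitCounter:
    """Dynamic prediction from recorded history (assumed algorithm):
    a 2-bit saturating counter kept per branch instruction."""

    def __init__(self) -> None:
        # States 0 and 1 predict not taken; states 2 and 3 predict taken.
        self.state = 1

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        """Record the actual outcome once the branch has been executed."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


counter = TwoBitCounter()
for outcome in (True, True, False):
    counter.update(outcome)
prediction = counter.predict()  # still predicts taken after one not-taken outcome
```

The point of the two-bit state in this sketch is that a single deviating outcome, such as a loop exit, does not immediately flip the prediction.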
An instruction fetch section within the processor detects whether or not there exists any branch instruction in fetched instructions. When there exists any branch instruction, it is determined in accordance with the branch prediction whether the instruction to be fetched next is the subsequent instruction or the branch target instruction. The detection of the branch instruction by the instruction fetch section should be fast enough to be completed before another instruction fetch occurs immediately thereafter.
The conventional branch prediction is provided with a table using a content addressable memory (CAM), which stores the address of each branch instruction paired with the corresponding branch target address and can be searched by content or by part of the content. Namely, in the first execution of each branch instruction, the address where the branch instruction is placed is recorded together with the branch target address obtained as a result of the execution, and every time a new instruction is fetched, the table is searched to see whether its address is already recorded. Thus, each branch instruction can be detected more quickly on the second and later occasions, and the corresponding branch target address can be obtained as well. Further, if the table stores history information on the past branching behavior of the branch instruction in combination with the address where the branch instruction is stored and the corresponding branch target address, branch prediction becomes possible at the same time.
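The table lookup described above can be sketched in software. This is only an illustration under assumptions: a Python dictionary stands in for the content addressable memory, the history is reduced to the single most recent outcome, and the instruction size of 4 bytes and all names (`BranchTargetTable`, `next_fetch_address`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TableEntry:
    target: int       # branch target address obtained when the branch executed
    last_taken: bool  # history information (assumed: last outcome only)


class BranchTargetTable:
    """Maps the address of a branch instruction to its target and history."""

    def __init__(self) -> None:
        self.entries: Dict[int, TableEntry] = {}

    def record(self, branch_addr: int, target: int, taken: bool) -> None:
        """Register or refresh an entry after the branch has been executed."""
        self.entries[branch_addr] = TableEntry(target, taken)

    def lookup(self, fetch_addr: int) -> Optional[int]:
        """Return the predicted target if fetch_addr holds a known branch
        that is predicted taken; otherwise return None."""
        entry = self.entries.get(fetch_addr)
        if entry is None or not entry.last_taken:
            return None
        return entry.target


def next_fetch_address(table: BranchTargetTable, fetch_addr: int,
                       instr_size: int = 4) -> int:
    """Choose between the branch target and the subsequent instruction."""
    predicted = table.lookup(fetch_addr)
    return predicted if predicted is not None else fetch_addr + instr_size


table = BranchTargetTable()
first = next_fetch_address(table, 0x100)   # not yet recorded: falls through
table.record(0x100, 0x40, taken=True)      # first execution of the branch
second = next_fetch_address(table, 0x100)  # now redirected to the target
```

In this sketch the first fetch at the branch address falls through to the subsequent instruction because nothing has been recorded yet, while the second fetch is redirected to the recorded target.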
The efficiency of the branch operation in the pipeline processor has been improved by branch prediction as described above. However, since the address of the branch instruction and its corresponding branch target address are registered in the table only after the first execution of the branch instruction, the branch cannot be predicted before that first execution. Further, since such a table is limited in size and a program may contain more branch instructions than the table can hold, some entries must be replaced according to a predetermined algorithm such as an LRU (Least Recently Used) algorithm. Accordingly, a branch instruction that has been removed from the table as a result of replacement cannot be predicted until it is executed again.
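Both limitations, the miss before the first execution and the miss after replacement, can be seen in a small table with LRU replacement. This is a minimal sketch assuming a software table built on `collections.OrderedDict`; the capacity of two entries and the name `LRUBranchTable` are hypothetical and chosen only to make the eviction visible.

```python
from collections import OrderedDict
from typing import Optional


class LRUBranchTable:
    """Fixed-capacity table of branch address -> target with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Oldest (least recently used) entry is kept at the front.
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, branch_addr: int) -> Optional[int]:
        """Return the recorded target, or None if the branch is not in the table."""
        if branch_addr not in self.entries:
            return None
        self.entries.move_to_end(branch_addr)  # mark as most recently used
        return self.entries[branch_addr]

    def record(self, branch_addr: int, target: int) -> None:
        """Register a branch after execution, evicting the LRU entry if full."""
        if branch_addr in self.entries:
            self.entries.move_to_end(branch_addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
        self.entries[branch_addr] = target


table = LRUBranchTable(capacity=2)
cold = table.lookup(0x10)     # None: never executed, so nothing to predict
table.record(0x10, 0x100)
table.record(0x20, 0x200)
table.lookup(0x10)            # touching 0x10 leaves 0x20 least recently used
table.record(0x30, 0x300)     # table is full, so 0x20 is replaced
evicted = table.lookup(0x20)  # None again until 0x20 is executed once more
```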
A greater problem is the accuracy of the branch prediction. Since branch prediction merely predicts the outcome before the branch instruction is actually executed, the prediction may prove wrong. The accuracy of prediction depends on the branch prediction algorithm as well as on the characteristics of the program being executed. For programs that include a conditional branch instruction whose condition changes in every execution, it is very difficult to improve the accuracy.
When a prediction has proved wrong, all of the instructions fetched according to the prediction must be cleared, and the instruction fetch must be restarted from the branch point. Consequently, execution with branch prediction can sometimes take longer than it would have without prediction.
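The trade-off can be made concrete with a simple expected-cost comparison. This is a rough sketch under assumptions that do not come from the text: a correctly predicted branch costs no extra cycles, a misprediction costs a fixed flush-and-refetch penalty, and a processor without prediction stalls a fixed number of cycles on every branch. The function names and the numeric values are hypothetical.

```python
def expected_cycles_with_prediction(accuracy: float, flush_penalty: float) -> float:
    """Average extra cycles per branch when predicting (assumed model):
    correct predictions are free, wrong ones pay the flush penalty."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * flush_penalty


def prediction_is_slower(accuracy: float, flush_penalty: float,
                         stall_without_prediction: float) -> bool:
    """True when predicting costs more per branch than always stalling."""
    return expected_cycles_with_prediction(accuracy, flush_penalty) > stall_without_prediction


# Hypothetical numbers: a 6-cycle flush against a 2-cycle stall.
poorly_predicted = prediction_is_slower(0.5, 6.0, 2.0)  # half of the guesses are wrong
well_predicted = prediction_is_slower(0.9, 6.0, 2.0)    # one guess in ten is wrong
```

Under this model, prediction loses whenever the misprediction rate multiplied by the flush penalty exceeds the stall it was meant to avoid, which is the situation for a branch whose condition changes in every execution.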