1. Field of the Invention
The present invention particularly relates to a branch prediction method and a processor for pre-decoding a branch instruction, for predicting the result of branch, and for reading an instruction in advance located at the destination of the branch, thereby lessening performance reduction due to a delay in instruction reading by the branch instruction when the prediction has turned out right at the time of actual execution of the branch instruction.
2. Description of the Related Art
Apparatuses incorporating processors have recently become widespread in various fields in accordance with the progress of computer technology. Such a processor may include a branch prediction device to reduce hazards due to branch instructions. A processor including a conventional branch prediction device pre-decodes an instruction before execution, and predicts the result of branch when the instruction is a branch instruction, and then reads the instruction located at the destination of the branch on the basis of the result of the execution of the branch instruction.
The method of branch prediction can be classified into static branch prediction and dynamic branch prediction. In the static branch prediction, branch taken or not taken is determined in advance by hardware. For example, it is possible to determine that all branches are not taken at all times. In this case, when the ratio of branch not taken is high, the effect of the prediction is raised. However, when the ratio of branch taken is high, the opposite effect occurs. As another example of the method, it is determined that branch is taken for backward branch, and branch is not taken for forward branch. This method is referred to as the BTFN (Backward branch Taken, Forward branch Not Taken) method. Since backward branch forms a loop, it is possible to assume that branch is taken in most cases. For this reason, it is possible to predict backward branch highly accurately. However, the prediction of forward branch may not be effective just as in the case of the example described above.
In order to solve these problems associated with the static branch prediction, dynamic branch prediction is used. In the dynamic branch prediction, the result of the execution of each branch instruction is stored as history information and used for the next branch prediction. FIG. 8 shows a conceptual view of the dynamic branch prediction. As shown in FIG. 8, in order to carry out the dynamic branch prediction, a table wherein one entry comprises the absolute address of a branch instruction and branch result information (xe2x80x9c1xe2x80x9d when branch is taken, xe2x80x9c0xe2x80x9d when branch is not taken, for example) is required as information for branch prediction. When a branch instruction is executed, the address of the instruction and the result of the execution are recorded in an entry. When the branch instruction located at the same address is pre-decoded next, the address is used to retrieve the corresponding entry, and branch prediction is carried out referring to the recorded branch result information. When branch taken is predicted by branch prediction, the instruction located at the destination of branch is read from the memory in which the instruction is stored. When branch not taken is predicted by branch prediction, the next instruction is read from the memory. Furthermore, if the result of branch coincides with the result of prediction when the same branch instruction is executed, the instruction read in advance from the memory by the prediction is executed. At this time, the delay required when the instruction located at the destination of branch is read from the memory can be lessened since the instruction has been read in advance. FIG. 8 conceptually shows addresses A to D, and the information on the result of the execution of the branch instructions corresponding to the addresses A to D.
As described above, the result of branch can be predicted for each branch instruction by the conventional dynamic branch prediction. Since the result obtained the last time (older information may be included) is used for prediction, the accuracy of branch prediction can be raised. In this case, a table for storing the addresses of branch instructions and result information, and a means for retrieving a necessary entry are required.
However, since an address is stored for each branch instruction in the conventional dynamic branch prediction, a large storage capacity is required for branch prediction. In addition since the corresponding branch instruction information is required to be retrieved quickly, an address comparison means is required for each entry. These are problems associated with the conventional branch prediction. Furthermore, it is usually required to deal with as many branch instructions as possible in order to raise the effect of branch prediction. To accomplish this, the number of the entries in the table is increased (usually, 512 to 1024). In this case, the above-mentioned problems become more serious.
An object of the present invention is to provide a branch prediction method which can reduce the storage capacity for storing branch prediction information and can simplify an information retrieval circuit while reduction in branch prediction accuracy is minimized.
Another object of the present invention is to provide a processor having a branch prediction device which can reduce the storage capacity for storing branch prediction information and can simplify an information retrieval circuit while reduction in branch prediction accuracy is minimized.
To attain the objects, the branch prediction method of the present invention comprises a first step wherein the result of branch by a branch instruction is recorded as history information in correspondence with the relative position of the instruction from an origin, and a second step wherein the result of branch by the next branch instruction is predicted by referring to the history information on the basis of the relative position of a pre-decoded instruction from the origin when the next branch instruction is pre-decoded before execution.
In this case, the first step comprises an origin position storing step for storing the position of the instruction located at the origin, an execution decoding step for decoding an instruction for execution, an instruction position obtaining step for obtaining the relative position of the instruction decoded for execution on the basis of the position of the instruction stored by the origin position storing step, and a history recording step for recording the result of branch by a branch instruction as history information in correspondence with the relative position of the branch instruction when the decoded instruction is a branch instruction.
Furthermore, the second step comprises a pre-decoding step for pre-decoding the instruction before the instruction is executed next time, a preceding instruction position obtaining step for obtaining the relative position of the pre-decoded instruction on the basis of the position of the instruction stored by the origin position storing step, a history reference step for referring to the history information corresponding to the relative position of the pre-decoded branch instruction when the pre-decoded instruction is a branch instruction, and a prediction step for predicting the result of the execution of the pre-decoded instruction by using the result of the reference to the history information.
Furthermore, the above-mentioned origin position storing step is a step for storing the address of the instruction to be executed, which is located at the origin, for example. More specifically, the origin position storing step comprises a loop detection step for detecting a loop structure in the program, and a step for fetching and storing the address of the instruction located at the head of the loop on the basis of the loop structure of the program detected by the loop detection step, for example. The above-mentioned loop detection step is a step for detecting the loop structure of the program by decoding a backward branch instruction, for example.
Instead of the above-mentioned configuration, the above-mentioned origin position storing step may comprise a block head address detection step for detecting the head address of a program block, and a block head address storing step for storing the address detected by the block head address detection step, for example.
The above-mentioned instruction position obtaining step comprises a step for obtaining the number of instructions by counting the number of the instructions decoded for execution, for example. In correspondence with this, the preceding instruction position obtaining step comprises a step for obtaining the number of instructions by counting the pre-decoded instructions.
Furthermore, instead of the above-mentioned configuration, the instruction position obtaining step may comprise a step for obtaining the number of instructions by counting the number of the instructions found to be branch instructions by decoding for execution, for example. In correspondence with this, the preceding instruction position obtaining step may comprise a step for obtaining the number of instructions by counting the instructions found to be branch instructions by pre-decoding.
Furthermore, instead of the above-mentioned configuration, the instruction position obtaining step may comprise a step for obtaining the difference between the address of the instruction decoded for execution and the address stored by the origin position storing step, for example. In correspondence with this, the preceding instruction position obtaining step may comprise a step for obtaining the difference between the address of the pre-decoded instruction and the address stored by the origin position storing step.
The above-mentioned history recording step may be a step for recording plural pieces of history information.
With the branch prediction method of the present invention, history information is recorded in correspondence with the relative position of a branch instruction from the origin on the basis of the result of branch, and at the execution time of the next branch instruction the history information is referred to on the basis of the relative position of the pre-decoded instruction from the origin to predict the result of branch by the branch instruction. Therefore, the result of branch by a plurality of branch instructions can be predicted by using only the origin information, the relative position information and the history information as the branch prediction information required for branch prediction. In comparison with the amount of information in the conventional method wherein absolute addresses and history information are stored for each branch instruction, the amount of information in the present embodiment wherein the origin information, the relative position information for each branch instruction and the history information are stored can be reduced, whereby the capacity of the memory for storing the branch prediction information can be made smaller than that required by the conventional method.
Furthermore, in the conventional configuration wherein the address of each branch instruction is stored, and at each issue of a branch instruction a branch instruction corresponding to the issued branch instruction is retrieved, a retrieval circuit having a plurality of comparators is required to compare the address of the branch instruction with all the stored addresses. However, in the case of the branch prediction method of the present invention, the amount of a relative position information (relative address values, for example) is less than the amount of address information. Consequently, the comparator and the retrieval circuit can be made smaller in size, and low power consumption can be attained.
Furthermore, the processor of the present invention comprises a first means wherein the result of branch by a branch instruction is recorded as history information in correspondence with the relative position of the instruction from an origin, and a second means wherein the result of branch by the next branch instruction is predicted by referring to the history information on the basis of the relative position of the pre-decoded instruction from the origin when the next branch instruction is pre-decoded before execution.
In this case, the first means comprises an origin position storing means for storing the position of the instruction located at the origin, an execution decoding means for decoding an instruction for execution, an instruction position obtaining means for obtaining the relative position of the instruction decoded for execution on the basis of the position of the instruction stored by the start point storing means, and a history recording means for recording the result of branch by a branch instruction as history information in correspondence with the relative position of the branch instruction when the decoded instruction is a branch instruction.
Furthermore, the second means comprises a pre-decoding means for pre-decoding the instruction before the instruction is executed next time, a preceding instruction position obtaining means for obtaining the relative position of the pre-decoded instruction on the basis of the position of the instruction stored by the origin position storing means, a history reference means for referring to the history information corresponding to the relative position of the pre-decoded branch instruction when the pre-decoded instruction is a branch instruction, and a branch prediction means for predicting the result of the execution of the pre-decoded instruction by using the result of the reference to the history information.
Furthermore, the above-mentioned origin position storing means is a means for storing the address of the instruction to be executed, which is located at the origin, for example. More specifically, the origin position storing means comprises a loop detection means for detecting a loop structure in the program, and a means for fetching and storing the address of the instruction located at the head of the loop on the basis of the loop structure of the program detected by the loop detection means, for example. The above-mentioned loop detection means is a means for detecting the loop structure of the program by decoding a backward branch instruction, for example.
Instead of the above-mentioned configuration, the origin position storing means may comprise a block head address detection means for detecting the head address of a program block, and a block head address storing means for storing the address detected by the block head address detection means, for example.
The above-mentioned instruction position obtaining means comprises a means for obtaining the number of instructions by counting the number of the instructions decoded for execution, for example. In correspondence with this, the preceding instruction position obtaining means comprises a means for obtaining the number of instructions by counting the pre-decoded instructions.
Furthermore, instead of the above-mentioned configuration, the instruction position obtaining means may comprise a means for obtaining the number of instructions by counting the number of the instructions found to be branch instructions by decoding for execution, for example. In correspondence with this, the preceding instruction position obtaining means may comprise a means for obtaining the number of instructions by counting the instructions found to be branch instructions by pre-decoding.
Furthermore, instead of the above-mentioned configuration, the instruction position obtaining means may comprise a means for obtaining the difference between the address of the instruction decoded for execution and the address stored by the origin position storing means, for example. In correspondence with this, the preceding instruction position obtaining means may comprise a means for obtaining the difference between the address of the pre-decoded instruction and the address stored by the origin position storing means.
The history recording means may be a means for recording plural pieces of history information, for example.
The processor of the present invention has means for recording history information in correspondence with the relative position of a branch instruction from the origin, for referring to the history information on the basis of the relative position of the pre-decoded instruction from the origin at the execution time of the next branch instruction, and for predicting the result of branch by the branch instruction. Therefore, the result of branch by a plurality of branch instructions can be predicted by using only the origin information, the relative position information and the history information as the branch prediction information required for branch prediction. In comparison with the amount of information in the conventional method wherein absolute addresses and history information are stored for each branch instruction, the amount of information in the present embodiment wherein the origin information, the relative position information for each branch instruction and the history information are stored can be reduced, whereby the capacity of the memory for storing the branch prediction information can be made smaller than that required by the conventional method.
Furthermore, in the conventional configuration having a means for storing the address of each branch instruction and for retrieving, at each issue of a branch instruction, a branch instruction corresponding to the issued branch instruction, a retrieval circuit having a plurality of comparators is required to compare the address of the branch instruction with all the stored addresses. However, in the case of the processor of the present invention, the amount of relative position information (relative address values, for example) is less than the amount of address information. Consequently, the comparator and the retrieval circuit can be made smaller in size, and low power consumption can be attained.