The present invention relates to a management method of an execution result of an instruction executed in the past in a processor having a system for predicting the execution result of the instruction that is being executed currently from the execution result of the instruction executed in the past.
In recent years, to enable high speed, processors have used pipelining that subdivides processing executed in a processor into the processing, such as instruction fetch, instruction decoding, read and execution of the operand, or write-back of an execution result, and executes it overlapping with instructions before and behind.
A branch prediction feature improves the performance of the processor using the pipelining by preventing a pipeline from being stalled, which will occur when a conditional branch instruction is executed. As shown in FIG. 7, if the branch prediction feature is not provided, whether a branch is taken or not is not determined until the execution of the conditional branch instruction is completed. Accordingly, the instruction fetch for the succeeding instruction of the conditional branch instruction is disabled and the pipeline must be stalled. On the contrary, when the branch prediction feature is provided, the instruction fetch is enabled without waiting for the completion of the branch instruction by previously predicting the execution result (branch taken or not) of the conditional branch instruction. If a prediction is correct, the pipeline can be prevented from being stalled when the branch instruction is executed.
Similarly, a value prediction feature prevents the pipeline from being stopped by predicting the execution result of the instruction that is being executed currently and improves the performance of the processor. If the value prediction feature is not provided, when an instruction is executed, the execution result of the previous instruction is needed. However, if the execution result of the previous instruction is not obtained at that point (for example, the execution result of a division instruction that requires many cycles is not obtained), the pipeline must be stalled until the execution result of the previous instruction is obtained. On the contrary, if the value prediction feature is provided, the pipeline can be prevented from being stalled by predicting the execution result of the previous instruction if the predicted execution result is correct.
One method of realizing a branch/value prediction is to store an execution result of an instruction in a history table when the instruction is executed, and make predictions from the execution result of the past instruction by referring to the history table when the execution result of the instruction is predicted. A program counter is used as a subscript of a prediction table.
When the branch/value prediction was done, an instruction execution unit of the processor obtains a correct execution result of the instruction under which the execution result was predicted. The correct execution result and a predicted value are compared. If the predicted value is correct, the execution result of the instruction executed using the predicted value is validated. If the predicted value is incorrect, the instruction executed using the predicted value is canceled and re-executed.
The lesser the cancellation of the instruction executed using the predicted value is, the higher the processor performance is obtained. Accordingly, a high percentage of correct predictions is obtained for the branch/value prediction.
In the branch/prediction device, one method to predict more acculate the high percentage of correct predictions is to use branch prediction algorithm having multiple different properties and a device that selects the output of the branch prediction algorithm, as described in xe2x80x9cCombining Branch Predictionsxe2x80x9d, by Scott McFarling, WRL Technical Notes TN-36, Digital Equipment Western Research Laboratory, June 1993.
The branch prediction feature differs in the usual behavior of a branch instruction depending on the branch prediction algorithm. Therefore, a higher percentage of correct predictions than in the case where only one branch prediction algorithm is used can be obtained by predicting that the prediction of which branch prediction algorithm is correct using the branch prediction feature having two branch prediction algorithms or more of two different properties and a table that stores the latest correct/incorrect predictions of each of the branch prediction algorithms.
However, this method has a disadvantage in which the number of circuits increases because two prediction algorithms or more and a circuit that selects the prediction algorithms must be mounted. In the branch prediction circuit, one method of obtaining the high percentage of correct predictions is to invalidate a branching entry of low probability as described in Japanese Unexamined Patent Publication No. Hei 08-110857. By invalidating the entry in history table, the invalidated branch instruction is judged not to branch. Therefore, the decrease of the percentage of correct predictions caused by invalidating the branching entry of the low probability is slight, whereas it is assumed that the probability of erasing the branching entry of high probability when the execution result of the branch instruction is newly registered in the history table is reduced and the percentage of correct predictions is increased.
However, because this method erases the entry in which a fixed standard is satisfied, the entry might have been invalidated excessively. Otherwise, the percentage of correct predictions cannot be increased in some case because there are too few entries to register a new entry into the history table.
In the branch/value prediction, one method of increasing the percentage of correct predictions is to increase an entry in which an execution result of an instruction can be stored. There are two entries or more in which the execution result of the instruction can be stored, and the configuration system that stores the execution result in any one of the entries is generally called set associative. The set associative is a system used frequently in a cache memory. In the set associate, usually, when the number of selectable entries is k, the total number of entries, or associativity is n, and the subscript of the table that stores an execution result is i, the entry to be stored is either (i mod n)xe2x88x92(i mod k) or (i mod n)xe2x88x92(i mod k)+(kxe2x88x921).
In the set associativexe2x80x94cache memory, the set associative assumed to have the best performance is called Least-Recently-Used (LRU) among the methods of selecting an entry to be stored from two storable entries or more. According to the LRU method, a new data is stored in the entry that is not used for the longest time in the selectable entries.
To realize this, the LRU uses an LRU table of n/k entries when the total number of entries is n and the number of selectable entries is k. The LRU table records the order when the entry was referred to in the relative priority between selectable entries. The relative priority between the selectable entries is called a priority here.
For example, when the total number of entries is 16 and the number of selectable entries is 4, the eighth to 15th entries are assumed to have been referred to in the order of 8xe2x86x929xe2x86x9210xe2x86x9211. At this point, the LRU table records that the entries were referred to in the order of 8xe2x86x929xe2x86x9210xe2x86x9211. Subsequently, if 9 is assumed to have been referred to, the LRU table records that the entries were referred to in the order 8xe2x86x9210xe2x86x9211xe2x86x929. Further, when there is an execution result to be newly stored and the execution result must be stored in any of the eighth to 11th entries, the execution result is stored in the lowest entry of the priority in the selectable entries, that is, the eighth entry that is not referred to for the longest time.
Because the problem of the LRU in the branch/value prediction is that even if the probability at which a prediction is unsuccessful is high, an entry of a high reference frequency becomes high in the priority and is left in the history table, the entry in which prediction was correct will be erased from the history table. To increase the percentage of correct predictions, the entry in which the prediction was done should exist in the prediction table rather than in the entry in which the probability at which the prediction is unsuccessful is high even if the reference frequency is low.
That is, in the LRU, because of an entry in which the reference frequency is high, but the probability at which the prediction is unsuccessful is also high, the processor performance decreases.
Accordingly, an object of the present invention is to increase the percentage of correct predictions of the branch/value prediction feature constructed using the set associative.
To attain the above object, the present invention uses a priority table that records a priority between selectable entries in the same manner as an LRU table. The LRU table is updated so that the priority of an entry that was referred to every reference, whereas the priority table is referred to for a prediction. Only when the prediction was correct, the priority table is updated so that the priority of the entry that was referred to can be increased.
Accordingly, in a branch/value prediction feature, priorities of an entry of a low reference frequency and an entry of a low percentage of correct predictions are lowered. When there is an execution result to be newly stored, it is stored in the entry of the lowest priority among the selectable entries.
At this point, because an entry of a small contribution to a performance gain is erased from a history table and an entry of a large contribution to the performance gain is relatively left in the history table, the performance of a processor can be improved.
Further, the present invention refers to the history table in the branch/value prediction feature and decreases the priority of the entry that was referred to when the prediction is unsuccessful. Accordingly, because the entry of the small contribution to the performance gain is erased from the history table and the entry of the large contribution to the performance gain is relatively left, the performance of the processor can be improved.
Further, the present invention is referred to in the branch/value prediction feature and increases the priority of the entry that was referred to when the prediction was correct, then decreases the priority of the entry that was referred to when the prediction is unsuccessful. Accordingly, because the entry of the small contribution to the performance gain is erased from the history table and the entry of the large contribution to the performance gain is relatively left, the performance of the processor can be improved.
Further, the present invention increases a priority in accordance with the number of cycles that decreases when the prediction was correct in comparison with when the prediction was not done in the branch/value prediction feature. Consequently, the entry of the large contribution to the performance gain becomes easy to leave and the performance of the processor can be improved.