1. Field of the Invention
The present invention is related to an improved processor provided with a data value prediction circuit and a branch prediction circuit. In particular, the present invention is related to an improved processor provided with a data value prediction circuit and a branch prediction circuit which makes it possible to improve the efficiency of supplying instructions.
2. Prior Art
Along with the increasing level of superscalar parallelism and the increasing number of superpipelined stages, the disturbance of control flow due to branch instructions increasingly affects the overall performance of a processor system. The performance penalty due to branch instructions has been recognized and examined for years, ever since pipelined control was introduced into processors. As the parallelism of instruction execution has attracted the interest of engineers, the handling of branch instructions has become more important. The branch prediction technique has been employed in order to alleviate the influence of branch instructions. Namely, the history of a branch instruction as taken or not taken is written into a table, and the result of the branch instruction is predicted with reference to that table.
FIG. 1 is a schematic diagram showing an example of a processor provided with a branch prediction circuit. Instructions are read from the instruction cache 2 and stored in an instruction window 1 by the processor. An instruction latched by the instruction window 1 is ready to be dispatched when its necessary operands become available, and is then received by one of the functional units 5, which executes the instruction as dispatched. The result of the execution is broadcast in the instruction window 1 and, at the same time, stored in the register file 4 after completion of execution. Some instructions may be executed with operands as read from the data cache 3. The branch prediction circuit 6 conducts branch prediction in order to inform the instruction cache 2 of the address of the instruction to be fetched.
The two-level adaptive branch prediction circuit is a subject of great interest among the various branch prediction circuits because of the high prediction accuracy that is expected of it. The two-level adaptive branch prediction circuit is composed of two tables. FIG. 2 is a schematic diagram showing an example of the two-level adaptive branch prediction circuits, i.e., PAs. The first table is referred to as the BHT (Branch History Table) 021 and is composed of a plurality of shift registers. The shift registers are referred to as Branch History Registers (BHR). Each BHR is assigned to one of the branch instructions and stores the history of the branch instruction corresponding thereto. Namely, the BHT is indexed with the address of the corresponding branch instruction. When the direction of a branch, i.e., taken (1) vs. not-taken (0), is decided, the result is shifted into the BHR. At this time, the oldest result is shifted out. The second table is indexed with the addresses of the branch instructions and the patterns of the history of the respective branch instructions.
The second table is referred to as the Pattern History Table (PHT) 022 and comprises a number of 2-bit counters with reference to which the branch prediction is conducted. If the branch is taken, the corresponding counter is incremented by one, while if the branch is not taken, the corresponding counter is decremented by one. The counter saturates at its maximum and minimum values. The branch is predicted with reference to the most significant bit of the corresponding counter. Namely, if the most significant bit is 1 the branch is predicted as taken, while if the most significant bit is 0 the branch is predicted as not taken. FIG. 3 shows the state transition of the counter. For example, the BHT is indexed with the lower part of the address of a branch instruction to read the history of "0110". The PHT is indexed with the history of "0110" and the lower part of the address of the branch instruction. The 2-bit counter as shown in broken lines is then pointed to. The direction of a branch (taken vs. not-taken) is predicted with reference to the value of the counter. Other types of the two-level adaptive branch prediction circuits have been described in several references, e.g., T-Y. Yeh, Y. N. Patt, "Alternative Implementation of Two Level Adaptive Branch Prediction", 19th International Symposium on Computer Architecture (ISCA), 1992.
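The two-table lookup and update described above may be sketched in software as follows. This is a minimal illustration only and not the circuit of FIG. 2: the class name, the table sizes (4 address bits, 4 history bits), the way the address bits and the history are concatenated into a PHT index, and the initial counter value are all assumptions made for the example.

```python
class TwoLevelPredictor:
    """Per-address history registers (BHT) feeding a table of 2-bit counters (PHT)."""

    def __init__(self, addr_bits: int = 4, history_bits: int = 4):
        # Table sizes are assumptions chosen for illustration.
        self.addr_bits = addr_bits
        self.history_bits = history_bits
        # BHT: one branch history register (BHR) per low-order address pattern.
        self.bht = [0] * (1 << addr_bits)
        # PHT: 2-bit counters; initial value 1 (MSB = 0, not taken) is an assumption.
        self.pht = [1] * (1 << (addr_bits + history_bits))

    def _indices(self, pc: int):
        # Index the BHT with the lower part of the branch address.
        bht_index = pc & ((1 << self.addr_bits) - 1)
        history = self.bht[bht_index]
        # Index the PHT with the address bits and the history pattern (assumed concatenation).
        pht_index = (bht_index << self.history_bits) | history
        return bht_index, pht_index

    def predict(self, pc: int) -> bool:
        # The most significant bit of the 2-bit counter gives the direction.
        _, pht_index = self._indices(pc)
        return (self.pht[pht_index] >> 1) == 1

    def update(self, pc: int, taken: bool) -> None:
        bht_index, pht_index = self._indices(pc)
        counter = self.pht[pht_index]
        # Saturating increment on taken, saturating decrement on not taken.
        self.pht[pht_index] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the newest outcome into the BHR; the oldest outcome is shifted out.
        mask = (1 << self.history_bits) - 1
        self.bht[bht_index] = ((self.bht[bht_index] << 1) | int(taken)) & mask
```

With a branch that is always taken, the history register fills with ones and the counter selected by that pattern saturates at its maximum, so that `predict` returns taken for that address.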
On the other hand, in recent years, the data value prediction technique has attracted the interest of many researchers. Dependences that disturb processor performance include the name dependence and the data dependence in addition to the control dependence due to branch instructions. The name dependence is caused by resource shortage, i.e., the shortage of available registers, and can be eliminated using register renaming. However, the data dependence, which is called true dependence, cannot be removed by such techniques. Hence, the data dependence is a serious obstacle limiting instruction level parallelism.
The data value prediction technique has been proposed in order to remove the data dependence by speculative execution and improve the performance of the processor. Namely, an instruction having a data dependency upon a preceding instruction is executed speculatively by predicting the source operand as required. Instructions having a data dependency can therefore be executed in parallel, which is inherently impossible without such prediction.
FIG. 4 shows instructions illustrating an example of such a data dependency. Namely, the instruction I1 and the instruction I2 have a data dependency and therefore cannot inherently be executed in parallel. However, the instruction I1 and the instruction I2 can be executed in parallel by predicting the source operand γ2 of the instruction I2. FIG. 5 is a schematic diagram showing an example of a processor provided with a data value prediction circuit. Instructions with source operands which have not been calculated yet are executed by the use of values of the source operands as predicted by the data value prediction circuit 7. FIG. 6 is a schematic diagram showing an example of the data value prediction circuit 7 as illustrated in FIG. 5. The data value prediction circuit 7 has been designed in a hardware structure similar to that of the instruction cache 2. The history of the execution results as calculated is stored in the data value prediction circuit 7. Each entry of the data value prediction circuit 7 is indexed with the address PC of the program counter. Namely, each entry consists of the latest result of the operation (pred_value), the stride of the result of the operation (stride) and the state of the entry indicative of whether or not the prediction is possible. The stride value is obtained as the difference between the latest two results of the execution of the same instruction, while the state value is stored by encoding the history of the execution results and indicates whether or not the prediction is possible.
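The manner in which an entry is trained from execution results may be sketched as follows. This is an illustration under assumptions and not the structure of FIG. 6: the names `ValueEntry` and `record_result`, the use of a dictionary keyed by the address PC, and the choice of a zero stride for the first result seen at an address are introduced only for the example. The state field is omitted here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ValueEntry:
    pred_value: int   # latest result of the operation
    stride: int = 0   # difference between the latest two results


def record_result(table: Dict[int, ValueEntry], pc: int, result: int) -> ValueEntry:
    """Store an execution result in the entry indexed with the address pc."""
    entry = table.get(pc)
    if entry is None:
        # First result seen at this address: no stride is known yet (assumed 0).
        entry = table[pc] = ValueEntry(pred_value=result)
    else:
        # The stride is the difference between the latest two results.
        entry.stride = result - entry.pred_value
        entry.pred_value = result
    return entry
```

For an instruction at a hypothetical address 0x100 producing the results 8, 12 and 16 in turn, the entry ends with a pred_value of 16 and a stride of 4.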
The state transition as required is realized by means of the 2-bit saturation type counter as illustrated in FIG. 3. If a value prediction succeeds, the counter is incremented, while the counter is decremented if it fails. When the tag is matched, the pred_value and the stride value are obtained from the entry pointed to by the address PC. The operand value as predicted is therefore calculated as the sum of the pred_value and the stride value. The state value is obtained at the same time. If the state value is PREDICT or WEAKLY_PREDICT, the operand value as predicted is used for executing an instruction requiring the operand. The data value prediction is otherwise not conducted. Other types of the data value prediction circuit 7 have been described in several references, e.g., M. H. Lipasti, J. P. Shen, "Exceeding the Dataflow Limit via Value Prediction", 29th International Symposium on Microarchitecture (MICRO), 1996; Y. Sazeides, J. E. Smith, "The Predictability of Data Value", 30th International Symposium on Microarchitecture (MICRO), 1997; and K. Wang, M. Franklin, "Highly Accurate Data Value Prediction using Hybrid Predictors", 30th International Symposium on Microarchitecture (MICRO), 1997.
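The lookup and the state transition described above may be sketched as follows. The sketch is illustrative only. The state names PREDICT and WEAKLY_PREDICT come from the description, but their numeric encodings (3 and 2), the `Entry` layout including the tag field, and the function names are assumptions made for the example.

```python
from typing import NamedTuple, Optional

PREDICT = 3          # assumed encoding of the strongest predicting state
WEAKLY_PREDICT = 2   # assumed encoding; states 0 and 1 mean "do not predict"


class Entry(NamedTuple):
    tag: int
    pred_value: int
    stride: int
    state: int


def predict_operand(entry: Optional[Entry], tag: int) -> Optional[int]:
    """Return the predicted operand value, or None when no prediction is conducted."""
    if entry is None or entry.tag != tag:
        return None  # the tag is not matched
    if entry.state not in (PREDICT, WEAKLY_PREDICT):
        return None  # the state does not permit prediction
    # The predicted value is the sum of the latest result and the stride.
    return entry.pred_value + entry.stride


def next_state(state: int, success: bool) -> int:
    """2-bit saturating counter: increment on success, decrement on failure."""
    return min(state + 1, PREDICT) if success else max(state - 1, 0)
```

An entry holding a pred_value of 12 and a stride of 4 in the PREDICT state yields a predicted operand of 16, while the same entry in a lower state yields no prediction.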
As explained above, it has been proposed to make use of the branch prediction technique or the data value prediction technique in order to improve the performance of processors. However, the conventional techniques have the following shortcomings.
The branch prediction technique has been examined for many years so that further improvement is substantially difficult. For example, a genetic algorithm has been proposed in order to improve the accuracy of branch prediction as illustrated in J. Emer, N. Gloy, "A Language for Describing Predictors and its Application to Automatic Synthesis", 24th International Symposium on Computer Architecture (ISCA), 1997. This reflects the limits of improving the accuracy of branch prediction.
On the other hand, in the case of the data value prediction, it is difficult to accomplish an improvement of performance that is reasonable in view of the additional cost of the hardware modification necessary for introducing the data value prediction, since the granularity of the speculative execution is substantially small. For example, it has been reported that, in spite of an accuracy of value prediction of over 90%, the improvement of performance is only of the order of 0.3%, in T. Sato, "Load Value Prediction using Two-Hop Reference Address Renaming", 4th International Conference on Computer Science and Informatics (ICandS), 1998.
Furthermore, while the branch prediction and the value prediction have been separately researched, it has been reported that there are problems when the two prediction techniques are used in combination. For example, it has been reported that the accuracy of branch prediction is deteriorated when the data dependency is resolved speculatively in T. Sato, "Speculative Resolution of Ambiguous Memory Aliasing", International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA), 1997.
The present invention has been made in order to solve the shortcomings as described above. It is an important object of the present invention to provide an improved processor provided with both a data value prediction circuit and a branch prediction circuit capable of predicting branch directions with a higher degree of accuracy of branch prediction.
It is another object of the present invention to provide an improved branch prediction circuit provided with both a data value prediction circuit and a branch prediction circuit capable of predicting branch directions with a higher degree of accuracy of branch prediction.
In brief, the above and other objects and advantages of the present invention are provided by a new and improved processor comprising: at least one functional unit for executing instructions; a plurality of registers connected to said functional unit for temporarily storing data and the result of execution of an instruction; means connected to said functional unit for supplying instructions to said functional unit; a data value prediction circuit for receiving the results of execution of instructions and predicting operand values for use in future execution of instructions; and a branch prediction circuit for predicting the direction of a branch; wherein said branch prediction circuit executes a branch instruction by the use of an operand value as predicted by said data value prediction circuit.
Also, in accordance with a preferred embodiment of the present invention, the processor further comprises a cache memory for storing instructions and data.
Furthermore, in accordance with a further preferred embodiment of the present invention, the processor further comprises an instruction buffer for storing instructions.
Furthermore, in accordance with a further preferred embodiment of the present invention, the result of the execution of the branch instruction is used for predicting the direction of the branch.
Furthermore, in accordance with a further preferred embodiment of the present invention, the processor further comprises a cache memory for storing instructions and data.
Furthermore, in accordance with a further preferred embodiment of the present invention, the result of the execution of the branch instruction is used for evaluating a branch prediction.
In accordance with another aspect of the present invention, a processor comprises: a plurality of functional units for executing instructions, said functional units including a branch unit for executing branch instructions; means connected to said functional units for supplying instructions to said functional units; a plurality of registers connected to said functional units for temporarily storing data and the result of execution of an instruction; a data value prediction circuit for receiving the results of execution of instructions and predicting operand values for use in future execution of instructions; and a branch prediction circuit for predicting the direction of a branch; wherein said data value prediction circuit outputs in the same cycle a first operand value as predicted for use in executing first execution of a first branch instruction as read from an address and a second operand value as predicted for use in executing second execution subsequent to said first execution of the first branch instruction as read from the same address,
wherein said branch unit executes the first branch instruction by the use of said first operand value while said branch prediction circuit predicts the direction of the branch by executing the first branch instruction by the use of said second operand value.
In accordance with another aspect of the present invention, a branch prediction circuit for use in a processor executing instructions in accordance with an address latched by a program counter comprising: means for predicting, by the use of past operand data, a first operand value for use in executing first execution of a branch instruction as read from an address of a memory; means for predicting, by the use of said first operand value, a second operand value for use in executing second execution subsequent to said first execution of the branch instruction as read from the same address; means for executing the branch instruction by the use of said second operand value; means for storing the result of the execution by said executing means as a prediction value; and means connected to said program counter for outputting said prediction value when the address latched by said program counter matches the address of the branch instruction.
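The flow recited above, in which a second operand value is derived from a first predicted operand value and the branch instruction is executed ahead of time with that second value, may be sketched as follows. The sketch rests on several assumptions that are not taken from the description: a stride-based value predictor is used so that each predicted value is the previous one plus the stride, the branch condition is passed in as a hypothetical Python callable, and the stored prediction values are kept in a dictionary keyed by the branch address.

```python
from typing import Callable, Dict, Optional, Tuple


def predict_operands(pred_value: int, stride: int) -> Tuple[int, int]:
    """Predict the operand for the next execution and for the one after it."""
    first = pred_value + stride   # from past operand data (assumed stride form)
    second = first + stride       # from the first predicted operand value
    return first, second


class ValueBasedBranchPredictor:
    def __init__(self) -> None:
        # Branch address -> stored prediction value (direction of the branch).
        self.predictions: Dict[int, bool] = {}

    def train(self, branch_pc: int, pred_value: int, stride: int,
              condition: Callable[[int], bool]) -> bool:
        first, second = predict_operands(pred_value, stride)
        # Executing the branch with the second operand value gives the
        # direction expected for the subsequent execution; store it.
        self.predictions[branch_pc] = condition(second)
        # The branch itself is executed with the first operand value.
        return condition(first)

    def lookup(self, pc: int) -> Optional[bool]:
        # Output the prediction value only when pc matches a stored branch address.
        return self.predictions.get(pc)
```

As a usage example, consider a hypothetical loop branch that is taken while a down-counting operand is non-zero. With a latest operand value of 2 and a stride of -1, the first predicted operand is 1 and the second is 0, so the branch is executed as taken now while the stored prediction for its next execution is not taken, i.e., the loop exit is anticipated.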