1. Field of the Invention
The present invention relates to a technology for branch predicting, using a branch history register storing a plurality of the latest branch result of a branch instruction and a branch history table storing prediction information indicating a probability of predicting that the branch instruction branches, for each index.
2. Description of the Related Art
A processor with a pipeline function is designed to predict whether a branch instruction branches by a branch prediction apparatus in order to maximize the function and to advance a process. If the prediction fails, the process must be advanced backward. This means that the proceeded process is wasted and a back process must be performed separately. Since there is such a penalty, its prediction accuracy greatly affects the performance of the processor. Therefore, the branch prediction apparatus strongly requires that the prediction can be performed with a higher accuracy.
As the conventional branch prediction apparatus, for example, a two-level type branch prediction apparatus is disclosed in patent reference 1. The type of branch prediction apparatus is focused since fairly high prediction accuracy can be expected of it.
FIG. 1A explains branch prediction performed by the conventional two-level type branch prediction apparatus.
A branch history register 702 stores a plurality of branch results of the latest branch instruction as history. The branch result of one branch instruction is expressed by one bit value. For example, not taken and taken are expressed by “0” and “1”, respectively. The branch result is stored in the branch register 702 while being shifted one bit by one bit. Thus, the branch register 702 stores the latest branch result for the number of the bits. For example, if the register 702 whose number of bits is four and whose current contents is “1001” and executes taken of a branch instruction, the contents is updated to “1001”.
An exclusive logical OR (XOR) calculation circuit 703 calculates the exclusive logical OR of the contents stored in a branch history table 704 and a branch instruction address value 701. The exclusive logical OR is used as the index of the branch history table 704.
The branch history table 704 stores a state, which is prediction information, indicating the probability of predicating that a branch instruction is taken for each index. The state (prediction information) is expressed by two bits. It indicates that the larger a value expressed by the two bits is, the higher the probability of taken is. Thus, if the value is “11” or “10”, it is predicted taken by a branch instruction succeeds. If the value is “01” or “10”, it is predicted that not taken.
FIG. 1B explains how to update a state in which the contents are stored in the branch history table 704.
The value as a state is updated according to an actual branch result. As shown in FIG. 1B, if taken, the value is updated by incrementing it with “11” as the upper limit. If not taken, the value is updated by decrementing it with “00” as the lower limit. Thus, the branch history table 704 is optimized according to a program to execute, thereby always maintaining the prediction accuracy high.
The performance of a processor is usually measured by conducting a benchmark test. For software for the test, a standard performance evaluation corporation (SPEC) benchmark, a transaction processing performance council (TPC)-C benchmark and the like are known.
The SPEC benchmark includes a SPECint for the performance evaluation of an integer operation, a SPECfp for the performance evaluation of a floating-point operation and the like as an index indicating the performance of a processor. However, the TPC-C benchmark is used to simulate a transaction processing system to evaluate the performance of a processor.
FIG. 2 shows the relationship between the data width of the branch history register 702 and the error rate of branch prediction for each benchmark. The error rate is an average.
As shown in FIG. 2, in the two-level type branch prediction apparatus, the larger the data width (figure) of the branch history register 702 is, the lower the error rate in the SPEC benchmark becomes and the higher the error rate of the TPC-C benchmark becomes. Thus, in a processor provided with the branch prediction apparatus, the larger the data width (figure) of the branch history register 702 is, the higher the performance in the SPEC benchmark becomes and the lower the error rate of the TPC-C benchmark becomes, which is a contradictory problem.
As to the feature of each benchmark, in the SPEC benchmark, although an instruction string strongly depending on the branch/non-branch history of the latest branch instruction exists, the number of branch instructions is not fairly small. However, in the TPC-C benchmark, prediction hardly depends on the branch/non-branch history of the latest branch instructions, the number of branch instructions is not fairly large in the TPC-C benchmark. Therefore, it can be considered to modify the data width of the branch history register 702 according to such features.
As described above, the index of the branch history table 704 is the XOR of the value of the branch history register 702 and the branch instruction address value 701 (G-share). Therefore, the larger the data width of the branch history register 702 is, the larger an area in the branch history table 704, occupied by the branch instruction prediction information of the branch instruction address value 701 becomes. In this case, the number of branch instructions that can be stored without being shared with another branch instruction decreases. Thus, if the data width of the branch history register 702 is increased, in the SPEC benchmark with a fairly small number of branch instructions, even when the number of branch instructions that can store prediction information without sharing it with another branch instruction is small, prediction information can become independently stored for each history pattern (combination of branch/non-branch) and the prediction accuracy is improved. However, in the TPC-C benchmark, although the number of branch instructions is fairly large, the number of branch instructions that can store prediction information independently decreases and the prediction accuracy decreases.
Patent reference 2 discloses a branch prediction apparatus capable of performing branch prediction by a plurality of methods and using one of them for actual branch prediction. Patent reference 3 discloses a processor for measuring performance in the case where a module performs various processes and modifying the configuration.
In any of the technologies disclosed in patent references 2 and 3, the branch prediction error rate is focused. However, the error rate sometimes increases because the prediction accuracy itself of the branch prediction apparatus is low. Therefore, there is a possibility that such technologies are not suited to correspond to such features (appearance frequency of a branch instruction) of the program (benchmark).
Patent reference 1: Japanese Patent Application Publication No. 2003-5956
Patent reference 2: Japanese Patent Application Publication No. H10-240526
Patent reference 3: Japanese Patent Application Publication No. 2002-163150