1. Field of the Invention
The present invention relates to a branch prediction method, an arithmetic and logic unit, and an information processing apparatus, and, in particular, to a branch prediction method, an arithmetic and logic unit, and an information processing apparatus for performing branch prediction at a time of occurrence of a branch instruction.
Recently, most microprocessors use a pipeline method or a superscalar pipeline method to increase processing speed. Further, when a branch instruction occurs, such microprocessors predict a branch direction according to a degree of branch and continue processing, thereby preventing the processing speed from decreasing.
However, when branch prediction is performed in a microprocessor employing a pipeline method, a branch prediction miss may occur. When a branch prediction miss occurs, a pipeline bubble not participating in the current processing occurs in each slot of the pipeline, processing of instructions is delayed, and, thereby, processing performance is remarkably degraded.
In particular, due to a sharp increase in the operating frequency of microprocessors, the overall performance of a microprocessor is remarkably affected by such a delay of processing of instructions due to pipeline bubbles.
For the purpose of eliminating such a problem, various branch prediction methods have been proposed directed to a high hit rate of branch prediction. For example, a two-stage branch prediction method, an agree predictor, and so forth are known.
In such methods, learning is performed by detecting hits and misses of branch prediction. At this time, detection cycles must be repeated until as many misses as there are bits in the branch prediction history register have been detected and the branch prediction table has been updated to the correct direction.
Accordingly, when the process being executed is changed, branch prediction is at first made according to the result of learning for the previous process. Therefore, many misses of branch prediction occur immediately after a change of process, and this results in degradation of processing performance.
Especially, in processors employing a superscalar pipeline method, because a plurality of instructions are executed simultaneously, the number of branch prediction misses increases, and serious degradation of processing performance results. Accordingly, a method for easily reflecting a result of profiling in branch prediction is desired.
2. Description of the Related Art
FIG. 1 shows a block diagram of one example of an information processing apparatus in the related art.
The information processing apparatus 1 includes an arithmetic and logic unit 2, a main memory 3, a hard disk drive 4, an inputting unit 5, a display unit 6 and a system bus 7.
The arithmetic and logic unit 2 performs operations on data according to instructions. The main memory 3 is used as a working area by the arithmetic and logic unit 2, and data on which operations are carried out, instructions, operation results and so forth are stored therein temporarily.
The hard disk drive 4 stores programs and data executed and used by the arithmetic and logic unit 2.
The inputting unit 5 includes a keyboard, a mouse, and so forth, and is used by a user for causing the information processing apparatus 1 to execute programs, and for inputting data to the information processing apparatus 1.
The display unit 6 displays calculation/operation results of the arithmetic and logic unit 2 and so forth.
The system bus 7 connects the arithmetic and logic unit 2, main memory 3, hard disk drive 4, inputting unit 5 and display unit 6.
The arithmetic and logic unit 2 will now be described in detail.
FIG. 2 shows a block diagram of the arithmetic and logic unit 2.
The arithmetic and logic unit 2 includes a bus-interface unit 111, a secondary cache 112, an instruction fetch unit 113, an instruction decoder 114, an integer arithmetic part 115, a floating point arithmetic part 116, a functional unit 117, an internal bus 118, a data cache 119, a reorder buffer 120, a branch prediction part 121 and an execution control part 122.
The bus-interface unit 111 acts as an interface between the arithmetic and logic unit 2 and the system bus 7. Instructions from the system bus 7 are provided to the secondary cache 112 via the bus-interface unit 111.
The secondary cache 112 temporarily stores instructions from the system bus 7. Instructions stored in the secondary cache 112 are provided to the instruction fetch unit 113 in sequence.
The instruction fetch unit 113 fetches instructions from the secondary cache 112.
The instruction decoder 114 decodes instructions from the instruction fetch unit 113. Instructions decoded by the instruction decoder 114 are then provided to the integer arithmetic part 115, floating point arithmetic part 116 and functional unit 117.
The integer arithmetic part 115 performs integer arithmetic according to instructions from the instruction decoder 114. The floating point arithmetic part 116 performs floating point arithmetic according to instructions from the instruction decoder 114. The functional unit 117 performs predetermined functions according to instructions from the instruction decoder 114.
Arithmetic results of the integer arithmetic part 115, floating point arithmetic part 116 and functional unit 117 are provided to the internal bus 118.
Arithmetic results provided to the internal bus 118 are provided to the data cache 119 and reorder buffer 120.
The data cache 119 stores arithmetic results of the integer arithmetic part 115, floating point arithmetic part 116 and functional unit 117. Data stored in the data cache 119 is stored in the secondary cache 112 via the bus-interface unit 111.
The reorder buffer 120 stores reordered instructions.
The branch prediction part 121 predicts a branch direction when a branch instruction is provided.
The execution control part 122 controls operations of the entire arithmetic and logic unit 2.
Branch instructions will now be described.
FIG. 3 illustrates branch instruction operations.
In FIG. 3, ‘inst’ denotes an ordinary instruction, while ‘br’ denotes a branch instruction.
In FIG. 3, instructions are described as ‘insti, insti+1, . . . , brA, instj, instj+1, . . . , brB, instk, instk+1, . . . , instl, instl+1, . . .’, in the stated order.
The branch instruction ‘brA’ is an instruction for returning to the instruction ‘insti’ at the address A when a branch is made as shown in the figure, but for executing the instructions starting from the instruction ‘brB’ when the branch is not made.
The branch instruction ‘brB’ is an instruction for jumping to the instruction ‘instl’ at the address B when a branch is made, but for executing the instructions ‘instk, instk+1, . . .’ when the branch is not made.
The branch prediction part 121 will now be described.
FIG. 4 shows a block diagram of one example of the branch prediction part 121 in the related art.
The branch prediction part 121 includes index generating circuits 131, 132, a branch history register 133, a branch prediction information storage part 134, a tag comparing circuit 135, a tag determining circuit 136, and a prediction information generating circuit 137.
A branch instruction, a program count value and branch history information are provided to the index generating circuit 131.
The branch history register 133 stores past branch results.
FIG. 5 shows an example of a data configuration of the branch history register 133.
The branch history register 133 includes an n-bit register.
The branch history register 133 stores a branch result n times ago at an n-th bit, a branch result n−1 times ago at an (n−1)-th bit, . . . , a branch result twice ago at a second bit, and a branch result once ago (immediately before this time) at a first bit.
Each branch result is expressed by ‘0’ or ‘1’. A branch result ‘1’ is stored when the branch is actually made while a branch result ‘0’ is stored when the branch is not actually made.
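The shifting behaviour of such an n-bit history register can be illustrated with a minimal sketch. It assumes that the 'first bit' is the least significant bit, so that each new result pushes older results toward the n-th bit; the function name `push_branch_result` is hypothetical and does not appear in the related art.

```python
def push_branch_result(history: int, taken: bool, n_bits: int) -> int:
    """Return the n-bit history after recording the newest branch result.

    Assumption: the first bit (most recent result) is the least
    significant bit, and the result from n+1 branches ago is discarded.
    """
    mask = (1 << n_bits) - 1              # keep only the lowest n bits
    return ((history << 1) | int(taken)) & mask


# Record taken, not taken, taken in a 4-bit register.
h = 0
for outcome in (True, False, True):
    h = push_branch_result(h, outcome, n_bits=4)
print(format(h, "04b"))  # prints 0101
```

Read from the first bit upward, the value 0101 says that the latest branch was made, the one before it was not, and the one before that was made.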
The branch history information stored in the branch history register 133 is provided to the index generating circuit 131 when a branch instruction occurs.
The index generating circuit 131 combines the branch instruction, program count value and branch history information, and generates index information. The index information generated by the index generating circuit 131 is used as an entry of the branch prediction information storage part 134.
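The text states only that these values are combined, without naming the combining function. The following sketch therefore assumes one common choice, a bitwise XOR of the program count value and the history value truncated to the table's index width. The function name `make_index`, the XOR choice, and the omission of the branch instruction input are all assumptions made for illustration.

```python
def make_index(pc: int, history: int, index_bits: int) -> int:
    """Fold a program count value and a branch history value into an index.

    Assumption: XOR is used as the combining function. The related art
    only says that the inputs are 'combined'.
    """
    mask = (1 << index_bits) - 1          # table has 2**index_bits entries
    return (pc ^ history) & mask


# The same branch address maps to different entries under different histories.
print(hex(make_index(0x1234, 0b1010, index_bits=8)))  # prints 0x3e
print(hex(make_index(0x1234, 0b0101, index_bits=8)))  # prints 0x31
```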
The branch prediction information storage part 134 stores tag information 138 and branch prediction information 139. Each item of tag information 138 is stored as a pair with the corresponding branch prediction information 139.
For the branch prediction information storage part 134, tag information 138 is determined by index information generated by the index generating circuit 131, and branch prediction information 139 corresponding to the thus-determined tag information is read out therefrom.
The branch prediction information 139 is information indicating a branch prediction direction.
Each branch prediction information 139 includes 2-bit information such as ‘00’, ‘01’, ‘10’ or ‘11’.
The branch prediction information ‘11’ indicates ‘strongly taken’, the branch prediction information ‘10’ indicates ‘weakly taken’, the branch prediction information ‘01’ indicates ‘weakly not taken’, and the branch prediction information ‘00’ indicates ‘strongly not taken’.
‘Strongly taken’ indicates a state in which a probability that a branch is made is large (the probability that the branch is made is largest).
‘Weakly taken’ indicates a state in which a probability that a branch is made is small (the probability that the branch is made is smaller).
‘Weakly not taken’ indicates a state in which a probability that a branch is not made is small (the probability that the branch is made is further smaller).
‘Strongly not taken’ indicates a state in which a probability that a branch is not made is large (the probability that the branch is made is smallest).
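The four 2-bit codes above can be decoded as in the following minimal sketch. The helper names `predicts_taken` and `is_strong` are hypothetical; the encoding itself is the one listed above, in which the upper bit carries the predicted direction.

```python
STATE_NAMES = {
    0b11: "strongly taken",
    0b10: "weakly taken",
    0b01: "weakly not taken",
    0b00: "strongly not taken",
}


def predicts_taken(info: int) -> bool:
    """'11' and '10' predict taken, so the upper bit gives the direction."""
    return bool(info & 0b10)


def is_strong(info: int) -> bool:
    """The two 'strongly' states are the ones whose bits agree (11 and 00)."""
    return info in (0b11, 0b00)


for code in (0b11, 0b10, 0b01, 0b00):
    print(format(code, "02b"), STATE_NAMES[code], predicts_taken(code), is_strong(code))
```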
The branch prediction information 139 corresponding to the tag information 138 searched for from the branch prediction information storage part 134 is outputted from the branch prediction information storage part 134.
The branch prediction information 139 read out from the branch prediction information storage part 134 is provided to the prediction information generating circuit 137. The prediction information generating circuit 137 performs branch prediction based on the provided branch prediction information 139.
The result of the branch prediction performed by the prediction information generating circuit 137 includes 2-bit information, similar to the branch prediction information 139, such as ‘00’, ‘01’, ‘10’ or ‘11’.
According to the above-mentioned branch prediction result, a branch prediction is made, and, instructions are executed in advance.
At this time, the branch prediction information 139 stored in the branch prediction information storage part 134 is updated according to a result of detection of hit/miss between the branch prediction and actual branch.
FIG. 6 illustrates a problem in the related art.
In FIG. 6, P1, P2 and P3 indicate programs of different processes, respectively.
When switching is made from the program P1 to the program P2 at a time t0, because the processes of the programs P1 and P2 are different from one another, branch prediction for the program P2 becomes inaccurate. A certain time T is needed from the start of execution of the program P2 until branch prediction suited to the program P2 is made.
Accordingly, it is not possible to perform a precise branch prediction during the time T after the program is switched, and, thereby, it is not possible to execute the program efficiently.
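The retraining delay can be illustrated for a single table entry with the following minimal sketch. It assumes that the entry is a 2-bit value that moves one step toward 'taken' or 'not taken' per executed branch; the function name `count_misses_after_switch` and the example outcomes are hypothetical, and the sketch does not model the full time T, which depends on how many entries must be retrained.

```python
from typing import Iterable


def count_misses_after_switch(counter: int, outcomes: Iterable[bool]) -> int:
    """Count mispredictions of one 2-bit entry left over from a previous program.

    Assumption: 0..3 encode strongly not taken .. strongly taken, and the
    entry moves one step per actual branch result.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2    # '10' and '11' predict taken
        if predicted_taken != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Entry trained to 'strongly taken' by P1; the branch in P2 is never made.
print(count_misses_after_switch(0b11, [False] * 5))  # prints 2
```

Under these assumptions, every entry that was trained in the opposite direction costs two misses before it predicts the new program correctly.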
FIG. 7 shows a block diagram of the branch prediction part 121 shown in FIG. 2 (in this case, it will be referred to as 121′ in order to distinguish it from the part 121 shown in FIG. 4).
The branch prediction part 121′ includes a branch history register 22, an index combining circuit 23, a branch prediction table part 24, a multiplexer 25 and a branch prediction control part 26.
The branch history register 22 holds a history of hit/miss of branch prediction in a time sequence manner.
FIG. 8 shows one example of a data configuration of the branch history register shown in FIG. 7.
The branch history register 22 is configured by a predetermined number of bit rows.
Each bit of a bit row corresponds to a past branch prediction. Each bit of the branch history register 22 stores a value corresponding to a respective branch prediction.
The value corresponding to a branch prediction is ‘1’ when the branch is actually made, but ‘0’ when the branch is not actually made.
Bit rows held in the branch history register 22 are provided to the index combining circuit 23.
The index combining circuit 23 combines a program count value from the execution control part 122 (shown in FIG. 2) with a branch history value from the branch history register 22. The branch prediction table part 24 is referred to according to a combining result of the index combining circuit 23.
The branch prediction table part 24 includes n branch prediction tables 24-1 through 24-n. Any of the branch prediction tables 24-1 through 24-n is selected according to the program count value.
Branch prediction data is selected from the thus-selected branch prediction table 24-x according to the value of the branch history register 22.
FIG. 9 shows one example of a data configuration of the branch prediction table 24-x.
The branch prediction table 24-x has addresses each being a combination of a program count value ax from the execution control part 122 and a branch history value bm from the branch history register 22, and branch prediction data c is stored according to the addresses.
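The two-step selection described above, in which a table is chosen by the program count value and an entry by the branch history value, can be sketched as follows. The table count, the history width, the modulo mapping from program count value to table, and the initial entry value are all assumptions made for illustration.

```python
from typing import List

HISTORY_BITS = 2   # assumed width of the branch history value b_m
NUM_TABLES = 4     # assumed number n of branch prediction tables


def new_table_part() -> List[List[int]]:
    """Build n tables with one 2-bit entry per possible history value.

    Assumption: every entry starts at '01'; the text gives no initial value.
    """
    return [[0b01] * (1 << HISTORY_BITS) for _ in range(NUM_TABLES)]


def read_prediction(tables: List[List[int]], pc: int, history: int) -> int:
    """Return branch prediction data c for a program count and history value."""
    table = tables[pc % NUM_TABLES]                     # table 24-x chosen by a_x
    return table[history & ((1 << HISTORY_BITS) - 1)]   # entry chosen by b_m


tables = new_table_part()
tables[1][0b10] = 0b11                                   # pretend this entry was trained
print(format(read_prediction(tables, pc=5, history=0b10), "02b"))  # prints 11
```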
The branch prediction data c is data of 2 bits, and can indicate the following 4 branch prediction states:
‘00’ indicates an SNT (Strongly Not Taken) state;
‘01’ indicates a WNT (Weakly Not Taken) state;
‘10’ indicates a WT (Weakly Taken) state; and
‘11’ indicates an ST (Strongly Taken) state.
The SNT state is a state in which the probability of not branching is large, that is, because the branch instruction has hardly ever resulted in a branch in the past, it is predicted that it will not result in a branch the next time either;
the WNT state is a state in which the probability of not branching is small, that is, when a branch instruction is executed, it is predicted that it will occasionally result in a branch, but mostly will not;
the WT state is a state in which the probability of branching is small, that is, when a branch instruction is executed, it is predicted that it will occasionally not result in a branch, but mostly will; and
the ST state is a state in which the probability of branching is large, that is, it is predicted that it will result in a branch in almost all the cases.
From the branch prediction table part 24, the branch prediction data c is read according to the address provided by the index combining circuit 23. The branch prediction data c read out from the branch prediction table part 24 is provided to the execution control part 122. The execution control part 122 speculatively executes branch instructions according to the branch prediction data c.
FIG. 10 illustrates one example of state transition operations of branch prediction performed by the above-described arithmetic and logic unit in the related art. In the figure, N (Not Taken) indicates a condition in which the branch instruction does not result in a branch, while T (Taken) indicates a condition in which the branch instruction results in a branch.
In FIG. 10, when a branch instruction is executed in the SNT (00) state, a result thereof is reflected in the branch history value. When the branch prediction makes a hit (N), the state remains in the SNT (00). However, when the branch prediction makes a miss (T), the state changes into the WNT (01).
When a branch instruction is executed in the WNT (01) state, a result thereof is reflected in the branch history value. When the branch prediction makes a hit (N), the state changes into the SNT (00). However, when the branch prediction makes a miss (T), the state changes into the WT (10).
When a branch instruction is executed in the WT (10) state, a result thereof is reflected in the branch history value. When the branch prediction makes a hit (T), the state changes into the ST (11). However, when the branch prediction makes a miss (N), the state changes into the WNT (01).
When a branch instruction is executed in the ST (11) state, a result thereof is reflected in the branch history value. When the branch prediction makes a hit (T), the state remains in the ST (11). However, when the branch prediction makes a miss (N), the state changes into the WT (10).
The result of the state transition performed as described above is used for subsequent branch prediction.
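The four transitions described for FIG. 10 amount to a 2-bit counter that moves one step up on T and one step down on N, stopping at the two ends. The following minimal sketch expresses this; the function name `next_state` is hypothetical, while the state codes follow the encoding given above.

```python
SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def next_state(state: int, taken: bool) -> int:
    """Return the next branch prediction state after one branch result.

    T (taken) moves the state one step toward ST; N (not taken) moves it
    one step toward SNT; ST and SNT stay in place when the prediction hits.
    """
    if taken:
        return min(state + 1, ST)
    return max(state - 1, SNT)


# Two consecutive taken branches move SNT through WNT to WT.
state = SNT
for result in (True, True):
    state = next_state(state, result)
print(format(state, "02b"))  # prints 10
```

Because a single miss moves a strong state only to the adjacent weak state, the predicted direction changes only after two consecutive misses.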
Thus, the branch prediction data c according to the branch prediction state is output.
Further, by switching the branch prediction table among those 24-1 through 24-n according to the program count value, it is possible to perform branch prediction control according to the program position.
However, in such a branch prediction method in the related art, because branch prediction data is determined according to a branch history, it is not possible to perform precise branch predictions unless a sufficient number of branch prediction hit/miss results are stored as a branch history.