To improve the performance of computing systems, instructions are executed in parallel. Several factors, such as control dependencies and data dependencies, impede parallel operation. Because of a control dependency, an instruction at a branch destination is not executed until the instruction that outputs the branch condition has completed and the branch condition is determined. Because of a data dependency, an instruction that consumes data is not executed until the other instructions that generate the data have completed. To sufficiently increase parallel operation, a method to speculatively release these dependencies is necessary.
As methods to execute mutually dependent instructions in parallel, “predicated execution” and the “branch prediction method” are selectively used for the control dependency, and “dependence collapsing” and “value prediction” are selectively used for the data dependency.
In “predicated execution”, a new source operand called a “predicate” is assigned to an instruction, and the instruction is executed or not depending on the truth of the “predicate”. FIG. 1 shows one example of program code without predicated instructions. In this example, if the value of register r3 is equal to the value of register r4, execution branches to label L2 by the conditional branch instruction “beq”. If the values are not equal, the instruction following the conditional branch instruction “beq” is executed. FIG. 2 shows the same program code as in FIG. 1, but using a “predicate”. Referring to (1), the “pseq” instruction (the predicate set instruction), if the value of register r3 is equal to the value of register r4, “1” is set in P1 as a predicate value and “0” is set in P2 as a predicate value. On the other hand, if the values are not equal, “0” is set in P1 and “1” is set in P2. Furthermore, as shown in (2) to (4) of FIG. 2, for a predicated instruction such as <P1> or <P2>, the execution result of the instruction is reflected in a register only if “1” has been set by the “pseq” instruction in the predicate named in “< >”. On the other hand, for a non-predicated instruction as shown in (5) of FIG. 2, the execution result is always reflected in the register. Accordingly, in the program code of FIG. 2, if the value of register r3 is equal to the value of register r4, “1” is set in P1 and “0” is set in P2 by the “pseq” instruction. In this case, only the execution results of the <P1> instructions “sll r6, r10, 2” and “li r5, 1” and the non-predicated instruction “move r2, r5” are reflected in the registers. In short, the same result as the case of branching to L2 in FIG. 1 is obtained. In the same way, if the value of register r3 is not equal to the value of register r4, “0” is set in P1 and “1” is set in P2. In this case, only the execution results of the <P2> instruction “li r5, 0” and the non-predicated instruction “move r2, r5” are reflected in the registers. In short, the same result as the case of not branching to L2 in FIG. 1 is obtained.
As embodiments of predicated execution, in a first method an instruction is executed irrespective of the truth of the “predicate”, and only the execution result of an instruction whose predicate proves true is reflected in the state once the truth is determined. In a second method, an instruction is not executed until the truth of the “predicate” is determined, and only an instruction whose predicate is true is executed. Only the first method can execute instructions having a control dependency in parallel.
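The behavior described for FIG. 2 can be sketched in software as follows. This is only an illustrative emulation of the first method (execute every instruction, reflect the result only when the predicate is true), not the apparatus itself. The function name `run_predicated`, the dictionary-based register file, and the ordering of the three predicated instructions are assumptions, since FIG. 2 is not reproduced here.

```python
def run_predicated(regs):
    """Emulate the predicated sequence described for FIG. 2.

    regs maps register names (e.g. "r3") to integers. A new mapping is returned.
    """
    regs = dict(regs)

    # (1) pseq: set complementary predicates from the comparison r3 == r4.
    pred = {"P1": int(regs["r3"] == regs["r4"])}
    pred["P2"] = 1 - pred["P1"]

    # Each entry: (guarding predicate or None, destination register, operation).
    # The relative order of the predicated instructions is assumed.
    program = [
        ("P1", "r6", lambda r: r["r10"] << 2),  # <P1> shift r10 left by 2 into r6
        ("P1", "r5", lambda r: 1),              # <P1> li r5, 1
        ("P2", "r5", lambda r: 0),              # <P2> li r5, 0
        (None, "r2", lambda r: r["r5"]),        # move r2, r5 (non-predicated)
    ]

    for guard, dest, operation in program:
        result = operation(regs)                # executed irrespective of the predicate
        if guard is None or pred[guard] == 1:
            regs[dest] = result                 # reflected only when the predicate is true
    return regs
```

With equal values in r3 and r4, the sketch leaves r5 and r2 at 1 and updates r6; with unequal values, r6 is untouched and r5 and r2 become 0, matching the two cases of FIG. 1.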
In the “branch prediction method”, the branch destination of a conditional branch instruction is predicted before the condition is determined. As the prediction method, a static method that indicates the branch destination in advance (for example, if the conditional branch instruction closes a loop, repetition of the loop is always predicted), a dynamic method that predicts using dedicated hardware (for example, the branch direction is recorded whenever the branch instruction is executed, and when a branch instruction is next executed, the previously recorded branch direction is used), or a combination of the static method and the dynamic method is selectively used. Various means for realizing each method have been proposed. An operation apparatus speculatively executes the instructions at the predicted destination and decides whether the prediction is correct when the condition of the predicted branch instruction is determined. If the prediction is correct, the speculative execution result is reflected in the state of the operation apparatus. If the prediction is erroneous, the speculative execution result is abandoned and the instructions at the correct branch destination are executed.
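As a rough illustration of the static and dynamic examples given above, the following sketch counts correct predictions for a single branch. The function names and the representation of branch outcomes as a list of booleans (True meaning the branch is taken) are assumptions made for this example; real predictors are hardware tables and are considerably more elaborate.

```python
def simulate_static_taken(outcomes):
    """Static method: always predict that the loop branch is taken."""
    return sum(1 for taken in outcomes if taken)


def simulate_last_direction(outcomes, initial=True):
    """Dynamic method: predict the direction recorded at the previous execution.

    outcomes is the sequence of actual directions of ONE branch instruction.
    Returns the number of correct predictions.
    """
    last = initial
    hits = 0
    for taken in outcomes:
        if last == taken:
            hits += 1          # speculative result would be reflected in the state
        # otherwise the speculative result would be abandoned
        last = taken           # record the direction for the next prediction
    return hits


# A loop branch taken nine times and then not taken, with the loop entered twice.
loop_twice = ([True] * 9 + [False]) * 2
static_hits = simulate_static_taken(loop_twice)      # 18 of 20
dynamic_hits = simulate_last_direction(loop_twice)   # 17 of 20
```

In this small case the last-direction scheme mispredicts twice around each loop exit (once on exit and once on re-entry), which is why the static loop prediction does slightly better here.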
In “dependence collapsing”, an instruction sequence (a plurality of instructions) having data dependencies is converted into one instruction, which is executed by a special operation unit. This method is mainly used for multimedia operations and floating-point operations.
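The idea can be sketched with a hypothetical pair of dependent instructions, an add followed by a shift that consumes the sum. The particular pair and the names `add_shift` and `run_sequential` are assumptions chosen for illustration and are not taken from the figures.

```python
def add(a, b):
    """First instruction: generates the intermediate value."""
    return a + b


def shift_left(x, n):
    """Second instruction: consumes the intermediate value."""
    return x << n


def run_sequential(a, b, n):
    """Two dependent instructions; the second must wait for the first."""
    t = add(a, b)
    return shift_left(t, n)


def add_shift(a, b, n):
    """Hypothetical collapsed instruction: one three-input operation.

    A special operation unit would evaluate this in a single step, so the
    intermediate value never creates a dependency between two instructions.
    """
    return (a + b) << n
```

Both forms compute the same value, for example `run_sequential(3, 4, 2)` and `add_shift(3, 4, 2)` both give 28; the difference lies only in how many dependent instructions must be issued.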
In “value prediction”, the execution result of an instruction is predicted before the input data necessary for the instruction is determined. That is, the output data of another instruction on which the instruction depends is predicted, and the instruction is executed using the predicted output data. In this way, two instructions having a data dependency are executed in parallel. As one method to predict the output data, the previous output result of the instruction is recorded, and this output data is used as the next prediction value. Alternatively, an instruction whose output value changes by a predetermined rule is found (for example, the output value increases or decreases by a predetermined ratio, or several kinds of values are repeatedly output in a predetermined order), and the output data is predicted by the predetermined rule. “Value prediction” is still under study at present. Neither the applicable ratio of prediction (the ratio of the number of instructions to which the prediction is actually applied to the number of dynamic instructions necessary for the prediction) nor the hit ratio of prediction (the ratio of the number of correctly predicted instructions to the number of instructions to which the prediction is applied) is very high. A general operation unit using this method does not exist.
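The two prediction approaches and the two ratios mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the “predetermined rule” is taken to be a constant difference between successive outputs (a stride), and a prediction is counted as applied only once enough history has been recorded.

```python
def evaluate_last_value_predictor(outputs):
    """Predict that an instruction outputs the same value as last time.

    Returns (applied, hits) over the output sequence of one instruction.
    """
    last = None
    applied = hits = 0
    for value in outputs:
        if last is not None:
            applied += 1               # a prediction was available
            if last == value:
                hits += 1              # the prediction was correct
        last = value                   # record the output for next time
    return applied, hits


def evaluate_stride_predictor(outputs):
    """Predict last output plus the most recently observed difference."""
    last = None
    stride = None
    applied = hits = 0
    for value in outputs:
        if last is not None and stride is not None:
            applied += 1
            if last + stride == value:
                hits += 1
        if last is not None:
            stride = value - last      # update the rule from the new output
        last = value
    return applied, hits


outputs = [10, 14, 18, 22, 30]
applied, hits = evaluate_stride_predictor(outputs)
applicable_ratio = applied / len(outputs)   # 3 of 5 dynamic instances
hit_ratio = hits / applied                  # 2 of 3 applied predictions
```

For the sample sequence, the first two outputs only train the predictor, two of the remaining three predictions are correct, and the final jump to 30 is a miss.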
In the above-mentioned methods for parallel execution of instructions including a dependency, “predicated execution” executes the instructions along both directions of a branch, even though the instructions along one of the two directions need not be executed. In this case, the operation unit is occupied by the unnecessary instructions. Accordingly, the execution ratio of effective instructions goes down, and the execution ratio of all instructions also goes down.
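The loss can be made concrete with the five instructions described for FIG. 2, assuming the first embodiment in which every instruction occupies the operation unit. The list `GUARDS` and the function `effective_ratio` are hypothetical names introduced for this sketch, and counting the predicate set instruction as effective is an assumption.

```python
# Guard of each instruction in the FIG. 2 sequence as described in the text:
# the predicate set instruction, two <P1> instructions, one <P2> instruction,
# and the final non-predicated move.
GUARDS = ["always", "P1", "P1", "P2", "always"]


def effective_ratio(values_equal):
    """Fraction of executed instructions whose results are actually reflected."""
    true_predicate = "P1" if values_equal else "P2"
    effective = sum(1 for guard in GUARDS if guard in ("always", true_predicate))
    return effective / len(GUARDS)
```

Under these assumptions, four of five executed instructions are effective when r3 equals r4 and three of five otherwise; the remaining slots of the operation unit are spent on results that are discarded.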
Furthermore, in “predicated execution”, a control dependency relation between instructions is converted to a data dependency relation through the “predicate”. The instruction sequence that sets the “predicate” (the instructions that set the condition of the branch instruction before conversion) and the instruction sequence that inputs the “predicate” (the instructions at the branch destination before conversion) still depend on each other. Accordingly, these instructions are located in order of dependency in the program code. As a result, in order to execute out of order an instruction located beyond a plurality of branch instructions in the program, an apparatus that decides whether an instruction can be executed out of order (for example, an instruction window of a superscalar processor) must provide a large number of entries.
In the “branch prediction method”, if the branch destination of a branch instruction is erroneously predicted, all instructions executed after the branch instruction are unnecessary in the worst case. Even if the hit ratio of prediction of each branch instruction is high, the probability that a plurality of consecutive branch predictions all hit is low. Accordingly, the execution of an instruction located beyond a plurality of branch instructions in the program is useless in many cases. Furthermore, instructions having a control dependency are not located in parallel in the program code. As a result, in order to execute out of order an instruction located beyond a plurality of branch instructions in the program, large-scale hardware is necessary in the same way as for “predicated execution”.
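The effect of consecutive predictions can be estimated with a short calculation. The sketch below assumes that every branch has the same hit ratio and that the predictions are statistically independent; both are simplifying assumptions, and the function name and the numeric values are illustrative only.

```python
def all_hit_probability(hit_ratio, branches):
    """Probability that every one of `branches` consecutive predictions hits,
    assuming independent predictions with an identical hit ratio."""
    return hit_ratio ** branches


five_deep = all_hit_probability(0.9, 5)    # about 0.59
ten_deep = all_hit_probability(0.95, 10)   # about 0.60
```

Under these assumptions, even a per-branch hit ratio of 90 percent leaves only about a 59 percent chance that an instruction located beyond five branch instructions is usefully executed.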
In “dependence collapsing”, the operation time of a complicated instruction such as a floating-point operation is reduced. However, the operation time of a simple instruction is not reduced very much. In addition, a special operation unit to execute the converted instruction is necessary.
In “value prediction”, by predicting the output result of a particular instruction, the instruction sequence located after the particular instruction and the other instruction sequences having a data dependency on the particular instruction in the program are executed in parallel. However, in order to confirm whether the prediction is correct, these instruction sequences must be located in the original order of data dependency. Accordingly, in the same way as for “predicated execution” and the “branch prediction method”, an execution decision apparatus providing many entries is necessary in order to sufficiently execute the instructions out of order.