1. Field of the Invention
The present invention relates to processors and, in particular, to instruction scheduling in a processor.
2. Description of the Related Art
One of the factors preventing processor designers from improving performance is the interdependence between instructions. The dependences that degrade processor performance include control dependences, name dependences and data dependences. Many studies have attempted to remove control dependences by means of branch prediction and speculative execution. A name dependence is caused by a shortage of hardware resources, i.e., a shortage of available registers, and can be eliminated by register renaming (R. M. Keller, Look-Ahead Processors, ACM Computing Surveys, vol.7, no.4, pp.177-195, 1975). A data dependence, on the other hand, cannot be removed by such techniques, which is why it is called a true dependence. The data dependence is therefore a serious obstacle to increasing instruction-level parallelism.
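As a rough illustration of the register dependences named above, the sketch below classifies the dependence of a later instruction on an earlier one from their destination and source register sets. It uses the common RAW/WAR/WAW terminology, which is an assumption here (the text itself speaks only of data and name dependences), and it does not model control dependences. The `Instr` type, the `classify` function and all register names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    """Hypothetical instruction record: the registers it writes and reads."""
    name: str
    dests: FrozenSet[str]
    srcs: FrozenSet[str]


def classify(earlier: Instr, later: Instr) -> Set[str]:
    """Return the register dependences of `later` on `earlier` (program order)."""
    kinds = set()
    if earlier.dests & later.srcs:
        kinds.add("data (RAW)")   # true dependence: later reads what earlier writes
    if earlier.srcs & later.dests:
        kinds.add("name (WAR)")   # anti-dependence: later overwrites a value earlier reads
    if earlier.dests & later.dests:
        kinds.add("name (WAW)")   # output dependence: both write the same register
    return kinds


a = Instr("a", frozenset({"r3"}), frozenset({"r1", "r2"}))
b = Instr("b", frozenset({"r1"}), frozenset({"r3"}))
print(sorted(classify(a, b)))          # ['data (RAW)', 'name (WAR)']

# Renaming b's destination to a fresh physical register (hypothetical "p7")
# removes the name dependence but leaves the true dependence untouched.
b_renamed = b._replace(dests=frozenset({"p7"}))
print(sorted(classify(a, b_renamed)))  # ['data (RAW)']
```

The renaming step shows why the text calls the data dependence "true": no choice of register names can remove it.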
An example will be explained with reference to the instruction sequence illustrated in FIG. 1. For ease of understanding, each of the operations f1 to f4 is assumed to have a single source operand. The execution latency of every instruction is assumed to be 1 except for the instruction I1, which is a load instruction that causes a cache miss and therefore has an execution latency of 4. The instruction sequence illustrated in FIG. 1 involves two data dependences: one between the instructions I1 and I3, and the other between the instructions I3 and I4. When there is a data dependence, the subsequent instruction cannot be executed before the execution of the preceding instruction is completed. In this case, the instruction I3 cannot be executed until the execution of the instruction I1 is completed, and the instruction I4 cannot be executed until the execution of the instruction I3 is completed. Accordingly, in the conventional technique, a subsequent instruction having no data dependence is executed in advance of a preceding instruction that is stalled by a data dependence. This technique is called dynamic instruction scheduling and is implemented, for example, by means of reservation stations (R. M. Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units, IBM Journal, vol.11, pp.25-33, 1967).
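The cost of the two data dependences in the FIG. 1 sequence can be estimated with a simple dataflow calculation. The sketch assumes, beyond what the text states, unlimited fetch and dispatch bandwidth and that an instruction starts in the cycle in which its last producer finishes; under those assumptions the chain I1, I3, I4 alone determines the length of the schedule. The dictionary layout and the function name are hypothetical.

```python
from typing import Dict, List

# FIG. 1 as described: I1 is a load that misses (latency 4); the rest take 1 cycle.
LATENCY: Dict[str, int] = {"I1": 4, "I2": 1, "I3": 1, "I4": 1}
# Producer of each instruction's single source operand (I3 needs I1, I4 needs I3).
PRODUCERS: Dict[str, List[str]] = {"I1": [], "I2": [], "I3": ["I1"], "I4": ["I3"]}


def dataflow_finish_times(latency: Dict[str, int],
                          producers: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest finish time of each instruction, limited only by data dependences."""
    finish: Dict[str, int] = {}
    for name in latency:  # dicts keep program order, and producers precede consumers
        start = max((finish[p] for p in producers[name]), default=0)
        finish[name] = start + latency[name]
    return finish


finish = dataflow_finish_times(LATENCY, PRODUCERS)
print(finish)                # {'I1': 4, 'I2': 1, 'I3': 5, 'I4': 6}
print(max(finish.values()))  # 6: the length of the critical path I1 -> I3 -> I4
```

The independent instruction I2 finishes early, while I3 and I4 inherit the whole load latency of I1.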
FIG. 2 shows one entry of an instruction window implemented as a register update unit (G. S. Sohi, Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers, IEEE Trans. on Computers, vol.39, no.3, pp.349-359, 1990). Each entry of the instruction window is composed of two source operand fields 100 and 110, a destination field 120, a dispatched bit 130, a functional unit field 140, an executed bit 150 and a program counter field 180. If a source operand is not yet available, the ready bit 101 (111) of the source operand field is reset to indicate that the source operand is not available, and the tag 102 (112) designating the source operand is set. When the source operand becomes available, its value is transferred to the content field 103 (113) of the source operand field and the ready bit 101 (111) is set. The destination register number is stored in the register field 121 of the destination field 120, and the result of executing the instruction is stored in the content field 122. The dispatched bit 130 indicates whether or not the instruction has been dispatched to the functional unit designated by the functional unit field 140. The executed bit 150 is set when the execution of the instruction is completed; once it is set, succeeding instructions that have a data dependence on this instruction can be dispatched. Finally, the program counter field 180 is used to recover the state of the processor when a prediction fails and to implement precise exceptions.
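The entry format of FIG. 2 can be written down as a record, which makes the role of each field explicit. This is a minimal sketch: the Python types, the integer operand values, the `can_dispatch` helper and the example values ("alu", the program counter) are assumptions for illustration, and an instruction with a single source operand is assumed to have its unused operand marked ready.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceOperand:                 # source operand field 100 / 110
    ready: bool = False              # ready bit 101 / 111
    tag: Optional[str] = None        # tag 102 / 112 naming the producing register
    content: Optional[int] = None    # content field 103 / 113


@dataclass
class Destination:                   # destination field 120
    register: str                    # register field 121
    content: Optional[int] = None    # content field 122 (execution result)


@dataclass
class RUUEntry:
    src1: SourceOperand
    src2: SourceOperand
    dest: Destination
    functional_unit: str             # functional unit field 140
    pc: int                          # program counter field 180
    dispatched: bool = False         # dispatched bit 130
    executed: bool = False           # executed bit 150

    def can_dispatch(self) -> bool:
        """Hypothetical helper: not yet dispatched and both operands present."""
        return not self.dispatched and self.src1.ready and self.src2.ready


# An entry shaped like I3 of FIG. 1: waiting on r11, writing r13.
entry = RUUEntry(
    src1=SourceOperand(tag="r11"),   # operand not yet available
    src2=SourceOperand(ready=True),  # unused second operand
    dest=Destination(register="r13"),
    functional_unit="alu",
    pc=0x100,
)
print(entry.can_dispatch())  # False
```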
Instructions are registered in the register update unit in the sequential order of the target program. When the execution of a preceding instruction is completed, its execution result 122 and destination register number 121 are broadcast. Each succeeding instruction monitors the broadcast destination register number 121 and, if its source operand tag 102 (112) matches that number, captures the execution result as its source operand 103 (113). A succeeding instruction can be executed once all of its source operands have been obtained, even if instructions located between the preceding instruction and the succeeding instruction have not yet completed execution.
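The tag-matching step can be sketched as a loop over the window: each completed result is broadcast with its destination register number, and every waiting operand whose tag matches captures the value and sets its ready bit. The `Src` and `Entry` records, `broadcast`, `ready_to_dispatch` and the operand values are hypothetical, and real hardware performs all tag comparisons in parallel rather than sequentially.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Src:
    tag: str                       # source operand tag 102 / 112
    ready: bool = False            # ready bit 101 / 111
    content: Optional[int] = None  # content field 103 / 113


@dataclass
class Entry:
    name: str
    srcs: List[Src]
    dispatched: bool = False


def broadcast(window: List[Entry], dest_reg: str, value: int) -> None:
    """Deliver a completed result to every waiting operand whose tag matches."""
    for entry in window:
        for src in entry.srcs:
            if not src.ready and src.tag == dest_reg:
                src.content = value
                src.ready = True


def ready_to_dispatch(window: List[Entry]) -> List[str]:
    """Entries not yet dispatched whose source operands have all been obtained."""
    return [e.name for e in window
            if not e.dispatched and all(s.ready for s in e.srcs)]


window = [Entry("I3", [Src("r11")]), Entry("I4", [Src("r13")])]
broadcast(window, "r12", 7)       # tag r12 matches nothing
print(ready_to_dispatch(window))  # []
broadcast(window, "r11", 42)      # I3's operand arrives (42 is an arbitrary value)
print(ready_to_dispatch(window))  # ['I3']
```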
FIG. 3 is a block diagram of the prior art processor. As illustrated in FIG. 3, the prior art processor is composed of an instruction cache 200, an instruction decoder 260, a register file 210, an instruction window 220, functional units 240 to 243 and a data cache 250. The instruction decoder 260 decodes an instruction fetched from the instruction cache 200 and registers it in the instruction window 220. Source operands are read from the register file 210; when a source operand cannot be obtained from the register file 210, it is transferred from the functional units 240 to 243 after the execution of the preceding instruction that produces it is completed. An instruction whose required source operands have all become available is dispatched to the functional units 240 to 243. The execution result is written into the register file 210 through the instruction window 220.
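The operand path of FIG. 3 can be approximated as follows: at issue, a source is read from the register file 210 unless a preceding in-flight instruction will write it, in which case the instruction keeps only the tag and waits for the result forwarded from a functional unit. The `RegisterFile` class, its methods and the register values are hypothetical, and the sketch assumes at most one in-flight writer per register, as in the FIG. 1 sequence.

```python
from typing import Dict, Optional, Set, Tuple


class RegisterFile:
    """Register file 210 plus a record of registers awaiting an in-flight result."""

    def __init__(self, values: Dict[str, int]) -> None:
        self.values: Dict[str, int] = dict(values)
        self.pending: Set[str] = set()

    def read(self, reg: str) -> Tuple[bool, str, Optional[int]]:
        """Return (ready, tag, value) for one source operand at issue time."""
        if reg in self.pending:
            return (False, reg, None)   # wait for a result from units 240 to 243
        return (True, reg, self.values[reg])

    def reserve(self, reg: str) -> None:
        """Called when an instruction writing `reg` enters the instruction window."""
        self.pending.add(reg)

    def write_back(self, reg: str, value: int) -> None:
        """Called when the result is written through the window into the file."""
        self.values[reg] = value
        self.pending.discard(reg)


rf = RegisterFile({"r1": 10, "r2": 20})   # arbitrary initial values
rf.reserve("r11")                         # I1 (a load) will produce r11
print(rf.read("r1"))    # (True, 'r1', 10)
print(rf.read("r11"))   # (False, 'r11', None)
rf.write_back("r11", 5)
print(rf.read("r11"))   # (True, 'r11', 5)
```

A single pending set is only safe under the one-writer assumption; with several in-flight writers to one register, a real design needs per-writer tags, which is what register renaming provides.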
FIG. 4 is a schematic diagram showing an example of instruction scheduling when the instruction sequence illustrated in FIG. 1 is executed. For ease of understanding, the following conditions are assumed. One instruction is fetched and one instruction is dispatched per cycle. Instruction commit is not described. The destination register tag is equal to the architectural register number of the processor. The registers r1 and r2 are already available.
In the first cycle, illustrated in FIG. 4(A), the instruction I1 is issued, and its source operand tag r1 and destination register tag r11 are saved in the corresponding fields. Since the source operand in the register r1 is available, the ready bit (r) is set. The instruction I1 is also dispatched, so the dispatched bit (d) is set. In the next cycle, illustrated in FIG. 4(B), the instruction I2 is fetched and dispatched. In the next cycle, illustrated in FIG. 4(C), the instruction I3 is issued. Since the executed bit of the instruction I1, which generates the source operand r11, is not yet set, the instruction I3 cannot be dispatched. In the same cycle, the instruction I2 completes execution, its result is written back into the register r12 and its executed bit (e) is set. Since the destination register tag r12 of the instruction I2 does not match the source operand tag r11 of the instruction I3, the instruction I3 is still not dispatched. The dotted arrows in FIG. 4(C) indicate that the source operand tag of the instruction I3 does not match the destination register tag of the instruction I2. In the next cycle, illustrated in FIG. 4(D), the instruction I4 is issued; since r13 is not yet available, it is not dispatched. In the next cycle, illustrated in FIG. 4(E), the operation of the instruction I1 is completed, its result is written back to r11 and its executed bit is set. Since the source operand r11 of the instruction I3 is now available, the instruction I3 is dispatched. The bold arrows in FIG. 4(E) indicate that the source operand tag of the instruction I3 matches the destination register tag of the instruction I1. In the next cycle, illustrated in FIG. 4(F), the operation of the instruction I3 is completed and the source operand r13 becomes available, so the instruction I4 is dispatched. Finally, in the cycle illustrated in FIG. 4(G), the execution of the instruction I4 is completed, finishing the execution of the instruction sequence.
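The FIG. 4 walk-through can be reproduced with a small cycle-level model of the register update unit. The timing rules are assumptions chosen to match FIG. 4 rather than taken from the text: one instruction is issued and at most one (the oldest ready) is dispatched per cycle, an instruction dispatched in cycle t with latency L writes back in cycle t + L, and a waiting instruction may be dispatched in the same cycle its operand is written back. The source register r2 of I2 and the destination register r14 of I4 are hypothetical, and each register is assumed to be written at most once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entry:
    name: str
    srcs: List[str]                  # source operand tags
    dest: str                        # destination register tag
    latency: int
    ready: List[bool]                # one ready bit (r) per source operand
    dispatched: bool = False         # dispatched bit (d)
    executed: bool = False           # executed bit (e)
    done_cycle: Optional[int] = None


# FIG. 1 sequence: (name, sources, destination, latency).
# I2's source r2 and I4's destination r14 are assumptions.
PROGRAM = [
    ("I1", ["r1"], "r11", 4),        # load that misses in the cache
    ("I2", ["r2"], "r12", 1),
    ("I3", ["r11"], "r13", 1),
    ("I4", ["r13"], "r14", 1),
]


def simulate(program, initially_ready: Set[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    ready_regs = set(initially_ready)
    window: List[Entry] = []
    dispatch_at: Dict[str, int] = {}
    complete_at: Dict[str, int] = {}
    cycle = 0
    while len(complete_at) < len(program):
        cycle += 1
        # 1. Write back results due this cycle, set the executed bit, broadcast the tag.
        for e in window:
            if e.dispatched and not e.executed and e.done_cycle == cycle:
                e.executed = True
                complete_at[e.name] = cycle
                ready_regs.add(e.dest)
                for w in window:
                    w.ready = [r or s == e.dest for r, s in zip(w.ready, w.srcs)]
        # 2. Issue the next instruction in program order (one per cycle).
        if cycle <= len(program):
            name, srcs, dest, lat = program[cycle - 1]
            window.append(Entry(name, srcs, dest, lat, [s in ready_regs for s in srcs]))
        # 3. Dispatch at most one instruction: the oldest with all operands ready.
        for e in window:
            if not e.dispatched and all(e.ready):
                e.dispatched = True
                e.done_cycle = cycle + e.latency
                dispatch_at[e.name] = cycle
                break
    return dispatch_at, complete_at


# Cycles 1..7 correspond to FIG. 4(A)..(G).
dispatch_at, complete_at = simulate(PROGRAM, {"r1", "r2"})
print(dispatch_at)  # {'I1': 1, 'I2': 2, 'I3': 5, 'I4': 6}
print(complete_at)  # {'I2': 3, 'I1': 5, 'I3': 6, 'I4': 7}
```

The model shows the point of dynamic scheduling: I2 overtakes the stalled load, while I3 and I4 still wait out the full load latency.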
To resolve the problem of data dependences, it has been proposed to resolve data dependences speculatively by predicting execution results (F. Gabbay, Speculative Execution Based on Value Prediction, Technical Report #1080, Department of Electrical Engineering, Technion, 1996). Speculatively executing instructions that have data dependences requires that, if the speculation fails, the processor be restored to the state it was in before the speculative execution. For this purpose, the instruction window is used as explained below.
The instruction window saves the predicted value that was used for the speculative execution. When the preceding instruction that generates the speculatively used operand is completed, its execution result is compared with the predicted value. If they match, the speculative execution has succeeded. If they do not match, the state of the processor before the speculative execution has to be restored, and the instructions that depend on the load instruction whose value was incorrectly predicted have to be executed again with the correct operand.
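The check described above can be sketched as a comparison followed by a walk over the dependence graph: when the predicted value disagrees with the actual result, every instruction that consumed that value, directly or through another instruction, must be re-executed. The sketch applies this to the FIG. 1 dependences under the hypothetical assumption that the value of the load I1 was predicted; `verify_prediction`, the `CONSUMERS` map and the operand values are illustrative, and restoring the rest of the processor state is not modelled.

```python
from typing import Dict, List, Set

# Consumers of each instruction's result in the FIG. 1 sequence.
CONSUMERS: Dict[str, List[str]] = {"I1": ["I3"], "I2": [], "I3": ["I4"], "I4": []}


def verify_prediction(producer: str, predicted: int, actual: int,
                      consumers: Dict[str, List[str]]) -> Set[str]:
    """Return the instructions to reissue; empty if the prediction was correct."""
    if predicted == actual:
        return set()                      # speculation succeeded
    to_reissue: Set[str] = set()
    worklist = list(consumers.get(producer, []))
    while worklist:                       # transitive closure over data dependences
        name = worklist.pop()
        if name not in to_reissue:
            to_reissue.add(name)
            worklist.extend(consumers.get(name, []))
    return to_reissue


print(verify_prediction("I1", predicted=5, actual=5, consumers=CONSUMERS))  # set()
print(sorted(verify_prediction("I1", predicted=5, actual=9, consumers=CONSUMERS)))  # ['I3', 'I4']
```

The independent instruction I2 is never reissued, which is why precise dependence tracking matters for recovery cost.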
However, this prior art reissue structure has the following problem. All instructions have to remain in the instruction window until they are committed, which lowers the effective capacity of the instruction window and thereby the flexibility of instruction scheduling. The capacity of the instruction window could be increased to maintain processor performance; however, the machine cycle of the processor is closely related to that capacity, so the capacity cannot simply be increased. Increasing the capacity while maintaining the machine cycle requires the instruction window to be pipelined, which increases its latency, that is, the time lag between the issue of an instruction and the output of its execution result. This increase in latency results in a performance penalty.
It is an object of the present invention to improve the efficiency of dynamic instruction scheduling in a processor capable of speculatively executing instructions that have data dependences. In prior art processors, when speculation fails, the instruction whose speculative execution failed and the instructions that depend on it are detected and issued again in order to restore the state of the processor that preceded the failed speculation. However, this prior art technique tends to lower the efficiency of instruction scheduling. In accordance with the present invention, instructions can be reissued without lowering the efficiency of instruction scheduling by separating the instruction scheduling function from the instruction reissue function.