1. Field of the Invention
The present invention relates to a data processor, and more particularly to a data processor with an improved data dependence detector.
2. Description of the Related Art
A non-program sequence execution, or out-of-order execution, has widely been used for improving processing speed, wherein the instructions are executed in a different sequence or order from a definitive sequence or order defined by a program. In accordance with the non-program sequence execution or the out-of-order execution, the processor executes an instruction which has become executable prior to an instruction which has not become executable yet, even if the non-executable instruction is prior in program sequence to the executable instruction, thereby improving the performance of the processor as compared to when the processor executes instructions in accordance with the program sequence defined by the program, namely in-order execution.
The condition for allowing the non-program sequence execution is that no read after write dependence is present between instructions with reference to a register. The read after write dependence may also be referred to as a flow dependence.
If a post instruction, which is post in program sequence to a prior instruction, refers to a register which is changed by the prior instruction, this means that a read after write dependence from the prior instruction to the post instruction is present. If the processor executes the post instruction and then the prior instruction in violation of the read after write dependence, then the meaning of the program is changed and it is no longer possible to obtain the correct execution result. Namely, if the read after write dependence is present with respect to the register, then it is impossible to execute the instructions in the non-program sequence.
If a prior instruction, which is prior in program sequence to a post instruction, refers to a register which is changed by the post instruction, this means that a write after read dependence from the prior instruction to the post instruction is present. The write after read dependence may also be referred to as an anti-dependence. If the processor executes the post instruction and then the prior instruction in violation of the write after read dependence, then the meaning of the program is changed and it is no longer possible to obtain the correct execution result. Namely, if the write after read dependence is present with respect to the register, then it is impossible to execute the instructions in the non-program sequence.
If a post instruction, which is post in program sequence to a prior instruction, changes a register which is also changed by the prior instruction, this means that a write after write dependence from the prior instruction to the post instruction is present. The write after write dependence may also be referred to as an output dependence. If the processor executes the post instruction and then the prior instruction in violation of the write after write dependence, then the meaning of the program is changed and it is no longer possible to obtain the correct execution result. Namely, if the write after write dependence is present with respect to the register, then it is impossible to execute the instructions in the non-program sequence.
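The three register dependences described above may be illustrated by the following minimal sketch. The class name `Instr`, the function name `register_dependences`, and the example instructions and register names are hypothetical and are introduced only for illustration; each instruction is modeled simply as the set of registers it refers to and the set of registers it changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: registers referred to and registers changed."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def register_dependences(prior: Instr, post: Instr) -> set:
    """Return the dependences from `prior` to `post`, where `prior` is earlier in program sequence."""
    kinds = set()
    if prior.writes & post.reads:
        kinds.add("RAW")  # read after write (flow dependence)
    if prior.reads & post.writes:
        kinds.add("WAR")  # write after read (anti-dependence)
    if prior.writes & post.writes:
        kinds.add("WAW")  # write after write (output dependence)
    return kinds


# Illustrative instructions only; the mnemonics are not taken from the text.
i1 = Instr("add r1, r2, r3", reads=frozenset({"r2", "r3"}), writes=frozenset({"r1"}))
i2 = Instr("sub r4, r1, r5", reads=frozenset({"r1", "r5"}), writes=frozenset({"r4"}))
i3 = Instr("mov r2, r6", reads=frozenset({"r6"}), writes=frozenset({"r2"}))

print(register_dependences(i1, i2))  # {'RAW'}: i2 refers to r1, which i1 changes
print(register_dependences(i1, i3))  # {'WAR'}: i3 changes r2, which i1 refers to
```

An empty result for a pair of instructions corresponds to the case in which the two instructions may be executed in the non-program sequence as far as the registers are concerned.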
When an instruction is decoded, the registers referred to or changed by the instruction are confirmed, so that the instructions having the read after write dependence are usually executed in the program sequence.
It has been known to cancel the write after read dependence and the write after write dependence for allowing the non-program sequence execution.
Instructions which access a memory are subject not only to the dependence on the register but also to the dependence on the memory.
Usually, the memory access instructions include a load instruction for reading out data from the memory and a store instruction for writing data into the memory.
If two load/store instructions have different addresses from each other, no dependence is present with respect to the memory. This allows such instructions to be executed in the non-program sequence.
If two load/store instructions have the same address as each other, a dependence may be present with respect to the memory. If, for example, a load instruction reads out data from an address, to which the data are stored by a store instruction which is prior in program sequence to the load instruction, then this means that a read after write dependence from the store instruction to the load instruction is present. In this case, if the instructions are executed in a reverse sequence to the program sequence, then the program meaning is changed and it is no longer possible to obtain the correct result of the execution of the program. Namely, if the read after write dependence is present with respect to the memory, it is impossible to execute the instructions in the non-program order.
If, for example, a store instruction stores data to an address, from which the data have been read out by a load instruction which is prior in program sequence to the store instruction, then this means that a write after read dependence from the load instruction to the store instruction is present. In this case, if the instructions are executed in a reverse sequence to the program sequence, then the program meaning is changed and it is no longer possible to obtain the correct result of the execution of the program. Namely, if the write after read dependence is present to the memory, it is impossible to execute the instructions in the non-program order.
If, for example, a store instruction stores data to an address, to which the data have been stored by a store instruction which is prior in program sequence to the store instruction, then this means that a write after write dependence from the prior store instruction to the post store instruction is present. In this case, if the instructions are executed in a reverse sequence to the program sequence, then the program meaning is changed and it is no longer possible to obtain the correct result of the execution of the program. Namely, if the write after write dependence is present with respect to the memory, it is impossible to execute the instructions in the non-program order.
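The memory dependences described above may be summarized by the following minimal sketch, in which the function name `memory_dependence` and the operation labels "LD" and "ST" are hypothetical. It assumes that each memory access instruction is fully described by its operation and a single address.

```python
def memory_dependence(prior_op, prior_addr, post_op, post_addr):
    """Classify the dependence from a prior to a post memory access instruction.

    Operations are "LD" (load) or "ST" (store). Returns "RAW", "WAR", "WAW",
    or None when no dependence is present with respect to the memory.
    """
    if prior_addr != post_addr:
        return None  # different addresses: no dependence
    if prior_op == "ST" and post_op == "LD":
        return "RAW"  # load reads what the prior store wrote
    if prior_op == "LD" and post_op == "ST":
        return "WAR"  # store overwrites what the prior load read
    if prior_op == "ST" and post_op == "ST":
        return "WAW"  # both stores write the same address
    return None  # two loads never depend on each other


print(memory_dependence("ST", "A4", "LD", "A4"))  # RAW
print(memory_dependence("ST", "A4", "LD", "A2"))  # None
```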
It has been known to cancel the write after read dependence and the write after write dependence by temporarily storing data, which are to be stored by the store instruction, into a store buffer for allowing the non-program sequence execution.
If the read after write dependence is present, it is necessary to execute the instructions in the program sequence. It is, however, likely that an address of the load/store instruction remains unknown until the instruction is about to be executed. Namely, it is likely that the dependence remains unknown until the execution of the instruction. For this reason, inhibiting the non-program sequence execution causes a large loss in performance of the processor.
FIG. 1A is a diagram illustrative of a program sequence of store/load instructions. The program sequence is as follows: a load instruction “LD1” at an address “A1”, a store instruction “ST1” at an address “A4′”, a load instruction “LD2” at an address “A2”, a load instruction “LD3” at an address “A3”, and a load instruction “LD4” at an address “A4”.
Assuming that the address “A4′” of the store instruction “ST1” is equal to the address “A4” of the load instruction “LD4”, then the store instruction “ST1” and the load instruction “LD4” access the same address “A4”, and the store instruction “ST1” is prior in program sequence to the load instruction “LD4”. Accordingly, a read after write dependence from the store instruction “ST1” to the load instruction “LD4” is present.
The program expects that the store instruction “ST1” stores data at the address “A4” and then the load instruction “LD4” reads this data out from the address “A4”, for which reason it is necessary that the store instruction “ST1” is executed prior to the execution of the load instruction “LD4” in accordance with the program sequence.
FIG. 1B is a diagram illustrative of executions of instructions in the program sequence of FIG. 1A. Cycle numbers, execution instructions, addresses of the execution instructions are shown. It is assumed that the address “A4′” of the store instruction “ST1” has not been known until the fifth cycle 5.
In the first cycle 1, the load instruction “LD1” at the address “A1” is executed. In the fifth cycle 5, the store instruction “ST1” at the address “A4′” is executed. In the sixth cycle 6, the load instruction “LD2” at the address “A2” is executed. In the seventh cycle 7, the load instruction “LD3” at the address “A3” is executed. In the eighth cycle 8, the load instruction “LD4” at the address “A4” is executed.
Even if the addresses “A2”, “A3”, and “A4” become known in the second, third and fourth cycles 2, 3 and 4 respectively, the load instructions “LD2”, “LD3”, and “LD4” are inhibited from being executed in the second, third and fourth cycles 2, 3 and 4 respectively, which are prior to the execution of the store instruction “ST1” in the fifth cycle 5, because the address “A4′” of the store instruction “ST1” has not been known until the fifth cycle 5, and thus the presence or absence of the read after write dependence from the store instruction “ST1” to each of the load instructions “LD2”, “LD3” and “LD4” has not been known until the fifth cycle 5.
In accordance with the conventional program sequence execution, the load/store instructions are inhibited from being executed in the second, third and fourth cycles 2, 3 and 4. Eight cycles are necessary to execute the five load/store instructions. The program sequence execution may thus drop the effective performance of the processor.
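The eight-cycle count of FIG. 1B may be reproduced by the following minimal sketch. The function name `in_order_schedule` is hypothetical, and the model assumes that one instruction is executed per cycle and that an instruction is executed as soon as both its address is known and the preceding instruction in program sequence has been executed.

```python
def in_order_schedule(program):
    """program: list of (name, cycle in which the address becomes known), in program sequence.

    Returns a dict mapping each instruction name to the cycle in which it is executed.
    """
    schedule = {}
    cycle = 0
    for name, ready in program:
        # Wait for the previous instruction and for this instruction's address.
        cycle = max(cycle + 1, ready)
        schedule[name] = cycle
    return schedule


# Address-known cycles as assumed for FIG. 1B.
fig_1b = [("LD1", 1), ("ST1", 5), ("LD2", 2), ("LD3", 3), ("LD4", 4)]
schedule = in_order_schedule(fig_1b)
print(schedule)                # {'LD1': 1, 'ST1': 5, 'LD2': 6, 'LD3': 7, 'LD4': 8}
print(max(schedule.values()))  # 8
```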
The speculative execution in accordance with the non-program sequence, assuming that the read after write dependence is not present, has been known as prior art. The speculative execution of instructions to the dependence between data will hereinafter be referred to as “data dependence speculative execution”.
In accordance with the data dependence speculative execution, it is possible in one case that the read after write dependence is actually not present and the speculative execution results in success. It is also possible in another case that the read after write dependence is actually present and the speculative execution results in failure. At the time when the presence or absence of the read after write dependence becomes known, it is necessary to judge which of the above two cases applies.
If the read after write dependence is actually not present and the speculative execution results in the success, the continuation to execute the subsequent instructions is allowed, whereby the effective performance of the processor is improved by the data dependence speculative execution in accordance with the non-program sequence.
If, however, the read after write dependence is actually present and the speculative execution results in the failure, then the program meaning is changed, thus it is no longer possible to ensure the correct result of the execution of the program. For this reason, the result obtained by the data dependence speculative execution in the non-program sequence is canceled, and in place the recovery process for the failure of the data dependence speculative execution is necessary. The recovery process for the failure of the data dependence speculative execution might be likely to drop the performance of the processor in comparison with the execution in the program sequence. If, however, a probability of success in the data dependence speculative execution is sufficiently higher than a probability of failure in the data dependence speculative execution, then the effective performance of the processor for processing the program may be improved in total.
The non-program sequence execution is disclosed by Mike Johnson in “Super-scalar processor” 1994. The recovery process for the failure in the data dependence speculative execution is disclosed in Japanese laid-open patent publication No. 5-224927.
FIG. 1C is a diagram illustrative of one example of the data dependence speculative execution which has resulted in the success. Cycle numbers, execution instructions, addresses of the execution instructions are shown. It is assumed that the address “A4′” of the store instruction “ST1” has not been known until the fifth cycle 5. It is also assumed that the address “A2” of the load instruction “LD2” has been known in the second cycle 2, the address “A3” of the load instruction “LD3” has been known in the third cycle 3, and the address “A4” of the load instruction “LD4” has been known in the sixth cycle 6.
In the first cycle 1, the load instruction “LD1” at the address “A1” is executed. In the second cycle 2, the load instruction “LD2” at the address “A2” is executed in non-program sequence because the address “A2” of the load instruction “LD2” has been known in the second cycle 2, whilst the address “A4′” of the store instruction “ST1” has not been known in the second cycle 2. In the second cycle 2, the read after write dependence from the store instruction “ST1” to the load instruction “LD2” has not been known. The load instruction “LD2” is executed speculatively to the store instruction “ST1”.
In the third cycle 3, the load instruction “LD3” at the address “A3” is executed in non-program sequence because the address “A3” of the load instruction “LD3” has been known in the third cycle 3, whilst the address “A4′” of the store instruction “ST1” has not been known in the third cycle 3. In the third cycle 3, the read after write dependence from the store instruction “ST1” to the load instruction “LD3” has not been known. The load instruction “LD3” is executed speculatively to the store instruction “ST1”.
In the fourth cycle 4, the address “A4′” of the store instruction “ST1” and the address “A4” of the load instruction “LD4” have not been known. Neither the store instruction “ST1” nor the load instruction “LD4” is executed.
In the fifth cycle 5, the address “A4′” of the store instruction “ST1” has become known, and the store instruction “ST1” is executed. Concurrently, the read after write dependence from the store instruction “ST1” to each of the load instruction “LD2” and the load instruction “LD3” is judged. In this case, since the address “A2” of the load instruction “LD2” and the address “A3” of the load instruction “LD3” are different from the address “A4′” of the store instruction “ST1”, the read after write dependence is not present.
It is, therefore, judged that the data dependence speculative executions of the load instruction “LD2” and the load instruction “LD3” result in success. The subsequent instruction is continuously executed. In the sixth cycle 6, the load instruction “LD4” is executed in the program sequence with reference to the store instruction “ST1”, for which reason no problem is raised even if the read after write dependence is present from the store instruction “ST1” to the load instruction “LD4”.
The program sequence execution shown in FIG. 1B needs eight cycles. By contrast, the successful data dependence speculative execution in the non-program sequence shown in FIG. 1C needs six cycles. The data dependence speculative execution in the non-program sequence improves the performance by two cycles, provided that the data dependence speculative execution succeeds.
FIG. 1D is a diagram illustrative of one example of the data dependence speculative execution which has resulted in the failure. Cycle numbers, execution instructions, addresses of the execution instructions are shown. It is assumed that the address “A4′” of the store instruction “ST1” has not been known until the fifth cycle 5. It is also assumed that the address “A2” of the load instruction “LD2” has been known in the second cycle 2, the address “A3” of the load instruction “LD3” has been known in the third cycle 3, and the address “A4” of the load instruction “LD4” has been known in the fourth cycle 4.
In the first cycle 1, the load instruction “LD1” at the address “A1” is executed. In the second cycle 2, the load instruction “LD2” at the address “A2” is executed in non-program sequence because the address “A2” of the load instruction “LD2” has been known in the second cycle 2, whilst the address “A4′” of the store instruction “ST1” has not been known in the second cycle 2. In the second cycle 2, the read after write dependence from the store instruction “ST1” to the load instruction “LD2” has not been known. The load instruction “LD2” is executed speculatively to the store instruction “ST1”.
In the third cycle 3, the load instruction “LD3” at the address “A3” is executed in non-program sequence because the address “A3” of the load instruction “LD3” has been known in the third cycle 3, whilst the address “A4′” of the store instruction “ST1” has not been known in the third cycle 3. In the third cycle 3, the read after write dependence from the store instruction “ST1” to the load instruction “LD3” has not been known. The load instruction “LD3” is executed speculatively to the store instruction “ST1”.
In the fourth cycle 4, the address “A4′” of the store instruction “ST1” has not been known, whilst the address “A4” of the load instruction “LD4” has become known. The load instruction “LD4” is executed.
In the fifth cycle 5, the address “A4′” of the store instruction “ST1” has become known, and the store instruction “ST1” is executed. Concurrently, the read after write dependence from the store instruction “ST1” to each of the load instruction “LD2”, the load instruction “LD3” and the load instruction “LD4” is judged. In this case, since the address “A2” of the load instruction “LD2” and the address “A3” of the load instruction “LD3” are different from the address “A4′” of the store instruction “ST1”, the read after write dependence to these load instructions is not present.
Since, however, the address “A4” of the load instruction “LD4” is the same as the address “A4′” of the store instruction “ST1”, the read after write dependence from the store instruction “ST1” to the load instruction “LD4” is present. Even though the read after write dependence from the store instruction “ST1” to the load instruction “LD4” is present, the non-program sequence execution has already been performed, for which reason the data dependence speculative execution of the load instruction “LD4” is judged to be a failure.
In order to ensure the correct result of the execution of the program, it is necessary to perform the recovery process for the failure of the data dependence speculative execution.
In the fifth cycle 5, the failure of the data dependence speculative execution is judged. The execution results of the load instruction “LD2” in the second cycle 2, the load instruction “LD3” in the third cycle 3, the load instruction “LD4” in the fourth cycle 4, the store instruction “ST1” in the fifth cycle 5 are canceled. Re-executions of the store instruction “ST1” in the seventh cycle 7, the load instruction “LD2” in the eighth cycle 8, the load instruction “LD3” in the ninth cycle 9, the load instruction “LD4” in the tenth cycle 10 are made as the recovery processes for the failure of the data dependence speculative execution.
The executions of the five instructions, namely the load instruction “LD1”, the store instruction “ST1”, the load instruction “LD2”, the load instruction “LD3” and the load instruction “LD4”, need ten cycles. The program sequence execution shown in FIG. 1B needs eight cycles. By contrast, the failed data dependence speculative execution in the non-program sequence shown in FIG. 1D needs ten cycles. The data dependence speculative execution in the non-program sequence deteriorates the performance by two cycles, provided that the data dependence speculative execution fails.
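The six-cycle case of FIG. 1C and the ten-cycle case of FIG. 1D may both be reproduced by the following minimal sketch. The function name `speculative_schedule` and the parameter `recovery_gap` are hypothetical. The model assumes that each instruction is executed in the cycle in which its address becomes known, that at most one address becomes known per cycle, and that the re-execution begins one idle cycle after the failure is judged, which is read from FIG. 1D rather than stated as a general rule. The addresses “A4′” and “A4” are assumed equal and are both written as "A4".

```python
def speculative_schedule(program, recovery_gap=1):
    """program: list of (name, op, addr, ready_cycle) in program sequence.

    Returns (total_cycles, failed), where `failed` tells whether a read after
    write dependence was violated by a speculatively executed load.
    """
    order = {name: i for i, (name, *_rest) in enumerate(program)}
    executed = []  # (name, op, addr) in execution order
    last_cycle = 0
    for name, op, addr, ready in sorted(program, key=lambda ins: ins[3]):
        last_cycle = ready
        if op == "ST":
            # Loads that are later in program sequence but were already
            # executed at the same address violate the dependence.
            violated = [n for n, o, a in executed
                        if o == "LD" and a == addr and order[n] > order[name]]
            if violated:
                # Cancel and re-execute the store and all instructions after it.
                redo = [ins for ins in program if order[ins[0]] >= order[name]]
                return last_cycle + recovery_gap + len(redo), True
        executed.append((name, op, addr))
    return last_cycle, False


fig_1c = [("LD1", "LD", "A1", 1), ("ST1", "ST", "A4", 5), ("LD2", "LD", "A2", 2),
          ("LD3", "LD", "A3", 3), ("LD4", "LD", "A4", 6)]
fig_1d = [("LD1", "LD", "A1", 1), ("ST1", "ST", "A4", 5), ("LD2", "LD", "A2", 2),
          ("LD3", "LD", "A3", 3), ("LD4", "LD", "A4", 4)]

print(speculative_schedule(fig_1c))  # (6, False)
print(speculative_schedule(fig_1d))  # (10, True)
```

The only difference between the two inputs is the cycle in which the address “A4” of the load instruction “LD4” becomes known, which decides whether “LD4” is executed before or after the store instruction “ST1”.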
If, however, a probability of success in the data dependence speculative execution is sufficiently higher than a probability of failure in the data dependence speculative execution, then the effective performance of the processor for processing the program may be improved in total.
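As a worked illustration of this trade-off, the following minimal sketch uses the cycle counts of FIGS. 1B, 1C and 1D (eight, six and ten cycles) as default values. The function names `expected_cycles` and `break_even_probability` are hypothetical, and the model assumes that every execution of the sequence either succeeds or fails as a whole with a fixed probability.

```python
def expected_cycles(p_success, success=6, failure=10):
    """Average cycle count when speculation succeeds with probability p_success."""
    return p_success * success + (1 - p_success) * failure


def break_even_probability(in_order=8, success=6, failure=10):
    """Success probability at which speculation equals the program sequence execution.

    Solves p * success + (1 - p) * failure == in_order for p.
    """
    return (failure - in_order) / (failure - success)


print(break_even_probability())  # 0.5
print(expected_cycles(0.5))      # 8.0
```

Under these assumptions, the speculative execution is faster on average than the eight-cycle program sequence execution whenever the probability of success exceeds one half.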
For allowing the processor to perform the data dependence speculative execution, it is necessary to judge the presence of the read after write dependence between the load/store instructions with reference to the memory. A data dependence detector has been known for detecting the presence of the read after write dependence between the load/store instructions. The conventional data dependence detector is disclosed by Manoj Franklin et al. in an article entitled “ARB: A Hardware Mechanism For Dynamic Reordering Of Memory References”, IEEE Transactions On Computers, vol. 45, No. 5, May, 1996.
FIG. 2 is a diagram illustrative of a conventional data dependence detector. The conventional data dependence detector 100 includes address buffers 101, address comparators 102, and a logic-OR circuit 103. The address buffers 101 store plural load addresses of the load instructions. The address comparators 102 are connected to the address buffers 101 for comparing the plural load addresses of the load instructions stored in the address buffers 101 with a store address of the store instruction which has just been executed. The logic-OR circuit 103 takes a logical-OR of all of the compared results from the address comparators 102 and outputs a data dependence detected result.
The detection of the read after write dependence from the store instruction to the load instruction is realized by the following operations of the data dependence detector 100. If the load instruction is executed by the data dependence speculative execution, the address of the load instruction is stored into a free address buffer 101. Subsequently, a store instruction is executed. The address of the store instruction is inputted into all of the plural address comparators 102, so that the plural address comparators 102 compare the load addresses of the executed load instructions with the inputted store address of the store instruction just executed and output the compared results, which are transmitted to the logic-OR circuit 103.
The logic-OR circuit 103 takes the logical-OR of all of the compared results from the plural address comparators 102, and outputs the data dependence detected result. If the store address of the store instruction does not correspond to any of the load addresses of the load instructions stored in the address buffers 101, then it is judged that the read after write dependence from the store instruction to respective one of the load instructions is not present. The data dependence detected result indicates that the read after write dependence from the store instruction to respective one of the load instructions is not present. This means that the data dependence speculative execution has resulted in the success. Subsequent instructions will continuously be executed.
If the store address of the store instruction does correspond to any one of the load addresses of the load instructions stored in the address buffers 101, then it is judged that the read after write dependence from the store instruction to respective one of the load instructions is present. The data dependence detected result indicates that the read after write dependence from the store instruction to respective one of the load instructions is present. This means that the data dependence speculative execution has resulted in the failure. The recovery process for the failure of the data dependence speculative execution will subsequently be accomplished.
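The operations of the conventional data dependence detector 100 described above may be modeled in software by the following minimal sketch. The class name `ConventionalDetector` and its method names are hypothetical; the sketch is a behavioral model under the stated description and not a description of the actual circuit, in which all comparisons are made in parallel.

```python
class ConventionalDetector:
    """Behavioral model of address buffers 101, comparators 102 and logic-OR 103."""

    def __init__(self, num_buffers):
        self.buffers = [None] * num_buffers  # address buffers 101 (None = free)

    def record_load(self, addr):
        """Store the address of a speculatively executed load into a free buffer.

        Returns False when no buffer is free, in which case the load cannot be
        subjected to the data dependence speculative execution.
        """
        for i, slot in enumerate(self.buffers):
            if slot is None:
                self.buffers[i] = addr
                return True
        return False

    def check_store(self, store_addr):
        """Return True when a read after write dependence is detected."""
        # One comparison per buffer (address comparators 102).
        matches = [slot is not None and slot == store_addr for slot in self.buffers]
        # Logical-OR of all compared results (logic-OR circuit 103).
        return any(matches)


detector = ConventionalDetector(num_buffers=4)
detector.record_load("A2")
detector.record_load("A3")
print(detector.check_store("A4"))  # False: speculation succeeded, as in FIG. 1C
detector.record_load("A4")
print(detector.check_store("A4"))  # True: speculation failed, as in FIG. 1D
```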
The above conventional data dependence detector 100 has the following two problems.
The first problem is that the necessary hardware size is large, because it is necessary to detect all of the read after write dependences without omission in order to ensure the exactly correct execution result of the program in the data dependence speculative execution.
The conventional data dependence detector 100 stores the load addresses of all the load instructions executed by the data dependence speculative execution into the address buffers 101 and then the address comparators 102 compare the load addresses with the store address of the store instruction. If no free space is present in the address buffers 101, then it is no longer possible to subject further load instructions to the data dependence speculative execution.
In this case, the subsequent load/store instructions are executed in the program sequence. The number of the load instructions which may be executed by the data dependence speculative execution is limited by both the number of the address buffers 101 and the number of the address comparators 102. In order to improve the performance of the data dependence speculative execution, a large number of the address buffers 101 and a large number of the address comparators 102 are needed, whereby the necessary hardware size is large.
The second problem is that the speed of detecting the read after write dependence is slow. In order to detect the read after write dependence, it is necessary to take not only a time for processing the address comparison by the address comparators 102 but also a time for the logic operation on the outputs from the address comparators 102. This makes it difficult to improve the high frequency performance of the processor.
As the number of the address buffers 101 and the number of the address comparators 102 are increased, the number of the inputs into the logic-OR circuit 103 is also increased, whereby the above disadvantages become more remarkable.
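How the logic operation time may grow with the number of inputs can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the logic-OR circuit 103 is built as a balanced tree of OR gates with a fixed number of inputs per gate; the function name `or_tree_depth` and the parameter `fan_in` are hypothetical and the actual circuit structure is not specified in the text.

```python
def or_tree_depth(num_inputs, fan_in=2):
    """Number of gate levels in a balanced OR tree with the given gate fan-in."""
    depth = 0
    width = num_inputs
    while width > 1:
        width = -(-width // fan_in)  # ceiling division: gates needed at this level
        depth += 1
    return depth


for n in (4, 8, 64):
    print(n, or_tree_depth(n))  # 4 -> 2 levels, 8 -> 3 levels, 64 -> 6 levels
```

Under this assumption, each doubling of the number of address comparators 102 adds one more gate level to the path through which the data dependence detected result must propagate.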
Accordingly, the conventional data dependence detector needs a large hardware size for improving the performance of the data dependence speculative execution. The large hardware size increases the necessary time for processing the detection of the read after write dependence, thereby making it difficult to improve the high speed performance of the processor.
In the above circumstances, the development of a novel data dependence detector free from the above problems is desirable.