1. Technical Field to which the Invention belongs
The present invention relates to a data processing method and an apparatus therefor for performing an arithmetic operation by pipeline processing.
2. Prior Art
If only one arithmetic device is arranged in an arithmetic unit capable of performing pipeline processing, naturally no resource conflict is caused when arithmetic results are written to a general purpose register for storing the arithmetic results. Further, even when plural arithmetic devices are arranged in the arithmetic unit, no such resource conflict is caused if the arithmetic times (latencies) of the arithmetic instructions executed in the respective arithmetic devices are all the same.
However, the latency of an operation such as division or extraction of the square root (SQRT) is generally very large in comparison with the latencies of other arithmetic operations. Accordingly, consider a construction in which a first instruction (hereafter referred to as MAC) and a second instruction (hereafter referred to as DIV) having a larger latency than the first instruction are executed by separate arithmetic devices, and their arithmetic results are written to one general purpose register. In this construction, a writing conflict is caused when the two arithmetic operations terminate simultaneously, since both arithmetic results are then written to this one general purpose register at the same time.
Two methods are used to avoid the writing conflict: (1) the number of writing ports is set to be plural (as in an arithmetic unit capable of simultaneously issuing plural instructions, etc.), and (2) a subsequent instruction is stalled.
In the following description, a conventional data processor will be explained by using a block diagram (FIG. 1) and a pipeline view (FIG. 2) showing an example in which a conflict is caused when two arithmetic results are written to a general purpose register. A method for avoiding this conflict will also be explained by using a pipeline view (FIG. 3) showing an example in which the conflict is avoided by the above method (2).
The conventional data processor shown in FIG. 1 has, as an arithmetic unit for performing an arithmetic operation, two kinds of arithmetic devices: an arithmetic device 101 for MAC and an arithmetic device 102 for DIV. This conventional data processor also has one general purpose register 103 to which the arithmetic results obtained by these arithmetic devices 101, 102 are written. In this figure, the general purpose register 103 for writing and the general purpose register 103 for reading are drawn separately so that the operations are easy to follow, but the actual general purpose register 103 is a single general purpose register.
When an instruction is sent from an instruction sequence 104 (the instruction contents will be described later) to a decoder 105, the decoder 105 designates, to the general purpose register 103, the address from which arithmetic data are to be outputted to the arithmetic device 101 or 102. Further, the decoder 105 judges which of the arithmetic device 101 for MAC and the arithmetic device 102 for DIV is to execute the arithmetic instruction. The decoder 105 then issues, every cycle and through a latch circuit 106, an arithmetic starting instruction to each of the arithmetic executing stages (201, 202, 203 or 204, 205) of the arithmetic device that executes the arithmetic operation.
When the arithmetic starting instruction is issued, arithmetic data are inputted from an output port 103A of the general purpose register 103 to the arithmetic device 101 or 102, and predetermined arithmetic processing is performed at every stage (201, 202, 203 or 204, 205). In the arithmetic device 101 for MAC, the arithmetic operation is terminated in the third cycle after the arithmetic data are inputted to this arithmetic device 101. Thereafter, the arithmetic results are written to the general purpose register 103 in the fourth cycle via a selector 107 for writing the arithmetic results. In contrast to this, in the arithmetic device 102 for DIV, the arithmetic operation is terminated in the sixth cycle after the arithmetic data are inputted to this arithmetic device 102. The arithmetic results of this arithmetic device 102 are written to the general purpose register 103 in the seventh cycle via the selector 107, which is shared with the arithmetic device 101 for MAC.
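The timing described above can be restated as a small model. The following is a minimal sketch, not part of the described processor: the names `EXEC_CYCLES`, `last_exec_cycle` and `writeback_cycle` are hypothetical, and the only facts taken from the text are the three-cycle and six-cycle execution times and the write in the cycle that follows.

```python
# Execution times taken from the text: MAC terminates in the 3rd cycle,
# DIV in the 6th cycle after the arithmetic data are inputted.
EXEC_CYCLES = {"MAC": 3, "DIV": 6}


def last_exec_cycle(kind: str, first_exec_cycle: int) -> int:
    """Cycle in which the arithmetic operation terminates."""
    return first_exec_cycle + EXEC_CYCLES[kind] - 1


def writeback_cycle(kind: str, first_exec_cycle: int) -> int:
    """Cycle in which the result is written to the general purpose register."""
    return last_exec_cycle(kind, first_exec_cycle) + 1


if __name__ == "__main__":
    # Counting the cycle in which data are inputted as cycle 1.
    for kind in ("MAC", "DIV"):
        print(kind, last_exec_cycle(kind, 1), writeback_cycle(kind, 1))
```

Counting the input cycle as cycle 1, this gives termination in cycle 3 and writing in cycle 4 for MAC, and termination in cycle 6 and writing in cycle 7 for DIV, as stated above.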
The instruction sequence consists, in order, of "MAC-a", "MAC-b", "DIV-a", "MAC-c", "MAC-d", "MAC-e", "MAC-f", "DIV-b", "MAC-g", "MAC-h", and "MAC-i". Here, "MAC-a", "MAC-b", ..., "MAC-i" denote instructions of the same kind, each of which executes the arithmetic operation in the arithmetic device 101 for MAC and writes the arithmetic results to the general purpose register 103. "DIV-a" and "DIV-b" denote instructions of the same kind, each of which executes the arithmetic operation in the arithmetic device 102 for DIV and writes the arithmetic results to the general purpose register 103.
The problem of the conventional data processor of FIG. 1, namely the conflict caused when the arithmetic results are written, will be described by using the pipeline view of FIG. 2.
The respective executing stages of the arithmetic devices 101 and 102 are mutually independent, so that arithmetic instructions using pipeline processing can basically be executed continuously. Namely, the arithmetic device 101 has stage E1 (201), stage E2 (202) and stage E3 (203) as executing stages, and the arithmetic device 102 has stage E1 (204) and stage E2 (205) as executing stages. However, as mentioned above, the time taken to perform the arithmetic operation is three cycles in the arithmetic device 101 for MAC and six cycles in the arithmetic device 102 for DIV. In FIG. 2, for example, MAC arithmetic instructions are sequentially issued and the arithmetic processing progresses sequentially in a period T11 from #4 to #7. However, the general purpose register 103 to which the arithmetic results are written is common to both arithmetic devices. Accordingly, as shown by 401 and 402 in FIG. 2, when two arithmetic operations terminate simultaneously in the preceding cycle (#9 in the case of 401), both arithmetic results begin to be written (arrows 403, 404) in the next cycle (#10 in the case of 401), so that a conflict is caused.
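The conflict can be reproduced with a short simulation. This is a hedged sketch rather than a description of FIG. 2: it assumes one instruction enters its first executing stage per cycle, and it assumes that "MAC-a" enters stage E1 in cycle #2 (`FIRST_EXEC_CYCLE`), a value chosen only so that the first conflict falls on #10 as stated above. The helper name `find_writeback_conflicts` is hypothetical.

```python
from collections import defaultdict

EXEC_CYCLES = {"MAC": 3, "DIV": 6}  # execution times from the text

SEQUENCE = ["MAC-a", "MAC-b", "DIV-a", "MAC-c", "MAC-d", "MAC-e",
            "MAC-f", "DIV-b", "MAC-g", "MAC-h", "MAC-i"]

# Assumption: cycle in which the first instruction enters stage E1.
FIRST_EXEC_CYCLE = 2


def find_writeback_conflicts(sequence, first_exec_cycle=FIRST_EXEC_CYCLE):
    """Return {cycle: [instructions]} for cycles with more than one writer."""
    writers = defaultdict(list)
    for offset, name in enumerate(sequence):
        kind = name.split("-")[0]
        start = first_exec_cycle + offset          # one instruction per cycle
        writers[start + EXEC_CYCLES[kind]].append(name)
    return {cycle: names
            for cycle, names in sorted(writers.items())
            if len(names) > 1}


if __name__ == "__main__":
    for cycle, names in find_writeback_conflicts(SEQUENCE).items():
        print(f"#{cycle}: {' and '.join(names)}")
```

Under these assumptions the sketch reports "DIV-a" and "MAC-e" both writing in #10, which agrees with the first conflict described above, and "DIV-b" and "MAC-i" both writing in #15. The latter cycle number is an output of this model only and is not stated in the text.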
Therefore, a conventional example in which the above conflict avoiding method (2) is used, so that the conflicts shown by 401 and 402 in FIG. 2 are not caused, will next be explained by using FIG. 3.
The timing of starting execution of one of the two instructions (here, instruction "MAC-e" (501) of the arithmetic device 101 for MAC) is shifted by one cycle (502 of #7 in FIG. 3) so that the writing stages of the two arithmetic results do not fall on the same timing. Thus, every executing stage of the arithmetic device 101 for MAC becomes empty for one cycle (as shown by a dotted arrow 504), so that a blank of one cycle is formed at the write back stage (503) of #10.
Accordingly, the arithmetic results of "DIV-a" can be written back by using this blank write back stage (503), so that the conflict is avoided. The conflict of 402 of FIG. 2 can be avoided by similar countermeasures (505, 506 in FIG. 3).
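The stalling method (2) can be sketched as an in-order scheduler that delays an instruction whenever its write back cycle is already taken. This is an illustrative model only: the function name `schedule_with_stalls`, the one-instruction-per-cycle issue rate, and the starting cycle #2 are assumptions, not details taken from FIG. 3.

```python
EXEC_CYCLES = {"MAC": 3, "DIV": 6}  # execution times from the text

SEQUENCE = ["MAC-a", "MAC-b", "DIV-a", "MAC-c", "MAC-d", "MAC-e",
            "MAC-f", "DIV-b", "MAC-g", "MAC-h", "MAC-i"]


def schedule_with_stalls(sequence, first_exec_cycle=2):
    """Return a list of (name, start, writeback, stall_cycles) tuples.

    An instruction whose write back cycle is already reserved by an
    earlier instruction is delayed one cycle at a time; all later
    instructions are delayed with it, since issue is in order.
    """
    reserved = set()            # write back cycles already claimed
    schedule = []
    next_start = first_exec_cycle
    for name in sequence:
        latency = EXEC_CYCLES[name.split("-")[0]]
        start, stall = next_start, 0
        while start + latency in reserved:
            start += 1
            stall += 1
        reserved.add(start + latency)
        schedule.append((name, start, start + latency, stall))
        next_start = start + 1
    return schedule


if __name__ == "__main__":
    for name, start, writeback, stall in schedule_with_stalls(SEQUENCE):
        note = f"  (stalled {stall} cycle)" if stall else ""
        print(f"{name}: E1 at #{start}, write back at #{writeback}{note}")
```

In this model "MAC-e" is delayed from #7 to #8, leaving the write back stage of #10 free for "DIV-a", and "MAC-i" is delayed by one cycle for the same reason with respect to "DIV-b".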
However, there are the following problems in the conflict avoiding methods (1) and (2) of the above conventional data processor.
Namely, in the method (1) for setting the number of writing ports to be plural, the number of ports increases and the writing control, etc. become more complicated. In the method (2) for stalling a subsequent instruction, execution of the instruction is delayed by one cycle or more in order to write the arithmetic results, so that the arithmetic processing takes longer and the performance of the entire data processor deteriorates.
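The trade-off between the two methods can be illustrated by counting stall cycles as a function of the number of writing ports. The sketch below is a simplified model under the same assumptions as before (in-order issue of one instruction per cycle, first executing stage in cycle #2); `total_stall_cycles` and `write_ports` are hypothetical names, and the model says nothing about the hardware cost or control complexity of additional ports.

```python
from collections import Counter

EXEC_CYCLES = {"MAC": 3, "DIV": 6}  # execution times from the text

SEQUENCE = ["MAC-a", "MAC-b", "DIV-a", "MAC-c", "MAC-d", "MAC-e",
            "MAC-f", "DIV-b", "MAC-g", "MAC-h", "MAC-i"]


def total_stall_cycles(sequence, write_ports, first_exec_cycle=2):
    """Count the stall cycles needed so that no cycle has more
    simultaneous writes than there are writing ports."""
    writes = Counter()          # number of writes booked per cycle
    next_start = first_exec_cycle
    stalls = 0
    for name in sequence:
        latency = EXEC_CYCLES[name.split("-")[0]]
        start = next_start
        while writes[start + latency] >= write_ports:
            start += 1          # method (2): stall the instruction
            stalls += 1
        writes[start + latency] += 1
        next_start = start + 1
    return stalls


if __name__ == "__main__":
    for ports in (1, 2):
        print(f"{ports} writing port(s): "
              f"{total_stall_cycles(SEQUENCE, ports)} stall cycle(s)")
```

For the instruction sequence above, this model needs two stall cycles with one writing port and none with two writing ports, which restates the choice between slower execution under method (2) and additional ports under method (1).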