1. Field of the Invention
The present invention relates to a data processing apparatus and, more particularly, to a data processing apparatus that performs pipeline processing.
2. Description of the Related Art
Conventionally, a data processing apparatus that performs pipeline processing can execute one instruction per clock cycle because every instruction it handles uses the resources of each stage only once, in a predetermined order.
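The one-instruction-per-cycle property can be illustrated with a short timing calculation. The following is only a sketch, assuming an ideal pipeline with no stalls, in which instruction i enters the IF stage in cycle i and advances one stage per cycle; the function names are hypothetical.

```python
from typing import List, Optional

# Stage names in pipeline order, as defined in this section.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr_index: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `instr_index` during `cycle` (None if outside the pipe).

    Assumes no stalls: instruction i enters IF in cycle i and advances one stage per cycle.
    """
    offset = cycle - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def timing_chart(instrs: List[str]) -> List[str]:
    """One text row per instruction, one column per clock cycle ('.' = not in the pipe)."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [stage_of(i, c) or "." for c in range(n_cycles)]
        rows.append(f"{name:<12}" + " ".join(f"{cell:<3}" for cell in cells))
    return rows


def completion_cycles(n_instrs: int) -> List[int]:
    """Cycle in which each instruction is in WB: one completion per cycle once the pipe is full."""
    return [i + len(STAGES) - 1 for i in range(n_instrs)]


if __name__ == "__main__":
    for row in timing_chart(["mov r1, r2", "mov r2, r3", "nop"]):
        print(row)
    print(completion_cycles(3))  # [4, 5, 6]
```

Each stage holds a different instruction in any given cycle, which is why every instruction may use each stage's resources only once.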
FIG. 5 is a block diagram showing the basic configuration of such a data processing apparatus. In the first (instruction fetch) stage (hereinafter called "IF stage"), an instruction is fetched from an instruction memory (InsnMem) 502 at the address indicated by a fetch pointer (FP) 501 and transferred to an instruction register (IR) 503. At the same time, the fetch pointer (FP) 501 is incremented by an incrementer (+1) 504 so that the next instruction can be fetched.
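The IF stage can be modeled as two registers updated on the same clock edge. This is a minimal sketch, assuming that the instruction memory holds one instruction per address and that there are no branches, so the fetch pointer only ever advances by one; the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FetchStage:
    """IF stage: fetch pointer (FP) 501, InsnMem 502, IR 503 and incrementer 504.

    Assumptions: InsnMem holds one instruction per address, and there are no branches,
    so FP only ever advances by one.
    """
    insn_mem: List[str]
    fp: int = 0        # fetch pointer (FP) 501
    ir: str = "nop"    # instruction register (IR) 503

    def clock(self) -> None:
        # Both registers update on the same clock edge: IR receives InsnMem[FP]
        # while the incrementer (+1) 504 prepares FP for the next fetch.
        next_ir = self.insn_mem[self.fp]
        next_fp = self.fp + 1
        self.ir, self.fp = next_ir, next_fp


if __name__ == "__main__":
    fetch = FetchStage(["mov r1, r2", "mov r2, r3"])
    fetch.clock()
    print(fetch.ir, fetch.fp)  # mov r1, r2 1
```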
In the second (instruction decode) stage (hereinafter called "ID stage"), the output of the instruction register (IR) 503 is decoded by an instruction decoder (Dec) 505, which outputs immediate data 506, register selection signals 507, an instruction type signal 508, and a write register selection signal 509. Data is then read out from a register file (RF) 510 onto an internal bus 511 and/or an internal bus 512 in accordance with the register selection signals 507. The portions of the read data needed to execute the instruction are selected and transferred, together with the immediate data 506, via the internal data buses 511 and 512 to the arithmetic logic unit (ALU) input registers (ALU source register (AS), ALU destination register (AD)) 513 and the shifter input registers (Shift control register (SC), Shift data register (SD)) 514. In addition, the instruction type signal 508 and the write register selection signal 509 are transferred unchanged to an EI register (EI) 515 and an ER register (ER) 516, respectively, in preparation for the next stage.
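The decoder's four outputs can be pictured as fields of a record. The sketch below is hypothetical: it decodes a textual form of only the mnemonics used later in this section (mov, ld, nop), assumes "mov src, dst" and "ld addr, dst" operand order as described for FIGS. 6 to 8, and models the register file as a dictionary; the real instruction encoding is not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Decoded:
    imm: Optional[int]        # immediate data 506
    read_regs: List[str]      # register selection signals 507
    insn_type: str            # instruction type signal 508 (passed on to EI 515)
    write_reg: Optional[str]  # write register selection signal 509 (passed on to ER 516)


def decode(ir: str) -> Decoded:
    """Hypothetical decoder for the textual mnemonics used in this section."""
    op, _, rest = ir.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "nop":
        return Decoded(None, [], "nop", None)
    if op == "mov":   # mov src, dst
        return Decoded(None, [args[0]], "mov", args[1])
    if op == "ld":    # ld addr, dst: the address is treated as an immediate here
        return Decoded(int(args[0]), [], "ld", args[1])
    raise ValueError(f"unsupported instruction: {ir!r}")


def read_operands(d: Decoded, rf: Dict[str, int]) -> List[int]:
    """Read the selected registers from RF 510 (onto internal buses 511/512)."""
    return [rf[r] for r in d.read_regs]


if __name__ == "__main__":
    d = decode("mov r1, r2")
    print(read_operands(d, {"r1": 5, "r2": 0}))  # [5]
```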
In the third (execution) stage (hereinafter called "EX stage"), the computation corresponding to each instruction is performed by an ALU 517 and a shifter (Shifter) 518. If the instruction does not access a data memory (DataMem) 525, which accounts for most of the processing of the next, fourth stage (described later), the computed result is transferred to a pipeline register (ED) 519. If the instruction accesses the data memory (DataMem) 525, an address calculated by the ALU 517 is transferred to a memory address register (MA) 520; further, for a write, the write data output from the shifter (Shifter) 518 is transferred to a store data register (SD) 521. In preparation for the next stage, the contents of the EI register (EI) 515 and the ER register (ER) 516 are transferred to an MI register (MI) 522 and an MR register (MR) 523, respectively.
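The routing performed at the end of the EX stage can be sketched as follows. This is an illustration only: the type names "ld" and "st" for memory-accessing instructions are hypothetical, "mov" is assumed to pass its operand through the ALU unchanged, and the address is assumed to be a base value plus an offset.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction-type names for instructions that access DataMem 525.
MEMORY_TYPES = {"ld", "st"}


@dataclass
class ExMemLatch:
    """Registers loaded at the end of the EX stage."""
    ed: Optional[int] = None   # pipeline register (ED) 519: result of a non-memory instruction
    ma: Optional[int] = None   # memory address register (MA) 520
    sd: Optional[int] = None   # store data register (SD) 521
    mi: str = "nop"            # MI register 522, copied from EI 515
    mr: Optional[str] = None   # MR register 523, copied from ER 516


def execute(ei: str, er: Optional[str], a: int, imm: int = 0, store_data: int = 0) -> ExMemLatch:
    """Route the EX-stage outcome to ED, or to MA (and SD for a write)."""
    out = ExMemLatch(mi=ei, mr=er)           # EI -> MI and ER -> MR
    if ei not in MEMORY_TYPES:
        out.ed = a                            # e.g. "mov": the ALU passes the operand through
    else:
        out.ma = a + imm                      # ALU computes the address (base + offset, assumed)
        if ei == "st":
            out.sd = store_data               # shifter output becomes the write data
    return out
```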
In the fourth (memory) stage (hereinafter called "MEM stage"), if the instruction does not access the data memory (DataMem) 525, the only processing performed is the transfer of the content of the pipeline register (ED) 519 to a pipeline register (MD) 524. If the instruction writes to the data memory, the content of the store data register (SD) 521 is written to the data memory (DataMem) 525 at the address held in the memory address register (MA) 520. If the instruction reads from the data memory, data is read out from the address held in the memory address register (MA) 520 and transferred to a load data register (LD) 526. In preparation for the next stage, the content of the MR register (MR) 523 is transferred to a WR register (WR) 527. A data memory access instruction decoder 530 distinguishes instructions that access the data memory (DataMem) 525 from all other instructions.
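The three cases of the MEM stage can be sketched as below. The data memory is modeled as a dictionary, the type names "ld" and "st" are hypothetical, and the field `wi`, which carries the instruction type on to the WB stage, is an assumption: the text implies that the WB stage knows whether the instruction read the data memory but does not name the register holding that information.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemWbLatch:
    """Registers loaded at the end of the MEM stage."""
    md: Optional[int] = None   # pipeline register (MD) 524
    ld: Optional[int] = None   # load data register (LD) 526
    wr: Optional[str] = None   # WR register 527, copied from MR 523
    wi: str = "nop"            # hypothetical: instruction type carried on to WB


def accesses_data_memory(mi: str) -> bool:
    """Stand-in for the data memory access instruction decoder 530 (type names assumed)."""
    return mi in ("ld", "st")


def memory_stage(mi: str, mr: Optional[str], ed: Optional[int], ma: Optional[int],
                 sd: Optional[int], data_mem: Dict[int, int]) -> MemWbLatch:
    out = MemWbLatch(wr=mr, wi=mi)          # MR -> WR
    if not accesses_data_memory(mi):
        out.md = ed                          # pure relay: the result was already determined in EX
    elif mi == "st":
        data_mem[ma] = sd                    # write SD to DataMem at address MA
    else:
        out.ld = data_mem[ma]                # "ld": read DataMem at address MA into LD
    return out
```

The first branch makes the cost discussed later in this section visible: for a non-memory instruction, the MEM stage does nothing but copy an already-determined value.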
In the fifth (write back) stage (hereinafter called "WB stage"), if the instruction did not access the data memory (DataMem) 525, the content of the pipeline register (MD) 524 is written to the register of the register file (RF) 510 indicated by the WR register (WR) 527. If the instruction read from the data memory (DataMem) 525, the content of the load data register (LD) 526 is written to that same register of the register file (RF) 510 instead. This completes the execution of the instruction.
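The WB stage reduces to a two-way selection. In this sketch the register file is a dictionary, and the instruction-type argument `wi` is a hypothetical signal (not named in the text) that tells the stage whether the instruction was a load; a store or nop is assumed to have no write register.

```python
from typing import Dict, Optional


def write_back(wi: str, wr: Optional[str], md: Optional[int], ld: Optional[int],
               rf: Dict[str, int]) -> None:
    """WB stage: write MD (non-memory result) or LD (loaded data) to RF[WR].

    `wi` is a hypothetical instruction-type signal reaching WB; the text implies such
    information but does not name the register that carries it.
    """
    if wr is None:
        return                      # nothing to write, e.g. nop or a store (assumed)
    rf[wr] = ld if wi == "ld" else md
```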
Incidentally, register selection signal comparators 528 and 529, provided in the MEM stage and the WB stage respectively, allow an ensuing instruction to use the result of a preceding instruction before the preceding instruction has finished executing. The register selection signal comparator 528 outputs a short path control signal 531, and the register selection signal comparator 529 outputs short path control signals 532 and 533. Each of the short path control signals 531 to 533 enables the corresponding one of the short paths 534 to 536, which supplies the result of the preceding instruction directly to the ensuing instruction, and at the same time inhibits reading of the corresponding register from the register file (RF) 510.
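The selection made by the comparators can be summarized as a priority check. This is a sketch under stated assumptions: the comparison against the ER register (ER) 516 is described for FIG. 6, but the use of the MR register (MR) 523 as the input for the MEM-stage comparison, the pairing of signals 532 and 533 with short paths 535 and 536, and the priority of the EX-stage match over the MEM-stage match are all assumptions made for illustration.

```python
from typing import Optional


def select_source(read_reg: str, er: Optional[str], mr: Optional[str], mr_is_load: bool) -> str:
    """Choose where the ID-stage instruction gets the value of `read_reg`.

    er: write register of the instruction now in EX (ER 516).
    mr: write register of the instruction now in MEM (MR 523, assumed comparator input).
    mr_is_load: whether the MEM-stage instruction reads DataMem 525.
    """
    if er is not None and read_reg == er:
        # Signal 531: newest result, from the EX stage, via short path 534.
        # Checked first so that it wins if the MEM-stage instruction also matches (assumed priority).
        return "short path 534"
    if mr is not None and read_reg == mr:
        # Signal 533 (load data via short path 536) or signal 532 (ALU result via short path 535).
        return "short path 536" if mr_is_load else "short path 535"
    # No match: the register file read is not inhibited.
    return "register file 510"
```

Checking the EX-stage match first reflects the usual rule that the most recent writer of a register supplies its value.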
FIG. 6 is a timing chart showing the operation of each part involved in a first short path function of this data processing apparatus.
In the instruction sequence for this function, the IF, ID, EX, MEM and WB stages of mov r1, r2 are followed, one cycle later, by the IF, ID, EX, MEM and WB stages of mov r2, r3. That is, the first instruction transfers the content of r1 of the register file (RF) 510 to r2, and the second instruction transfers the content of r2 to r3. The ID stage of the second instruction, which reads r2, is executed in synchronism with the EX stage of the first instruction. Since the result of the first instruction has not yet been written to the register file (RF) 510 at this point, reading r2 from the register file would produce incorrect data. However, because the register selection signals 507 and the content of the ER register (ER) 516 both designate r2, the short path control signal 531 is produced. As a result, reading of r2 from the register file (RF) 510 is inhibited and the result of the first instruction is instead supplied via the short path 534.
FIG. 7 is a timing chart showing the operation of each part involved in a second short path function of this data processing apparatus. In the instruction sequence for this function, mov r1, r2 is followed one cycle later by nop, which is in turn followed one cycle later by mov r2, r3, each instruction passing through the IF, ID, EX, MEM and WB stages. Here the third instruction uses the result of the first instruction. Since the first instruction has reached the MEM stage when the third instruction is in the ID stage, the result is supplied via the short path 535.
FIG. 8 is a timing chart showing the operation of each part involved in a third short path function of this data processing apparatus. In the instruction sequence for this function, ld 0, r2 is followed one cycle later by nop, which is in turn followed one cycle later by mov r2, r3, each instruction passing through the IF, ID, EX, MEM and WB stages. That is, this function is the second short path function of FIG. 7 with its first instruction replaced by an instruction that reads data from the data memory 525: the first instruction reads the content of address 0 of the data memory 525 and transfers it to r2. Since the result of the first instruction is an output of the data memory 525, it is supplied via the short path 536.
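The three timing charts can be cross-checked with a small calculation: when a consumer instruction is in the ID stage, the stage occupied by its producer depends only on how many cycles apart the two were issued. The sketch below is hypothetical; it assumes one instruction issued per cycle with no stalls, the textual mnemonics of FIGS. 6 to 8, and that a producer already in WB or retired is read from the register file, which the text does not describe.

```python
from typing import List, Optional, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def parse(insn: str) -> Tuple[str, List[str], Optional[str]]:
    """Split 'op a, b' into (op, source registers, destination register)."""
    op, _, rest = insn.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "mov":            # mov src, dst
        return op, [args[0]], args[1]
    if op == "ld":             # ld addr, dst (addr is an immediate here)
        return op, [], args[1]
    return op, [], None        # nop


def operand_sources(seq: List[str]) -> List[Tuple[int, str, Optional[str], str]]:
    """For each source operand: (consumer index, register, producer stage, value source)."""
    parsed = [parse(i) for i in seq]
    report = []
    for c, (_, srcs, _) in enumerate(parsed):
        decode_cycle = c + 1   # instruction c occupies ID in cycle c + 1
        for reg in srcs:
            stage, source = None, "register file 510"
            for p in range(c - 1, -1, -1):        # nearest earlier writer of reg
                op_p, _, dst_p = parsed[p]
                if dst_p == reg:
                    offset = decode_cycle - p
                    stage = STAGES[offset] if offset < len(STAGES) else None
                    if stage == "EX":
                        source = "short path 534"
                    elif stage == "MEM":
                        source = "short path 536" if op_p == "ld" else "short path 535"
                    # WB or retired producer: assumed to be read from RF 510.
                    break
            report.append((c, reg, stage, source))
    return report


if __name__ == "__main__":
    print(operand_sources(["mov r1, r2", "mov r2, r3"])[-1])          # (1, 'r2', 'EX', 'short path 534')
    print(operand_sources(["mov r1, r2", "nop", "mov r2, r3"])[-1])   # (2, 'r2', 'MEM', 'short path 535')
    print(operand_sources(["ld 0, r2", "nop", "mov r2, r3"])[-1])     # (2, 'r2', 'MEM', 'short path 536')
```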
FIG. 9 is a block diagram showing the basic configuration of another data (information) processing apparatus, disclosed in Japanese Patent Laid-Open Hei. 2-232727.
In this data (information) processing apparatus, a register file 901 has two read ports 902 and 903 and two write ports 904 and 905. A single instruction can therefore perform two operations simultaneously: reading data from the read port 902 and writing it, via an ALU 906, to the write port 904; and writing data read from the read port 903 directly to the write port 905.
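The effect of the second read/write port pair can be shown with a small model. This is a sketch only: the ALU operation is not specified in the text, so it is passed in as a function, and rejecting two writes to the same register is an assumption of this illustration rather than a documented rule.

```python
from typing import Callable, List


class DualPortRegisterFile:
    """Sketch of register file 901 with read ports 902/903 and write ports 904/905."""

    def __init__(self, values: List[int]) -> None:
        self.regs = list(values)

    def execute(self, src_a: int, dst_a: int, src_b: int, dst_b: int,
                alu: Callable[[int], int]) -> None:
        """One instruction: regs[dst_a] = alu(regs[src_a]) and regs[dst_b] = regs[src_b]."""
        if dst_a == dst_b:
            raise ValueError("both write ports target the same register")  # assumed illegal
        a = self.regs[src_a]          # read port 902
        b = self.regs[src_b]          # read port 903
        # Both reads happen before either write, as in a single clock cycle.
        self.regs[dst_a] = alu(a)     # write port 904, through ALU 906
        self.regs[dst_b] = b          # write port 905, direct path


if __name__ == "__main__":
    rf = DualPortRegisterFile([5, 7])
    rf.execute(0, 1, 1, 0, alu=lambda x: x)   # swap r0 and r1 in one instruction
    print(rf.regs)  # [7, 5]
```

Because both reads precede both writes, even an exchange of two registers completes in one instruction, which illustrates the speed advantage noted below.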
In the data processing apparatus of FIGS. 5 to 8 described above, one instruction can be executed in each clock cycle because each instruction uses the resources of each stage only once, in a predetermined order. Consequently, data must still be carried through the pipeline to the final stage even after a result has been determined. This increases the number of pipeline registers and short paths, which in turn increases circuit scale, cost, and power consumption.
The data (information) processing apparatus of FIG. 9, on the other hand, has the advantage that data processing can be sped up by executing two write operations simultaneously. However, because its execution mechanism is constructed for the execution of a single instruction, pipelining this apparatus would increase circuit scale, cost, and power consumption just as in the apparatus described above.