1. Field of the Invention
The present invention relates to a data processing device, such as a microprocessor, having pipelines for independently and efficiently executing a plurality of sub-instructions included in a single instruction, even if a data hazard occurs between the pipelines.
2. Description of the Prior Art
A single instruction that includes a plurality of sub-instructions to be executed by a plurality of execution units in a microprocessor serving as a data processing device is referred to as a "Very Long Instruction Word" (VLIW), and a microprocessor based on the VLIW architecture is called a VLIW microprocessor.
The VLIW microprocessor executes a single instruction including sub-instructions to be executed by a plurality of execution units, for example, arithmetic and logic units (ALUs), and controls the operations of those execution units. When the instruction code is generated in the microprocessor, the sub-instructions to be executed in each of the execution units can be reliably placed into a single instruction. This increases the utilization efficiency of each execution unit in the microprocessor. In addition, decode circuits for specifying which execution unit is to execute each sub-instruction can be eliminated from the microprocessor. As a result, the microprocessor has the advantage that the instruction decode operation can be executed at high speed.
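The slot-based assignment of sub-instructions to execution units described above can be illustrated with a minimal Python sketch. It is not the instruction format of any actual processor: the `SubInstruction` type, the two-slot layout, the unit names, and the register numbers are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubInstruction:
    """Hypothetical sub-instruction: an opcode plus register numbers."""
    opcode: str
    dest: int
    src1: int
    src2: int


# Assumed two-slot layout: the position of a sub-instruction inside the
# VLIW word, not a decoded field, selects the execution unit.
SLOT_TO_UNIT = ("unit_0", "unit_1")


def dispatch(vliw_word: Tuple[SubInstruction, ...]) -> Dict[str, SubInstruction]:
    """Route each sub-instruction to the unit that owns its slot."""
    if len(vliw_word) != len(SLOT_TO_UNIT):
        raise ValueError("a VLIW word must fill every slot")
    return dict(zip(SLOT_TO_UNIT, vliw_word))


# One VLIW word carrying two sub-instructions (register numbers are made up).
word = (SubInstruction("mul", 3, 1, 2), SubInstruction("add", 6, 4, 5))
routed = dispatch(word)
```

Because the routing is a fixed mapping from slot position to unit, no per-sub-instruction logic is needed to choose an execution unit, which is the simplification the passage above attributes to the VLIW format.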
One example of a conventional VLIW microprocessor is a multi-media (MMA) microprocessor that was presented at the Microprocessor Forum held on Oct. 22 and 23, 1996 in Japan.
FIG. 1 is a schematic diagram showing a core section of a conventional MMA microprocessor which is capable of executing VLIW architecture instructions. In FIG. 1, the reference number 900 designates the conventional microprocessor core, 901 denotes an execution unit for executing VLIW instructions of MMA, 902 indicates another execution unit for executing VLIW instructions of MMA, and 904 designates a data random access memory (a data RAM). Thus, the conventional MMA microprocessor is capable of simultaneously executing two sub-instructions included in a single VLIW instruction by using the execution units 901 and 902.
The execution unit 901 comprises a multiplier 910, an accumulator 911, a shifter 912, and an arithmetic logic unit (ALU) 913. The execution unit 902 comprises an ALU 914 and a load store unit 915. The reference numbers 920 and 921 denote source data buses used for the operation units included in the execution unit 901, namely the multiplier 910, the accumulator 911, the shifter 912 and the ALU 913. Data items to be used for the operation are read from the general purpose register file 903 and transferred through these buses. The reference numbers 930 and 931 designate source buses used for the ALU 914 and the load store unit 915 included in the execution unit 902, through which data items to be used for the operation are read from the general purpose register file 903.
The reference number 925 designates a write-back bus through which the operation results in the execution unit 901 are written into the general purpose register file 903. The reference numbers 932 and 933 denote write-back buses through which the operation result in the execution unit 902 is written into the general purpose register file 903. The reference numbers 922 and 923 designate buses through which the operation result of the multiplier 910 is transferred to the accumulator 911 in order to accumulate it without penalty. The reference number 940 designates bi-directional buses that connect the load/store unit 915 and the data RAM 904 to each other and that are used for load/store operations on operand data items.
FIG. 2 is a block diagram showing a pipeline of a part of the circuit of the execution unit 901 incorporated in the MMA microprocessor shown in FIG. 1. In FIG. 2, the reference number 903 designates the general purpose register file, 913 denotes the ALU, and 910a and 910b indicate the parts used for the multiplication operation, namely, a Wallace tree circuit and a carry-propagate adder (CPA), respectively.
In the execution unit 901, it is possible to execute the multiplication operation within two stages of the pipeline. The reference number 921 designates a source data bus through which source data items are read from the general purpose register file 903. The reference number 925 denotes a write-back bus through which the operation result is written into the general purpose register file 903. The reference numbers 970, 971 and 972 denote tri-state buffers for driving data to the source data bus 921. The reference number 980 designates a bypath for outputting the operation result of the ALU 913 to the source data bus 921, and 981 indicates a bypath for outputting the multiplication result to the source data bus 921.
As shown in FIG. 2, the execution of one instruction requires six pipeline stages: a Fetch (F) stage, a Decode (D) stage, a data Read (R) stage, an Execution (E) stage, a Memory access (M) stage, and a Write-back (W) stage. Each of these six stages is executed in the pipeline. Because one instruction is executed in the six stages of the pipeline, data path registers (DR) 950, 951, 952, 953, 954, 955, and 956 shown in FIG. 2 are incorporated in the MMA microprocessor core 900.
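The six-stage flow above can be made concrete with a short Python sketch that computes the cycle in which an instruction occupies each stage. It assumes, for illustration only, that every stage takes exactly one clock cycle, that one instruction enters the pipeline per cycle, and that no stalls occur; the function names are hypothetical and not taken from the source.

```python
from typing import List

# Stage order as listed in the text: Fetch, Decode, Read, Execute,
# Memory access, Write-back.
STAGES = ("F", "D", "R", "E", "M", "W")


def stage_cycle(issue_cycle: int, stage: str) -> int:
    """Cycle in which an instruction fetched at issue_cycle is in `stage`.

    Assumes one cycle per stage and no stalls.
    """
    return issue_cycle + STAGES.index(stage)


def timing_chart(num_instructions: int) -> List[str]:
    """Text chart: one row per instruction, two characters per cycle."""
    return ["  " * i + " ".join(STAGES) for i in range(num_instructions)]


for row in timing_chart(3):
    print(row)
```

Under these assumptions each successive instruction is offset by one cycle, so an instruction fetched in cycle 0 reaches its W stage in cycle 5.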
The reference number 960 designates an instruction decoder. Control path registers used for the control paths are incorporated in the MMA microprocessor core 900 in order to process control signals in the pipeline. The output signal from the control path register 962 is a write enable signal used for the data path register 950. The output signal from the control path register 963 is a write enable signal used for the tri-state buffers 971 and 972. The output signal from the control path register 964 is a write enable signal used for the tri-state buffer 970.
FIG. 3 is a timing chart of the bypath processing in the pipeline supported by the execution unit 901 shown in FIG. 2. As shown in FIG. 3, one instruction is executed in six stages of the pipeline. The reference numbers 1000, 1001, and 1002 designate three consecutive instructions that are executed in the pipeline by the execution unit 901.
As explained with reference to FIG. 2, the results of the ALU 913 and the multiplier 910 are transferred to the bypaths 980 and 981 at the M stage. The bypath data items obtained in the M stage of the pipeline are transferred to the source data bus 921 at the R stage of the pipeline. Thus, when a destination designation field of an instruction code to be executed in the pipeline 1000 and a source designation field of an instruction code to be executed in the pipeline 1002 are the same, the bypath processing from the M stage of the pipeline 1000 to the R stage of the pipeline 1002 is executed. Bypath processing is possible between ALU operation instructions, between multiplication instructions, and between an ALU operation instruction and a multiplication instruction.
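The bypath condition described above (matching destination and source fields, with the producer in its M stage while the consumer is in its R stage) can be sketched in Python as follows. The `Instr` type, the register numbers, and the one-instruction-per-cycle issue timing are assumptions made for illustration; they are not taken from FIG. 3.

```python
from dataclasses import dataclass
from typing import Tuple

STAGES = ("F", "D", "R", "E", "M", "W")


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: register fields plus its fetch cycle."""
    dest: int
    srcs: Tuple[int, ...]
    issue: int


def needs_bypath(producer: Instr, consumer: Instr) -> bool:
    """True when the consumer's R stage can take the producer's M-stage result.

    Both conditions from the text must hold: the destination field matches
    a source field, and the two stages fall in the same cycle.
    """
    same_register = producer.dest in consumer.srcs
    m_cycle = producer.issue + STAGES.index("M")
    r_cycle = consumer.issue + STAGES.index("R")
    return same_register and m_cycle == r_cycle


# Three consecutive instructions, assumed to be fetched one cycle apart.
i1000 = Instr(dest=3, srcs=(1, 2), issue=0)
i1001 = Instr(dest=7, srcs=(5, 6), issue=1)
i1002 = Instr(dest=4, srcs=(3, 5), issue=2)

print(needs_bypath(i1000, i1002))
print(needs_bypath(i1000, i1001))
```

With one instruction fetched per cycle, the M stage of the first instruction and the R stage of the third both fall in cycle 4, which is why the forwarding pair in the text is pipeline 1000 and pipeline 1002 rather than adjacent instructions.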
FIG. 4 is a diagram showing the pipelines when a data hazard occurs between the execution unit 901 and the other execution unit 902 shown in FIG. 1. In FIG. 4, the reference numbers 1010, 1011 and 1012 designate pipelines executed by the execution unit 901, and 1020, 1021 and 1022 denote pipelines executed by the other execution unit 902. When a destination designation field of an instruction code to be executed in the pipeline 1010 and a source designation field of an instruction code to be executed in the pipeline 1022 are the same, as shown by the shaded portions in FIG. 4, the data item is read from the general purpose register file 903 in the R stage of the pipeline 1022 only after the execution result of the pipeline 1010 has been written into the general purpose register file 903 at the W stage. In this case, the execution of the R stage in the pipeline 1022 must be halted until the execution of the W stage of the pipeline 1010 is completed. The pipeline 1012 is also executed in synchronization with the execution of the pipeline 1022.
As explained above, in the conventional VLIW microprocessor having the configuration described above, when a data hazard occurs between instructions executed in different pipelines, the execution of the pipelines must be halted for several clock cycles in order to guarantee correct data values. This reduces the instruction processing speed of the microprocessor.
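The cost of waiting for the W stage can be estimated with a small Python sketch. It rests on assumptions that the text does not state: each stage takes one cycle, a value written in the W stage becomes readable from the register file only in the following cycle, and the dependent instruction is fetched two cycles after the producing instruction. The function name is hypothetical.

```python
STAGES = ("F", "D", "R", "E", "M", "W")


def stall_cycles(producer_issue: int, consumer_issue: int) -> int:
    """Cycles the consumer's R stage must wait for the producer's W stage.

    Assumes the written value is readable one cycle after the W stage,
    since no bypath exists between the two execution units.
    """
    w_cycle = producer_issue + STAGES.index("W")
    r_cycle = consumer_issue + STAGES.index("R")
    earliest_read = w_cycle + 1
    return max(0, earliest_read - r_cycle)


# Producer fetched in cycle 0, dependent instruction fetched in cycle 2.
print(stall_cycles(0, 2))
```

Under these assumptions the dependent R stage would normally occur in cycle 4 but cannot proceed until cycle 6, so the pipeline is held for two cycles; a dependent instruction fetched only one cycle after the producer would be held for three.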