The present invention relates to a processing unit having a plurality of instruction fields in one instruction word register and executing the instructions in these fields in parallel.
A traditional processing unit has generally been structured to execute one process with one instruction word and to process a stream of instruction words serially, one by one.
A processing unit developed in recent years, on the other hand, has an instruction system in which a single instruction word specifies a plurality of instructions, and these instructions are executed in parallel in order to improve the execution speed. This processing unit is generally called a VLIW (Very Long Instruction Word) type processing unit.
The processing unit of this type comprises a plurality of processing units to execute a plurality of instructions in parallel. Moreover, this processing unit has a plurality of register files corresponding to the plurality of processing units, so that the respective processing units can execute their processing independently. When a particular process is executed using such a plurality of processing units, data communication between the processing units is generally indispensable. For this purpose, the processing unit of this type has, for example, a means for transferring register values between the plurality of processing units, or a means such as a common register which can be accessed from the plurality of processing units. A processing unit of this kind is disclosed, for example, in Japanese Patent Application Laid-Open No. 5-233281.
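The register transfer described above can be sketched in C as follows. This is only an illustrative model under stated assumptions: the type names Machine and RegisterFile, the functions transfer_reg and run_example, the number of units and registers, and the particular add and multiply operations are all hypothetical and are not taken from the cited publication.

```c
#include <stdint.h>
#include <string.h>

#define NUM_UNITS 2   /* assumed number of processing units */
#define NUM_REGS  8   /* assumed registers per register file */

/* One private register file per processing unit (hypothetical layout). */
typedef struct {
    int32_t regs[NUM_REGS];
} RegisterFile;

typedef struct {
    RegisterFile rf[NUM_UNITS];
} Machine;

/* Models a transfer instruction that copies one register value from the
   register file of src_unit into the register file of dst_unit. */
void transfer_reg(Machine *m, int src_unit, int src_reg,
                  int dst_unit, int dst_reg)
{
    m->rf[dst_unit].regs[dst_reg] = m->rf[src_unit].regs[src_reg];
}

/* Unit 0 computes x + y; unit 1 then needs that sum to compute z * (x + y).
   Because each unit reads only its own register file, an extra transfer
   step is required between the two operations. */
int32_t run_example(int32_t x, int32_t y, int32_t z)
{
    Machine m;
    memset(&m, 0, sizeof m);

    m.rf[0].regs[0] = x;
    m.rf[0].regs[1] = y;
    m.rf[1].regs[0] = z;

    /* Step 1: unit 0 adds r0 and r1 into its own r2. */
    m.rf[0].regs[2] = m.rf[0].regs[0] + m.rf[0].regs[1];

    /* Step 2: the sum is transferred from unit 0 (r2) to unit 1 (r1). */
    transfer_reg(&m, 0, 2, 1, 1);

    /* Step 3: unit 1 multiplies its r0 by the transferred value. */
    m.rf[1].regs[2] = m.rf[1].regs[0] * m.rf[1].regs[1];

    return m.rf[1].regs[2];
}
```

In this model, calling run_example(2, 3, 4) passes through three steps, of which the second exists only to move a value between register files.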
In addition to the means for realizing high speed execution explained above, there is a processing unit in which the processing itself is divided in time into a plurality of stages, and the plurality of stages, operating independently of one another, execute the processing in sequence. Processing units of this type are called pipeline type processing units.
It is known that processing units of this type show their maximum performance when the instruction words are executed in sequential order. Meanwhile, when the instruction words are not executed in sequential order, for example when conditional branch instructions are included, pipeline control is disturbed and a temporary deterioration of performance occurs.
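The effect of branches on a pipeline can be estimated with a simple idealized model, sketched below in C. The function name pipeline_cycles and the assumption that each disturbing branch costs a fixed number of extra cycles are introduced here only for illustration; an actual pipeline may behave differently.

```c
/* Idealized cycle count for n instruction words on a pipeline with
   `stages` stages. With no branches, the first word takes `stages`
   cycles and each later word completes one cycle after the previous one.
   Each of `branches` disturbing branches is assumed to add `penalty`
   extra cycles (a simplifying assumption). */
unsigned long pipeline_cycles(unsigned long n, unsigned long stages,
                              unsigned long branches, unsigned long penalty)
{
    if (n == 0) {
        return 0;
    }
    return stages + (n - 1) + branches * penalty;
}
```

Under these assumptions, 100 instruction words on a 5-stage pipeline take 104 cycles without branches, and 144 cycles when 10 branches each cost 4 extra cycles.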
In order to overcome such problems, the processing unit of this type has been modified to reduce the number of conditional branch processes. A typical method is the use of a predicate register.
The predicate register is a register that qualifies an instruction word and determines whether the relevant instruction word is executed or not. Use of the predicate register enables a remarkable reduction in the frequency with which conditional branch instructions are used. For understanding of the present invention described later, this operation will be briefly explained with reference to the drawings.
FIG. 2 shows an example of a program written in the C language. FIG. 3 shows an example in which the program of FIG. 2 is compiled into the format applied to the processing unit of the related art. FIG. 4 shows an example in which the program of FIG. 2 is compiled into the format applied to a processing unit using a predicate register. As shown in these figures, the arithmetic or logical processes realized by conditional branching in FIG. 3 can be realized in FIG. 4 without any conditional branch process. The second line in FIG. 4 describes an instruction word using the predicate register. With the comparison instruction of the first line, the value of the comparison result is written into the first predicate register (p0). The subtraction instruction of the second line is executed only when the value stored in p0 is "true", as specified by the description "(p0)" preceding the instruction word. If the value stored in p0 is "false", the subtraction instruction of the second line is read but the subtraction process is not executed. With this execution method, the number of conditional branch processes can be reduced.
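The difference between the two forms can be modeled in C as in the following sketch. The listings of FIG. 2 to FIG. 4 are not reproduced here, so this is not the program of those figures: the function names, the choice of a greater-than comparison, and the modeling of p0 as a bool are assumptions made only for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of one predicated subtraction instruction,
   "(p) sub dst, dst, src". The instruction is always read, but it
   modifies dst only when the predicate p is true. */
static void predicated_sub(bool p, int *dst, int src)
{
    if (p) {
        *dst = *dst - src;
    }
}

/* Branching form: the subtraction is reached only through a
   conditional branch in the instruction stream. */
int sub_if_greater_branch(int a, int b)
{
    if (a > b) {
        a = a - b;
    }
    return a;
}

/* Predicated form: a straight-line sequence of two instruction words. */
int sub_if_greater_predicated(int a, int b)
{
    bool p0 = (a > b);           /* first line: comparison writes p0 */
    predicated_sub(p0, &a, b);   /* second line: (p0) subtraction     */
    return a;
}
```

Both functions return the same result for every input; in the predicated form the decision is carried by the value of p0 rather than by a change in the flow of instruction words.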
However, when the means for realizing high speed processing explained above are used in combination, namely when a processing unit is structured to have the characteristics of both the VLIW type processing unit and the pipeline type processing unit, the following problems arise.
Since the plurality of processing units executing processes in parallel refer to one another's processing results, register file transfer processes are frequently generated between the processing units, and in some cases the high speed operation expected from the parallel processes or pipeline processes cannot be sufficiently obtained.
In addition, high speed processing cannot be realized because the number of program steps required for the transfer processes increases.