In recent years, a processing apparatus comprising an instruction processor with n pipeline processing units (n is an integer equal to or larger than 2) for processing instructions has been disclosed. This processing apparatus is called a parallel processing apparatus, since it fetches n instructions at a time from the instruction stream of one load module and executes these instructions in parallel in the pipeline processing units of the instruction processor. In a parallel processing apparatus of this type, continuously processing n instructions at a time is important for performance. Therefore, the highest processing performance is obtained when a program is (locally) constituted by mutually independent instructions.
The most serious problem in parallel execution of instructions in the above-mentioned parallel processing apparatus arises when the instructions include branch instructions. For example, assume that four instructions, i.e., ADD (addition), BR1 (conditional branch), SUB (subtraction), and BR2 (conditional branch), are executed in parallel. It is also assumed that these four instructions are arranged in the order ADD, BR1, SUB, BR2 in the program, as shown in FIG. 12. With this arrangement, a sequential processing apparatus executes these instructions in the order ADD → BR1 → SUB → BR2. In the parallel processing apparatus, this sequential execution order is treated as the priority of the instructions. In this description, assume that the BR1 instruction uses a result of the ADD instruction as its branch condition, and the BR2 instruction uses a result of a CMP (comparison) instruction executed prior to the ADD instruction as its branch condition.
Since the above-mentioned four instructions include two branch instructions, BR1 and BR2, the four instructions to be executed next (the next instruction string) depend on the results of BR1 and BR2. Three cases are possible for the next instruction string:
(1) When the branch condition of BR1 does not hold true and the branch condition of BR2 holds true:
In this case, the next instruction string includes four instructions starting at an MP (multiplication) instruction at a branch target address TAR1 designated by the BR2 instruction, as shown in FIG. 12.
(2) When the branch condition of BR1 holds true:
In this case, the next instruction string includes four instructions starting at a DV (division) instruction at a branch target address TAR2 designated by the BR1 instruction, as shown in FIG. 12. Therefore, execution of the SUB and BR2 instructions following the BR1 instruction must be canceled, regardless of the result of BR2.
(3) When neither the branch condition of BR1 nor that of BR2 holds true:
In this case, the next instruction string includes the four instructions following the BR2 instruction.
As this example shows, when a single execution step of the parallel processing apparatus includes a plurality of branch instructions, the next instruction string to be fetched varies with the combination of true/not-true results of those branch instructions. The next instruction string can therefore be determined only after branch judgment processing of all the branch instructions is completed.
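The three cases above amount to a priority selection over the branch outcomes: the earliest branch in program order whose condition holds decides the next fetch address, and everything after it in the step is canceled. The following Python sketch illustrates only that selection rule under a hypothetical model: each step is a list of `Instr` records, branch outcomes are supplied as booleans keyed by mnemonic (so mnemonics are assumed unique within a step), and `FALL` is a hypothetical label for the address following BR2. It is not a description of any hardware in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction slot in a parallel-issue step (hypothetical model)."""
    opcode: str
    target: Optional[str] = None  # branch target label; None for non-branches


def resolve_next_string(
    step: Sequence[Instr],
    taken: Dict[str, bool],
    fall_through: str,
) -> Tuple[str, List[str]]:
    """Pick the next fetch address for one step of n parallel instructions.

    Program order inside the step is the priority order, so the first
    branch whose condition holds wins and every instruction after it is
    canceled. If no branch is taken, fetch continues after the step.
    """
    for i, ins in enumerate(step):
        if ins.target is not None and taken[ins.opcode]:
            completed = [x.opcode for x in step[: i + 1]]
            return ins.target, completed
    return fall_through, [x.opcode for x in step]


# The FIG. 12 step: BR1 branches to TAR2 (DV), BR2 branches to TAR1 (MP).
step = [Instr("ADD"), Instr("BR1", "TAR2"), Instr("SUB"), Instr("BR2", "TAR1")]

print(resolve_next_string(step, {"BR1": False, "BR2": True}, "FALL"))   # case (1)
print(resolve_next_string(step, {"BR1": True, "BR2": True}, "FALL"))    # case (2)
print(resolve_next_string(step, {"BR1": False, "BR2": False}, "FALL"))  # case (3)
```

In case (2) the scan stops at BR1, so SUB and BR2 drop out of the completed list whatever BR2's outcome is. In general, though, the function needs the outcome of every branch it might reach, which mirrors why the next instruction string cannot be fixed before all branch judgments are available.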
For this reason, a conventional parallel processing apparatus waits until branch judgment processing of all the branch instructions executed in parallel is completed, then checks the branch judgment results and the priority levels of the branch instructions to determine the next instruction string, and only then fetches it. However, checking the branch judgment results and priority levels of all the branch instructions executed in parallel is cumbersome, and the next instruction string cannot be fetched until this check is completed, so high-speed processing is difficult to achieve.
To allow the parallel processing apparatus to fetch the next instruction string easily, one of the following two schemes is also known to be employed.
The first scheme is to have a compiler statically arrange the program itself so that a single step does not include a plurality of branch instructions. With this first scheme, for example, when the four instructions are ADD, BR1, SUB, and BR2 as described above, a NOP (no operation) instruction is substituted for BR2, the second (i.e., lower-priority) of the two branch instructions, and BR2 is placed at the beginning of the next four instructions.
According to the first scheme, since the number of branch instructions executed in a single step is limited to one, the next instruction string to be fetched when a branch condition holds true can be easily determined and quickly fetched. However, the first scheme hinders high-speed processing, since extra NOP instructions are inserted and the program becomes redundant.
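The compile-time rearrangement of the first scheme can be pictured as a greedy packing pass. The Python sketch below is an assumption-laden illustration, not an actual compiler: instructions are plain mnemonic strings, a step holds `width` slots (4 here), and the hypothetical helper `is_branch` treats any mnemonic starting with "BR" as a branch, as BR1 and BR2 do in the example. When a second branch would fall into a step that already holds one, the rest of that step is filled with NOPs and the branch opens the next step.

```python
from typing import Callable, List

NOP = "NOP"


def is_branch(opcode: str) -> bool:
    # Assumption: branch mnemonics start with "BR", as BR1/BR2 do in the example.
    return opcode.startswith("BR")


def pack_one_branch_per_step(
    program: List[str],
    width: int = 4,
    branch: Callable[[str], bool] = is_branch,
) -> List[List[str]]:
    """Group a program into steps of `width` slots with at most one branch each."""
    steps: List[List[str]] = []
    current: List[str] = []
    has_branch = False
    for op in program:
        if branch(op) and has_branch:
            # A second branch may not share this step: pad it with NOPs and close it.
            current += [NOP] * (width - len(current))
            steps.append(current)
            current, has_branch = [], False
        current.append(op)
        has_branch = has_branch or branch(op)
        if len(current) == width:
            steps.append(current)
            current, has_branch = [], False
    if current:  # pad the final, partially filled step
        current += [NOP] * (width - len(current))
        steps.append(current)
    return steps


# X1..X3 are hypothetical placeholders for whatever instructions follow BR2.
print(pack_one_branch_per_step(["ADD", "BR1", "SUB", "BR2", "X1", "X2", "X3"]))
```

Here the result is `[['ADD', 'BR1', 'SUB', 'NOP'], ['BR2', 'X1', 'X2', 'X3']]`: the program now needs an extra slot, and each such NOP is a slot in which no useful work is done.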
The second scheme is to execute branch instructions sequentially when a single step includes a plurality of branch instructions. According to the second scheme, since the number of branch instructions executed simultaneously in a single step is limited to one, the next instruction string can be easily determined, as in the first scheme. However, the second scheme cannot achieve true parallel processing, and it hinders high-speed processing since the branch instructions within a single step are executed sequentially. Note that in the second scheme, when the branch condition of the branch instruction executed first in a single step holds true, the lower-priority branch instruction is, as a matter of course, not executed.
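The behavior of the second scheme can be sketched as a scan that judges the branches of one step one after another in priority order and stops at the first taken branch. In this hypothetical Python model each slot is a `(mnemonic, target)` pair with `None` for non-branches, `FALL` again stands for the address after BR2, and the returned count of serialized branch judgments is an assumed stand-in for the extra time the scheme costs; it is not a cycle count taken from any real apparatus.

```python
from typing import Dict, List, Optional, Tuple

# Each slot is (mnemonic, branch target or None); hypothetical encoding.
Slot = Tuple[str, Optional[str]]


def sequential_branch_step(
    step: List[Slot],
    taken: Dict[str, bool],
    fall_through: str,
) -> Tuple[str, int]:
    """Judge the branches of one step one at a time, in priority order.

    Returns the next fetch address and how many branch judgments were
    performed one after another. A taken branch stops the scan, so
    lower-priority branches are not executed.
    """
    judgments = 0
    for opcode, target in step:
        if target is None:
            continue  # non-branch slots do not affect the next fetch address
        judgments += 1
        if taken[opcode]:
            return target, judgments
    return fall_through, judgments


step = [("ADD", None), ("BR1", "TAR2"), ("SUB", None), ("BR2", "TAR1")]
print(sequential_branch_step(step, {"BR1": True, "BR2": True}, "FALL"))    # ('TAR2', 1)
print(sequential_branch_step(step, {"BR1": False, "BR2": True}, "FALL"))   # ('TAR1', 2)
print(sequential_branch_step(step, {"BR1": False, "BR2": False}, "FALL"))  # ('FALL', 2)
```

Whenever BR1 is not taken, both judgments are serialized within the step. In this model, that serialization is the price the second scheme pays for a simple next-address decision.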
As described above, in a conventional parallel processing apparatus that permits parallel execution of a plurality of branch instructions, next instruction string fetch processing is complex and delayed, which hinders high-speed processing. In other conventional processing apparatuses, which either statically arrange the program so that a single execution step does not include a plurality of branch instructions or execute the branch instructions in a single step sequentially, next instruction fetch processing can be simplified. However, NOP (no operation) instructions must then be inserted, or parallel processing performance is impaired by sequential execution of instructions, which again hinders high-speed processing.