1. Field of the Invention
The present invention relates to a RISC (Reduced Instruction Set Computer) microprocessor and, more particularly, to a parallel processor system for processing a plurality of instructions in parallel using a superscalar method.
2. Description of the Related Art
In a conventional data processor, an SISD (Single Instruction Single Data) method, in which single instructions are processed sequentially, is usually used. The following measures are taken to improve the performance of such a processor: the width of processable data is increased, and the operating frequency is increased. In addition, a pipeline method is used in which the processing itself is divided into several stages so that a plurality of data are processed simultaneously, or dedicated hardware for special processing such as floating-point arithmetic is added.
FIG. 1 shows a conventional pipeline processor having only one operating unit. Reference numeral 51 denotes a register file. Reference symbol RP denotes a read port of the register file 51; and WP, a write port of the register file 51. Reference numeral 52 denotes an arithmetic and logic unit (to be referred to as an ALU hereinafter); 531 and 532, two-input selector circuits; 54a to 54d, flip-flop circuits; 55a to 55c and 56a to 56c, tri-state buffer circuits; and 17, an instruction decoder.
In this pipeline processor, when instructions 1 to 4 shown in FIG. 2 are executed, as shown in FIG. 3, instruction decode D, instruction execute E, memory access M, and register write W of the instruction 1 are sequentially executed in four steps I to IV, the instruction 2 is executed in steps II to V, the instruction 3 is executed in steps III to VI, and the instruction 4 is executed in steps IV to VII. Therefore, a total of 7 cycles is required before the operation results of all four instructions are written in the register.
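The step assignment described above can be reproduced with a short sketch. It is only an illustration of the timing, assuming that each instruction enters the pipeline one step after its predecessor and that every stage takes exactly one step; the function names `scalar_pipeline_schedule` and `total_cycles` are hypothetical and do not appear in the figures.

```python
# Stage order taken from the description: decode, execute, memory access, register write.
STAGES = ["D", "E", "M", "W"]


def scalar_pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction: {stage: step}} for a single-pipeline processor.

    Assumption: instruction i starts one step after instruction i - 1,
    and each stage occupies exactly one step.
    """
    schedule = {}
    for i in range(1, num_instructions + 1):
        # Instruction i is decoded in step i and advances one stage per step.
        schedule[i] = {stage: i + k for k, stage in enumerate(stages)}
    return schedule


def total_cycles(schedule):
    """Step in which the last stage of the last instruction completes."""
    return max(step for per_stage in schedule.values() for step in per_stage.values())


schedule = scalar_pipeline_schedule(4)
print(schedule[4])             # instruction 4 occupies steps 4 to 7 (IV to VII)
print(total_cycles(schedule))  # 7
```

In general, under these assumptions, n instructions passing through an s-stage pipeline complete in n + s - 1 steps, which gives 4 + 4 - 1 = 7 for the example of FIG. 3.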
In order to further improve the performance of a processor, an MIMD (Multiple-instruction stream Multiple-data stream) method for executing a plurality of instructions simultaneously (in parallel) can be used effectively. In this method, a plurality of operation processing units are arranged, and these units are operated simultaneously. Examples include an array processor having an array of identical operating units, and a superscalar parallel processor having a plurality of different operating units and a plurality of pipelines.
Since it is difficult to apply the array processor to general data processing, its application field is limited. In contrast, since the control method of the superscalar processor is an extension of the control method of a conventional processor, the superscalar processor can be applied to general data processing relatively easily.
In the superscalar parallel processor, a plurality of instructions are executed in parallel in one clock cycle by simultaneously operating a plurality of operating units. In this case, the plurality of instructions are also fetched and decoded simultaneously. For this reason, the superscalar parallel processor has a larger processing capacity than a conventional processor.
FIG. 4 shows a conventional superscalar parallel processor having two operating units and two pipelines arranged to execute two instructions in parallel.
In FIG. 4, reference numeral 71 denotes a register file, and reference symbols RP and WP denote a read port and a write port of the register file 71, respectively. Reference numerals 721 and 722 denote ALUs; 731a, 731b, 732a, and 732b, two-input selector circuits; 741a to 741d and 742a to 742d, flip-flop circuits; 751a to 751c, 761a to 761c, 752a to 752c, and 762a to 762c, tri-state buffer circuits; and 771 and 772, instruction decoders.
When this parallel processor is to execute instructions 1 to 4 (shown in FIG. 2), as shown in FIG. 5, instruction decode D, instruction execute E, memory access M, and register write W of the instructions 1 and 2 are sequentially executed in four steps I to IV, and instruction decode D, instruction execute E, memory access M, and register write W of the instructions 3 and 4 are sequentially executed in four steps V to VIII.
At this time, the instructions 3 and 4 cannot be executed until the operation results of the instructions 1 and 2 are written in the register. Therefore, the time required for writing the operation results of the instructions 3 and 4 in the register is 8 cycles, which is the sum of the 4 cycles required for executing the instructions 1 and 2 and the 4 cycles required for executing the instructions 3 and 4.
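The 8-cycle figure can be checked with a minimal sketch of this serialized behavior. It assumes, as stated above, that a group of two instructions cannot start until the preceding group has completed its register write, and that every stage takes one step; the function name `serialized_superscalar_schedule` and the parameter `issue_width` are hypothetical names introduced for illustration.

```python
# Stage order taken from the description: decode, execute, memory access, register write.
STAGES = ["D", "E", "M", "W"]


def serialized_superscalar_schedule(num_instructions, issue_width=2, stages=STAGES):
    """Return {instruction: {stage: step}} for the conventional two-pipeline case.

    Assumption: instructions are issued in groups of `issue_width`, and a
    group starts only after the previous group has finished its W stage.
    """
    schedule = {}
    for i in range(1, num_instructions + 1):
        group = (i - 1) // issue_width         # 0 for instructions 1-2, 1 for 3-4
        first_step = group * len(stages) + 1   # groups do not overlap in time
        schedule[i] = {stage: first_step + k for k, stage in enumerate(stages)}
    return schedule


schedule = serialized_superscalar_schedule(4)
last_step = max(step for per_stage in schedule.values() for step in per_stage.values())
print(schedule[3])  # instruction 3 occupies steps 5 to 8 (V to VIII)
print(last_step)    # 8
```

Under these assumptions the two-pipeline processor needs 8 steps for the four instructions, one more than the 7 steps of the single pipeline of FIG. 3, even though it has twice the operating hardware.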
Therefore, in the above conventional superscalar parallel processor, although the amount of hardware is increased, when the number of instructions to be executed is larger than the number of instructions that can be executed in parallel, another instruction cannot be executed until the operation result of a preceding instruction is written in the register. As a result, the execution time for the instructions may be increased (8 cycles in the example of FIG. 5, compared with 7 cycles in the single pipeline of FIG. 3).
In other words, in the conventional superscalar parallel processor, when more instructions are to be executed than can be executed in parallel, the execution time for the instructions may be increased.