This invention relates to a data processor for improving processing speed by processing a plurality of instructions in parallel.
An example of a general-purpose computer that improves processing speed by processing a plurality of instructions in parallel is the IBM 360/91. This computer is described in detail in IBM Journal, Jan. 1967, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units". It employs a common data bus system (hereinafter referred to as the "CDB system") in order to eliminate a factor that inhibits parallel processing, namely the repeated use of the same register by adjacent instructions. In this system, an instruction that needs the operation result of a preceding instruction as an input operand waits at a reservation station (hereinafter referred to as "RS"; it corresponds to an instruction queue). As soon as the operation result is obtained, it is taken through the common data bus (CDB) into an input operand register disposed in the RS, and execution starts with whichever instruction has all of its input operands determined.
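The waiting-and-capture behavior described above can be illustrated with a minimal sketch. It is not the IBM 360/91 design; the names `Entry`, `ReservationStation`, `capture`, `issue_ready` and `demo`, the integer tags, and the two sample operations are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One RS entry with two input operand registers (hypothetical layout)."""
    op: str
    dest_tag: int                  # tag sent on the CDB together with the result
    values: List[Optional[int]]    # operand value, or None while still pending
    tags: List[Optional[int]]      # tag of the producing instruction, if pending

    def ready(self) -> bool:
        return all(v is not None for v in self.values)


class ReservationStation:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def reserve(self, entry: Entry) -> None:
        self.entries.append(entry)

    def capture(self, tag: int, value: int) -> None:
        """Take a result from the CDB into every operand register waiting on it."""
        for e in self.entries:
            for i in range(2):
                if e.values[i] is None and e.tags[i] == tag:
                    e.values[i] = value
                    e.tags[i] = None

    def issue_ready(self) -> List[Entry]:
        """Remove and return the entries whose input operands are all determined."""
        ready = [e for e in self.entries if e.ready()]
        self.entries = [e for e in self.entries if not e.ready()]
        return ready


def execute(e: Entry) -> int:
    a, b = e.values
    return a + b if e.op == "add" else a * b


def demo():
    rs = ReservationStation()
    rs.reserve(Entry("add", 1, [2, 3], [None, None]))   # both operands known
    rs.reserve(Entry("mul", 2, [None, 4], [1, None]))   # waits for the result of tag 1
    trace = []
    while rs.entries:
        ready = rs.issue_ready()
        if not ready:              # nothing can start; avoid spinning forever
            break
        for e in ready:
            result = execute(e)
            trace.append((e.op, result))
            rs.capture(e.dest_tag, result)   # one result per transfer on the single CDB
    return trace
```

In this sketch every entry carries two operand registers, and each result is delivered by a separate call to `capture`, which mirrors the single shared bus.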
In the CDB system, two input operand registers, one each for the first and second operands, must be disposed in the RS for every instruction being processed in the system. If the maximum number of instructions being processed in the system is N, 2N registers must be disposed; hence, the logic scale increases. Furthermore, when operation results are obtained simultaneously by several arithmetic or logical operation units (hereinafter referred to as "ALUs"), the CDB becomes a bottleneck in transferring the results to the ALUs that need them.
U.S. patent application Ser. No. 682,839, assigned to the applicant of the present application, solves this problem. In this prior patent application, a greater number of registers is disposed than the total number of registers that can be designated by instructions. When an instruction B is executed that designates, as the register for storing its operation result, the same register b designated by a preceding instruction A, the operation result of instruction A is stored in register b and the operation result of instruction B in another register b'. When the operation result of instruction B is to be read out by a succeeding instruction, it is read out from register b', so that instruction B can be executed without waiting for the execution of the preceding instruction A. According to this prior patent application, the number of registers necessary when the maximum number of instructions simultaneously processed in the system is N may be about N.
Furthermore, when operation results are obtained simultaneously by a plurality of ALUs, the system of the prior application writes them into separate registers, and each ALU can independently read data from a plurality of registers. Therefore, the operation results can be transferred simultaneously to the plurality of ALUs that need them.
In this prior application, however, an instruction that follows instruction B and is designated to read the result from register b must detect that the result which instruction B should have stored in register b is actually stored in register b', and must read the result from register b'. In other words, time is needed to examine the correspondence of b to b' (the "conversion time"), and the instruction processing time increases accordingly.
Generally, an instruction reads a register as part of the decoding operation, calculates an address, and reads operand data from memory. The time for examining the correspondence of register b to b' (the "conversion") is therefore spent within the decoding operation. The decoding time increases by this conversion time, and eventually the machine cycle time of the data processing unit as a whole increases.
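The register scheme of the prior application, and the conversion it imposes on every register read, can be illustrated with a minimal sketch. The class `RenameFile`, its methods, the register counts, and the sample values are hypothetical and are not taken from the prior application; the release of registers that are no longer needed is omitted.

```python
class RenameFile:
    """More registers are disposed than an instruction can designate (assumed sizes)."""

    def __init__(self, num_designatable: int, num_disposed: int) -> None:
        assert num_disposed > num_designatable
        self.registers = [0] * num_disposed
        # Initially, designated register b corresponds to disposed register b.
        self.mapping = list(range(num_designatable))
        self.free = list(range(num_designatable, num_disposed))
        self.conversions = 0       # number of b -> b' examinations performed

    def convert(self, b: int) -> int:
        """Examine the correspondence of b to b' (the 'conversion')."""
        self.conversions += 1
        return self.mapping[b]

    def allocate_dest(self, b: int) -> int:
        """Assign another register b' to an instruction that stores into b."""
        b_prime = self.free.pop(0)
        self.mapping[b] = b_prime
        return b_prime

    def write(self, register: int, value: int) -> None:
        self.registers[register] = value

    def read(self, b: int) -> int:
        # Every read of a designated register pays for one conversion.
        return self.registers[self.convert(b)]


def demo():
    rf = RenameFile(num_designatable=4, num_disposed=8)
    b = 1
    dest_a = b                       # instruction A stores its result in register b
    dest_b = rf.allocate_dest(b)     # instruction B stores its result in register b'
    rf.write(dest_b, 222)            # B finishes first, without waiting for A
    rf.write(dest_a, 111)            # A finishes later and does not disturb b'
    value = rf.read(b)               # a succeeding instruction designates b
    return dest_a, dest_b, value, rf.conversions
```

Here the succeeding read of register b obtains the result of instruction B from register b', at the cost of one lookup in `convert`; that lookup stands for the conversion time added to the decoding operation.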