The present invention generally relates to an arithmetic system incorporating arithmetic means of a pipeline structure. More particularly, the invention concerns an arithmetic system which allows processing to be executed at increased speed in a data processing system.
As an approach to realizing a high-speed data processing system, there has been known a system in which a plurality of arithmetic and logical units (ALUs) are used, each optimized for the operations to be performed in the data processing system. A typical example of such a data processing system is illustrated in FIG. 1 of the accompanying drawings. In this figure, reference numeral 1 denotes an instruction unit (also referred to simply as an I unit) for decoding instructions, 2 denotes a storage unit (referred to simply as an S unit) for storing instructions and data, and numeral 3 denotes an arithmetic unit (also referred to as an E unit) for executing the arithmetic operations designated by the instructions. The arithmetic unit or E unit is composed of two sub-units, that is, a floating-point unit (also referred to as an F unit) 4 optimized for executing floating-point instructions at high speed and a general arithmetic unit (also referred to as a G unit) 5 adapted for executing arithmetic instructions other than floating-point instructions, such as fixed-point instructions, decimal instructions and the like. In practice, the floating-point instructions can also be executed by the G unit 5. However, since the G unit 5 is not designed for high-speed processing, the F unit 4, dedicated to executing floating-point instructions at high speed, is provided in addition to the G unit 5. When economy is more important than high-speed operation of the processing system, the F unit 4 can of course be omitted.
Next, referring to FIGS. 2 and 3, the manner in which execution of the floating-point instructions is optimized through provision of the F unit 4 mentioned above will be elucidated by taking a floating-point addition/subtraction instruction as an example.
Referring to FIG. 2, which shows an arrangement of the general arithmetic or G unit 5, reference numerals 6, 7, 8 and 9 denote work registers incorporated in the G unit 5 for holding data in the course of an arithmetic operation, and numeral 11 denotes a byte adder for performing addition and subtraction on a byte basis. Reference numeral 12 denotes a parallel adder for performing addition and/or subtraction on data comprising a plurality of bytes, e.g. 8 bytes. Numeral 13 denotes a shifter, and numeral 10 denotes an input data selector circuit for selectively supplying the data held in the work registers 6, 7, 8 and 9 to the arithmetic units 11, 12 and/or 13. An output data selector circuit 14 selects the results of the arithmetic operations performed by the units 11, 12 and 13 and writes them back to the work registers 6, 7, 8 and 9. It is assumed that, in this general arithmetic or G unit, one cycle is required to perform an arithmetic operation on the data held in the work registers and to return the result to the work registers.
Now, consideration will be given to a floating-point addition or subtraction instruction. Execution of the instruction for a floating-point addition or subtraction includes the four operations mentioned below:
(i) The exponents of the two operands are compared with each other.
(ii) The mantissa of the operand found to be smaller by the exponent comparison is shifted to the right to align its digits with those of the larger operand (this shift is referred to as the pre-shift).
(iii) The digit-aligned mantissas are added or subtracted.
(iv) The number of leading "0s (zeros)" in the resultant mantissa is counted and the mantissa is shifted to the left so that no invalid "0" remains in its leading portion, while the exponent of the result is corrected by the amount of the shift (this is referred to as the post-normalization).
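The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 2 or FIG. 3: the operand format (radix-16 digits, a six-digit integer mantissa, sign-magnitude representation) and the names `Fp` and `fp_add` are hypothetical, and guard digits, rounding and exponent overflow are ignored.

```python
from dataclasses import dataclass

# Hypothetical operand format (an assumption, not from the text): radix-16 digits and a
# 6-digit mantissa held as an integer, so value = sign * (mant / 16**6) * 16**exp.
BASE = 16
DIGITS = 6
FULL = BASE ** DIGITS      # one past the largest representable mantissa
MIN_NORM = FULL // BASE    # smallest mantissa whose leading digit is non-zero


@dataclass(frozen=True)
class Fp:
    sign: int  # +1 or -1
    exp: int   # exponent (a power of BASE)
    mant: int  # 0 <= mant < FULL; normalized when mant >= MIN_NORM


def fp_add(a: Fp, b: Fp, subtract: bool = False) -> Fp:
    """Add (or subtract) two normalized operands using steps (i) to (iv)."""
    if subtract:
        b = Fp(-b.sign, b.exp, b.mant)
    if a.mant == 0:
        return b
    if b.mant == 0:
        return a
    # (i) Compare exponents (mantissas break ties) so that `a` is the larger operand.
    if (a.exp, a.mant) < (b.exp, b.mant):
        a, b = b, a
    # (ii) Pre-shift: move the smaller mantissa right by the exponent difference.
    shift = a.exp - b.exp
    aligned = b.mant // BASE ** shift if shift < DIGITS else 0
    # (iii) Add or subtract the aligned mantissas (the signs decide which).
    mant = a.mant + aligned if a.sign == b.sign else a.mant - aligned
    exp = a.exp
    if mant >= FULL:       # carry out of the top digit: shift right one digit
        mant //= BASE
        exp += 1
    if mant == 0:
        return Fp(1, 0, 0)
    # (iv) Post-normalization: shift out leading zero digits, correcting the exponent.
    while mant < MIN_NORM:
        mant *= BASE
        exp -= 1
    return Fp(a.sign, exp, mant)
```

For example, with 1.0 encoded as `Fp(1, 1, 0x100000)` and 0.5 as `Fp(1, 0, 0x800000)`, `fp_add` returns `Fp(1, 1, 0x180000)`, i.e. 1.5; subtracting instead exercises the post-normalization of step (iv) and returns `Fp(1, 0, 0x800000)`.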
When the operations mentioned above are performed by the G unit of the arrangement shown in FIG. 2, the general arithmetic or G unit is used four times. More specifically, when initiation of the operation is instructed through a line 10a with data being held by the work register 6 or 7, exponents of the operands are compared with each other in the first cycle by using the byte adder 11. In the second cycle, the mantissa of the smaller operand as found from the comparison of the exponents is pre-shifted by means of the shifter 13. In the third cycle, addition or subtraction of mantissas is performed by the parallel adder 12. In the fourth cycle which is for the post-normalization, correction of the exponent is carried out by the adder 11 while the shifting of the mantissa is effected by means of the shifter 13.
Next, it is assumed that the instruction for a floating-point addition or subtraction is to be executed by the optimized operation units constituting the F unit 4. FIG. 3 shows a configuration of the optimized floating-point adder/subtracter unit. Referring to FIG. 3, it is assumed that two floating-point operands are placed in the work registers 15 and 16, respectively. In the first cycle, the exponents of the two operands are compared in magnitude by a comparator 17. Then, the mantissa of the operand found to be smaller is shifted right by a pre-shifter 19 or 20 to effect the required digit alignment, while the larger exponent is set in a register 18. In the second cycle, the digit-aligned mantissas are added or subtracted in a parallel adder 23, and the result is set in a register 24. Simultaneously, the content of the register 18 is transferred to a register 25, and a zero-digit detector 26 detects the number of zero digits in the leading (most significant) positions of the result. In the third cycle, the post-normalization is effected: the exponent is corrected by an adder 27 in accordance with the number of digits indicated by the zero-digit detector 26, while the resultant mantissa is shifted by a shifter 28.
As will now be understood, execution of the instruction for the floating-point addition or subtraction requires three cycles when it is executed by the floating-point adder/subtracter unit shown in FIG. 3 in contrast to the general arithmetic or G unit shown in FIG. 2 which requires four cycles for executing the same floating-point addition or subtraction instruction.
In this way, it is possible to enhance the processing capability of a processing system by incorporating arithmetic units optimized for the operations to be effected in the processing system. It should here be mentioned that the F unit 4 shown in FIG. 1 includes a high-speed multiplier/divider and others in addition to the floating-point adder/subtracter unit shown in FIG. 3.
Now, operation of the F unit shown in FIG. 1 will be considered, taking the floating-point adder/subtracter unit shown in FIG. 3 as an example. As noted above, the floating-point adder/subtracter unit executes a floating-point addition or subtraction instruction in three machine cycles, which are distinguished from one another by the symbols FE_A, FE_B and FE_C, respectively. These machine cycles FE_A, FE_B and FE_C can be defined as follows:
Cycle FE_A: Digit alignment of the operands of the floating-point addition/subtraction is carried out by using the byte adder or comparator 17 and the pre-shifters 19 and 20.
Cycle FE_B: Addition/subtraction of mantissas is performed by using the parallel adder 23.
Cycle FE_C: Post-normalization is performed by using the byte adder 27 and the shifter 28.
It is apparent that each of the arithmetic operation units shown in FIG. 3 is used only once in the three machine cycles for executing the instruction for floating-point addition/subtraction.
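The observation above can be checked with a small reservation-table style sketch, a standard way of analysing pipelines that is not itself described in the text. The resource sets per cycle are transcribed from the cycle-by-cycle descriptions of the G unit (FIG. 2) and the F unit (FIG. 3); the names `reused_resources` and `can_issue_every_cycle` are hypothetical. If one instruction needs the same resource in two different cycles, two instructions in flight would need it at once when a new instruction enters every cycle.

```python
# Per-cycle resource usage of one floating-point addition/subtraction,
# transcribed from the description of the G unit (FIG. 2) and the F unit (FIG. 3).
G_UNIT = [
    {"byte adder 11"},                # cycle 1: compare exponents
    {"shifter 13"},                   # cycle 2: pre-shift
    {"parallel adder 12"},            # cycle 3: add/subtract mantissas
    {"byte adder 11", "shifter 13"},  # cycle 4: post-normalization
]
F_UNIT = [
    {"comparator 17", "pre-shifter 19", "pre-shifter 20"},  # FE_A: digit alignment
    {"parallel adder 23", "zero-digit detector 26"},        # FE_B: add/subtract
    {"adder 27", "shifter 28"},                             # FE_C: post-normalization
]


def reused_resources(schedule):
    """Return the resources that one instruction needs in more than one cycle."""
    seen, reused = set(), set()
    for cycle in schedule:
        reused |= seen & cycle
        seen |= cycle
    return reused


def can_issue_every_cycle(schedule):
    """True if a new instruction could start every cycle without two
    in-flight instructions ever needing the same resource in the same cycle."""
    return not reused_resources(schedule)
```

By this criterion the F unit of FIG. 3 can accept a new addition/subtraction every cycle, while the G unit of FIG. 2 cannot, because its byte adder 11 and shifter 13 are needed again in the post-normalization cycle.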
In view of this feature, the capability of processing a series of floating-point addition/subtraction instructions issued in succession can be enhanced by operating the individual arithmetic units of the F unit simultaneously, in parallel, by means of well-known pipeline processing. This will be described below with reference to FIGS. 4 and 5 of the accompanying drawings.
FIG. 4 illustrates the manner in which a series of successive floating-point addition/subtraction instructions is executed by the F unit without pipeline processing. In this case, execution of the second instruction can start only after the first instruction has been completely executed. Thus, six machine cycles are needed to execute two successive floating-point addition/subtraction instructions.
FIG. 5 illustrates the manner in which successive floating-point addition/subtraction instructions are executed through pipeline processing. In this case, execution of the second instruction starts one machine cycle after execution of the first instruction starts, and execution of a third instruction starts one machine cycle after that. In this way, three successive floating-point addition/subtraction instructions can be executed in five machine cycles. This is because, by virtue of the pipeline processing, three instructions, each of which requires three machine cycles, can be in progress simultaneously, each occupying a different one of the three stages in any given machine cycle.
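The cycle counts of FIGS. 4 and 5 follow from a simple model in which every instruction passes through the same three stages: executed one at a time, n instructions need 3n cycles, whereas with a new instruction entering every cycle they need 3 + n - 1 cycles. The sketch below assumes that uniform three-cycle latency; the function names are hypothetical.

```python
def sequential_cycles(n, stages=3):
    """Cycles for n instructions when each must finish before the next starts (FIG. 4)."""
    return n * stages


def pipelined_cycles(n, stages=3):
    """Cycles for n instructions when a new one enters the pipeline every cycle (FIG. 5)."""
    return stages + n - 1 if n > 0 else 0


def timing_chart(n, stages=("FE_A", "FE_B", "FE_C")):
    """Return one text row per instruction showing the stage it occupies in each cycle."""
    total = pipelined_cycles(n, len(stages))
    rows = []
    for i in range(n):
        cells = ["    "] * total            # blank cell, same width as a stage name
        for s, name in enumerate(stages):
            cells[i + s] = name             # instruction i enters stage s in cycle i + s
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return rows
```

`pipelined_cycles(3)` gives 5 and `sequential_cycles(2)` gives 6, matching the figures, and `print("\n".join(timing_chart(3)))` prints a FIG. 5 style chart with each instruction shifted one cycle to the right of its predecessor.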
It will now be apparent that incorporating, in an information or data processing system, arithmetic units optimized for the operations that system executes provides two advantages: the number of cycles required to execute those operations is reduced, and the performance or throughput of the system can be further enhanced by pipelining those same arithmetic units.
Pipeline processing of the above-mentioned type has been adopted in arithmetic units designed to execute vector instructions in special-purpose computer systems. Examples of pipelined arithmetic units for executing vector instructions are disclosed in Charles M. Stephenson's article "CONTROL OF A VARIABLE CONFIGURATION PIPELINED ARITHMETIC UNIT", Proceedings ELEVENTH ANNUAL ALLERTON CONFERENCE ON CIRCUIT AND SYSTEM THEORY (Oct. 3-5, 1973), pp. 558-567, and in W. J. Watson and H. M. Carr's article "Operational experiences with the TI Advanced Scientific Computer", AFIPS CONFERENCE PROCEEDINGS, Vol. 43 (May 6-10, 1974), pp. 389-397. Since a vector instruction is designed to process a large number of vector elements with a single instruction, the arithmetic processing of the vector elements is performed through pipeline processing in executing that single vector instruction.
In this connection, it is noted that in general-purpose data processing systems known hitherto, optimized arithmetic units are frequently used to reduce the number of cycles needed to execute operations. However, pipeline processing using the same arithmetic units has not actually been practised, because the protocol for processing instructions imposed on the information processing system can no longer be observed once pipeline processing is introduced. A concrete example of such a protocol is the rule that instructions held in storage must be read out in the order in which they are stored and executed in that order. The order of the instructions in storage is managed by means of storage addresses: normally, once an instruction has been read out, the instruction that follows it is read out immediately afterwards. To change this order under program control, a branch instruction can be placed in the instruction string, with its operand designating the next instruction to be executed.
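The sequential protocol described above can be modelled as a loop over storage addresses. In this minimal sketch, which is an assumption for illustration only, each instruction is a hypothetical `(opcode, operand)` pair, addresses are list indices rather than byte addresses, and only an opcode named `"BRANCH"` alters the order, taking its operand as the address of the next instruction.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, operand). Only "BRANCH" is interpreted;
# every other opcode is treated as an ordinary instruction that falls through.
Program = List[Tuple[str, int]]


def execution_order(program: Program, start: int = 0, max_steps: int = 20) -> List[int]:
    """Return the storage addresses in the order the instructions are executed."""
    order: List[int] = []
    addr = start
    while 0 <= addr < len(program) and len(order) < max_steps:
        order.append(addr)
        opcode, operand = program[addr]
        if opcode == "BRANCH":
            addr = operand  # the branch operand designates the next instruction
        else:
            addr += 1       # otherwise the next stored instruction follows
    return order


# The branch at address 2 skips the instruction stored at address 3.
demo = [("ADD", 0), ("LOAD", 0), ("BRANCH", 4), ("MUL", 0), ("STORE", 0)]
```

Here `execution_order(demo)` returns `[0, 1, 2, 4]`; `max_steps` only guards against a branch loop running forever in the sketch.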
The difficulty of observing this sequential-control protocol when instructions are executed by a pipelined arithmetic unit arises because the number of cycles needed to execute an instruction differs from one instruction to another. More specifically, throughput can certainly be enhanced if the arithmetic operations are pipelined when floating-point addition/subtraction instructions occur in succession, as seen in FIG. 5. However, not all instructions processed by the F unit complete in three machine cycles. For example, a load instruction that places data from storage into a floating-point register provided in the F unit completes in one cycle, whereas a floating-point multiplication/division instruction requires more execution cycles than an addition/subtraction instruction.
It is thus apparent that, when the arithmetic operations are pipelined, the dependence of the number of F unit execution cycles on the instruction type conflicts with the basic principle of sequential instruction processing.
FIG. 6 illustrates how the rules of sequential instruction processing can be violated when instructions are executed through pipeline processing. Referring to FIG. 6, assume that a first instruction, for floating-point addition/subtraction, is followed by a second instruction for loading a floating-point register. In the arithmetic pipeline, the floating-point register loading instruction is executed in parallel with the FE_B cycle of the first instruction, and its execution is completed in that one machine cycle. Now suppose that an overflow exception interrupt request arises from the execution of the floating-point addition/subtraction instruction. Under the principle of sequential instruction control, the interrupt must take effect before the succeeding floating-point register loading instruction is executed. In the example of FIG. 6, however, the overflow exception is detected only in the FE_C cycle of the first instruction, i.e. the floating-point addition/subtraction instruction, by which time execution of the second instruction, i.e. the floating-point register loading instruction, has already been completed. This violates the basic principle of sequential control. The same holds true in the example of FIG. 6 even when the second instruction is one executed by the G unit in a single cycle.
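The conflict of FIG. 6 can be reproduced with a toy timing model. The `Instr` record and the `sequential_violations` function are hypothetical; the model uses only the latencies stated in the text (three cycles for floating-point addition/subtraction, one for a floating-point register load) and flags any later instruction that had finished before an exception was detected in an earlier one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str
    issue: int                          # cycle in which the instruction starts execution
    latency: int                        # execution cycles (3 for add/subtract, 1 for a load)
    fault_cycle: Optional[int] = None   # cycle in which an exception is detected, if any

    @property
    def done(self) -> int:
        """Last cycle of execution."""
        return self.issue + self.latency - 1


def sequential_violations(program: List[Instr]) -> List[Tuple[str, str]]:
    """Return (earlier, later) pairs where the later instruction had already
    completed before an exception was detected in the earlier one."""
    violations = []
    for i, early in enumerate(program):
        if early.fault_cycle is None:
            continue
        for late in program[i + 1:]:
            if late.done < early.fault_cycle:
                violations.append((early.name, late.name))
    return violations


# FIG. 6: the add/subtract occupies FE_A, FE_B, FE_C in cycles 1 to 3 and its overflow
# is detected in FE_C (cycle 3); the register load runs alongside FE_B in cycle 2.
add_instr = Instr("FP add/subtract", issue=1, latency=3, fault_cycle=3)
load_instr = Instr("FP register load", issue=2, latency=1)
```

`sequential_violations([add_instr, load_instr])` reports the pair `("FP add/subtract", "FP register load")`. Starting the load in cycle 4 instead, after the first instruction has left FE_C, reports nothing, which corresponds to waiting for the first instruction to complete.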