1. Field of the Invention
The present invention relates to a pipeline information processing circuit for processing data based on a pipeline processing method, and particularly to an information processing circuit which can process a plurality of data collectively with an arithmetic unit.
2. Description of the Prior Art
Recently, processors based on the RISC (Reduced Instruction Set Computer) method have come into wide use. One of the reasons is that RISC systems employ a pipeline processing method.
FIG. 1 is a block diagram of a conventional pipeline information processing circuit which is used in a typical RISC system.
As shown in the same drawing, this pipeline information processing circuit is operated with four pipeline stages comprising an instruction fetch stage (hereinafter, called F stage), a decode stage (D stage), a process execution stage (E stage) and a write-back stage to a register file 31 (W stage).
In such a construction, source data read out from the register file 31 in the D stage are latched by input registers 35a, 35b of an arithmetic unit 33 at the end of this stage. Then, the result of an arithmetic operation is outputted to an output register 37 at the end of the E stage, and the operation result is written back to the register file 31 at the end of the W stage.
Moreover, data buses 301, 303 are provided as data bypasses through which data from the arithmetic unit 33 and the output register 37 are transmitted respectively. These bypasses are controlled by a bypass control section 39. FIG. 2 is a block diagram of the bypass control section 39, and FIG. 3 is a timing chart for explaining a process in the bypass control section 39.
As shown in FIG. 3, in an execution flow 1, data corresponding to register numbers F0, F1 in the register file 31 are respectively read by the input registers 35a, 35b in the D stage. Then, the operation result obtained at the E stage is written back to a register number F2 in the register file 31 at the W stage.
Meanwhile, at the D stage in another execution flow 2, the data of register number F2 is read into the input register 35a and the data of register number F3 in the register file 31 is read into the input register 35b. Then, the operation result obtained at the E stage is written back to a register number F4 in the register file 31.
In this case, when the data of register number F2 is read in the execution flow 2, the data for register number F2, which is in the E stage of the execution flow 1, is transmitted to the input register 35a through the data bus 301 as a bypass. At this time, registers 41E, 41W respectively hold the register numbers (hereinafter called target register numbers) of the register file 31 to which the arithmetic operation results of the execution flow 1 are written back. Moreover, a target register number 305 of the E stage, which is held in the register 41E, is compared by a comparator 43 with a register number 307 of the data to be latched in the D stage in the execution flow 2.
Since both the target register number 305 and the register number 307 correspond to F2, a coincidence signal 309 is outputted to a priority judgement unit 45. As a result, the data bypass 301 from the E stage to the input register 35a is selected by a selector 47 in accordance with a result from the priority judgement unit 45.
Accordingly, when data whose register numbers coincide with each other exist on the pipeline, it is possible to start the process of the execution flow 2 before the arithmetic operation result of the execution flow 1 is written back to the register file 31.
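The bypass selection described above can be illustrated with a minimal Python sketch. The function name `select_source` and the returned labels are hypothetical and are used only for illustration. The rule that a match in the E stage takes priority over a match in the W stage is an assumption, since the description above does not state the rule applied by the priority judgement unit 45.

```python
from typing import Optional


def select_source(src_reg: str,
                  target_e: Optional[str],
                  target_w: Optional[str]) -> str:
    """Choose the path that feeds an input register in the D stage.

    src_reg  : register number requested in the D stage (signal 307)
    target_e : target register number held in register 41E, or None
    target_w : target register number held in register 41W, or None
    """
    # Comparators: does a result still in flight target the requested register?
    match_e = target_e is not None and target_e == src_reg
    match_w = target_w is not None and target_w == src_reg

    # Priority judgement (assumed rule): the newest result, in the E stage, wins.
    if match_e:
        return "bypass_301"     # data taken from the arithmetic unit 33
    if match_w:
        return "bypass_303"     # data taken from the output register 37
    return "register_file"      # no hazard, read the register file 31


# Execution flow 2 requests F2 while execution flow 1 (target F2) is in the E stage.
print(select_source("F2", target_e="F2", target_w=None))
# Execution flow 2 requests F3, which no in-flight result targets.
print(select_source("F3", target_e="F2", target_w=None))
```

In this sketch one comparison is made per bypass and per input register, which is why the number of comparators grows with the number of bypasses.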
However, since the arithmetic operation unit 33, input registers 35a, 35b and output register 37 are arranged in such a construction as shown in FIG. 4, each of the input registers 35a, 35b can process only one data in each process operation.
Namely, the arithmetic operation unit 33 which is used in such a conventional pipeline information processing circuit can process only one data in each process operation.
In other words, the conventional pipeline information processing circuit having such a data bypass construction as mentioned above can transfer only one data in each data transfer process. Therefore, it has not been possible so far to process a plurality of data collectively at a time by an arithmetic operation unit. Accordingly, in such a case, a plurality of register accesses are required when a plurality of data are supplied to an arithmetic operation unit from a register file or when the arithmetic operation results are stored in the register file. However, such a construction inevitably degrades the pipeline process efficiency.
To solve this problem, there is a generally known method for high-speed transfer of a plurality of data, in which a register for holding the target register number has a multi-bus port construction so that a plurality of data can be transferred in parallel through a plurality of data buses. However, in such a method, the register circuit becomes complex. Moreover, it is necessary to increase the number of comparators in proportion to the number of bypasses, which corresponds to the number of data. Accordingly, the size of the bypass control section inevitably becomes large.
Moreover, when the arithmetic operation unit is an arithmetic operation unit of floating point mode, there is a problem as mentioned below in the conventional technology.
Namely, according to such an arithmetic operation unit, a numerical value of the floating point mode is expressed as a signed absolute value which comprises an index part, a mantissa part and a mark of the mantissa part. For example, the double-precision numerical value D (64 bits) and the single-precision numerical value S (32 bits) defined in the standard of IEEE 754 are respectively expressed as follows:
$D = (-1)^{s} \times 1.f \times 2^{e-1023}$
s: 0 or 1 (1 bit)
f: 000 . . . 00 to 111 . . . 11 (52 bits)
e: 0 to 2047 (11 bits)
$S = (-1)^{s} \times 1.f \times 2^{e-127}$
s: 0 or 1 (1 bit)
f: 000 . . . 00 to 111 . . . 11 (23 bits)
e: 0 to 255 (8 bits)
In the above expression, s designates a mark of the mantissa part, and is 0 when D (or S) is positive or is 1 when negative. Moreover, f shows a part below the decimal-point corresponding to the mantissa part, which is normalized so that the integer part corresponding to a hidden bit becomes 1, and e designates the index part, which is expressed in the offset mode where 1023 as a bias value in case of the double precision or 127 in case of the single precision is added to the original index value. FIGS. 5a and 5b show the formats respectively.
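As an illustration of the formats described above, the following Python sketch splits a value into the fields s, e and f and rebuilds the value from them. The function names are hypothetical. The sketch assumes normalized values only; zero, denormalized numbers, infinities and NaN are not handled.

```python
import struct


def decode_double(x: float) -> tuple[int, int, int]:
    """Split a double-precision value into (s, e, f): 1, 11 and 52 bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def decode_single(x: float) -> tuple[int, int, int]:
    """Split a single-precision value into (s, e, f): 1, 8 and 23 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)


def value_from_fields(s: int, e: int, f: int, frac_bits: int, bias: int) -> float:
    """Evaluate (-1)^s x 1.f x 2^(e - bias) for a normalized value."""
    return (-1) ** s * (1 + f / (1 << frac_bits)) * 2.0 ** (e - bias)


s, e, f = decode_single(-2.5)                 # -2.5 = -1.25 x 2^1
print(s, e, f)                                # s = 1, e = 1 + 127
print(value_from_fields(s, e, f, 23, 127))    # rebuilds -2.5

s, e, f = decode_double(-2.5)
print(s, e, f)                                # s = 1, e = 1 + 1023
print(value_from_fields(s, e, f, 52, 1023))   # rebuilds -2.5
```

The two decoders differ only in field widths and bias: the mark is one bit in both formats, the index part grows from 8 to 11 bits, and the mantissa part grows from 23 to 52 bits.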
Incidentally, because of its extremely complex process operation, the floating point arithmetic operation requires far more time for execution than a simple integer arithmetic operation. Moreover, it is very difficult to keep the cost of the hardware required for realizing the desired execution speed and precision within a commercially practical level. Namely, in order to increase the operation speed and enhance the precision, the size of the arithmetic operation unit must be considerably large. In particular, in order to improve the precision of the mantissa part, it is necessary to enlarge the bit width, so that the mantissa part operation section occupies the greater part of the arithmetic operation unit, and the production cost is largely increased because of the complexity of the calculation concerning this part. On the other hand, with respect to an index part operation section and a mark part operation section, the number of bits corresponding to these parts does not increase much even if the precision is increased. Moreover, since the operation of both of these sections is simple, the area occupied thereby in the arithmetic operation unit is not so large as to be a problem.
FIG. 6 shows an example of conventional arithmetic operation units to be operated in the double-precision or single-precision mode. As shown in the same drawing, this arithmetic operation unit comprises mark part operation means 500, index part operation means 501 and mantissa part operation means 502, each of which is provided with an arithmetic operation unit having a bit width sufficient for directly carrying out the double-precision process on numeral data. Accordingly, in the highest speed mode, since a set of data can be inputted per clock irrespective of whether the data are double-precision numeral data or single-precision numeral data, one arithmetic operation result can be obtained per clock.
In the arithmetic operation unit, the index part operation means 501 comprises, for example, an index part comparator 503, an index part selector 504, an adder-subtracter 505 and an incrementer 506, so as to carry out an arithmetic operation concerning the index part in accordance with input of a signal E of a bit width e shown in FIGS. 5a and 5b. Meanwhile, the mantissa part operation means 502 comprises, for example, a mantissa part exchanger 507, a digit adjusting shifter 508, an inversion circuit 509, an adder 510, a complement circuit 511, a normalizing shifter 512, a priority encoder 513, a discarding or raising circuit 514 and a renormalizing circuit 515.
Incidentally, the term "discarding" means replacing the smallest digit of a numerical value with 0, or repeating this process up to a suitable digit. Meanwhile, the term "raising" means that the smallest digit of a numerical value is raised, that is, 1 is added to the second smallest digit of the value, or repeating this process.
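The two terms can be illustrated on a binary mantissa with a short Python sketch. The function names and the parameter `drop_bits` are hypothetical. The sketch assumes that "raising" always adds 1 to the lowest retained digit, as the wording above suggests; how the circuit 514 chooses between the two operations is not described here.

```python
def discard(mantissa: int, drop_bits: int) -> int:
    """Discarding: force the lowest drop_bits digits to 0 (round toward zero)."""
    return (mantissa >> drop_bits) << drop_bits


def raise_up(mantissa: int, drop_bits: int) -> int:
    """Raising: clear the lowest drop_bits digits and add 1 to the next digit."""
    return ((mantissa >> drop_bits) + 1) << drop_bits


m = 0b10111
print(bin(discard(m, 2)))    # lowest two digits cleared
print(bin(raise_up(m, 2)))   # lowest two digits cleared, next digit incremented
```

Raising can carry out of the top digit (for example when every retained digit is 1), which is one reason a renormalizing step follows the discarding or raising circuit.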
Moreover, to the mantissa part exchanger 507, a signal F of a bit width f shown in FIGS. 5a, 5b is inputted so as to carry out an arithmetic operation concerning the mantissa part.
As stated above, in the conventional floating point arithmetic operation unit of the double-precision and single-precision modes, each of the index part operation means 501 and the mantissa part operation means 502 is provided with an arithmetic operation unit of a bit width which is sufficient for independently and directly carrying out a process operation on double-precision data. Therefore, when a double-precision data comprising signals S, E, F is inputted as shown in FIG. 5a, the respective circuits calculate the floating point numerical value based on predetermined operation modes. On the other hand, when a single-precision data comprising the signals S, E, F is inputted, only parts of the respective operation means are used for the arithmetic operation concerning the floating point numerical value. Accordingly, in the case of the data operation process in the single-precision mode, for example, only about half of the bit width of the considerably expensive mantissa part operation means 502 is used, so that the hardware processing capacity is not utilized efficiently.
Incidentally, since the respective circuit parts in the conventional unit are similar to those used in embodiments related to the present invention on which detailed description will be given below, the explanation is omitted here.
Namely, in the conventional floating point arithmetic operation unit for processing a plurality of numeral data either in the double-precision mode or in the single-precision mode, the processing capacity provided in the hardware system cannot be utilized efficiently in the operation on single-precision numeral data.