1. Field of the Invention
The present invention relates to a multiply-add operating device for a floating point number. More particularly, the present invention relates to a technique for improving operation efficiency by shortening the critical path of the accumulative adding process in continuous multiply-add operations, in a multiply-add operating device which executes a multiply-add operation of floating point numbers by a multiplication process and an addition process for accumulating multiplication results.
2. Description of the Background Art
In recent years, with the rapid spread of multimedia and of video games using advanced GUI (Graphical User Interface) graphics, computer graphics (CG) techniques have become more important. In particular, with the rapid spread of personal computers and video game machines to homes, demand has been increasing for three-dimensional computer graphics (3D-CG) applications running on high-performance processors, especially high-quality moving-picture applications. These CG processes require an enormous amount of calculation and high calculation performance. Here, a geometrical process in CG is a phase in which a transformation process or an illumination process, such as coordinate transformation or viewpoint transformation, is applied to a geometrical graphic model. In these geometrical processes, an inner-product operation is frequently used because matrix operations and vector operations are performed. The inner-product operation is frequently used not only in the 3D-CG process described above, but also in numerical calculations in conventional scientific and technical computing. Realization of a multiply-add operating device which performs inner-product operations at a high speed is therefore desired.
The configuration of a conventional floating point multiply-add operating device for performing an inner-product operation will be concretely described below. The configurations of the floating point multiply-add operating device are roughly classified into the following two types.
FIG. 1 shows the first configuration of a floating point multiply-add operating device. In the first configuration, a multiplier and an adder are provided, and a multiply-add operation is realized either by longitudinally connecting the multiplier and the adder to each other or by bypassing an operation result between them as an operand.
FIG. 2 shows the second configuration of a floating point multiply-add operating device. In the second configuration, an operating device is not divided into a multiplier and an adder, and a dedicated multiply-add operating device is directly constituted. In a graphic-dedicated machine in which a multiply-add operation occupies a large part of a whole process, the configuration of the second type is often employed. However, in a general MPU (microprocessing unit), the cost of the configuration in which the dedicated operating device is arranged is large. For this reason, the first configuration which is simple and has a good affinity with floating point operating devices in many MPUs is frequently employed.
The details of the processes of the first configuration will be described below. In the following description, it is assumed that all floating point operating devices conform to the IEEE 754 floating point standard, which is the operation standard for floating point numbers.
As shown in FIG. 1, a floating point multiply-add operating device of the first type is constituted by a multiplication unit 100 and an addition unit 200. The mantissa operating section of the multiplication unit 100 comprises a multiplication tree 101, a Booth decoder 102, a final adder 103, a normalizing circuit 104, and registers 105 to 109. The exponent operating section of the multiplication unit 100 comprises two adders 110 and 111 and registers 112 and 115.
First, the configuration and operation of the multiplication unit 100 will be described below. In the multiplication unit 100, multiplication of the mantissas of the operands is executed first. Mantissas Fa and Fb of two floating point number operands are multiplied by the Booth decoder 102 and the multiplication tree 101, and the final adder 103 calculates the final product of the mantissas. On the other hand, with respect to exponents, exponents Ea and Eb of the two floating point number operands are added to each other by the adder 110.
Here, the exponents of the operands are in biased representation. For this reason, to be exact, the bias value is subtracted from the sum of exponents calculated as described above, so that the resulting sum of exponents is again in the biased representation. In the following description, it is assumed that exponents are in the biased representation.
If a carry occurs in the mantissa as a result of the multiplication, normalization is performed. More specifically, the mantissa outputted from the final adder 103 is shifted to the right by one bit by the normalizing circuit 104. At the same time, the exponent outputted from the adder 110 is incremented by the adder 111.
FIG. 3 shows a procedure performed when the processes of the multiplication unit 100 are subjected to a pipeline process on two stages (stage X1 and stage X2). Multiplication is executed on the stage X1, and final addition and normalization are executed on the stage X2. More specifically, on the stage X1, the sum of the exponents of the operands is calculated in biased representation, and a product of the mantissas is calculated. In FIG. 3, Ei represents the bias value. On the stage X2, the exponent obtained on the stage X1 is incremented, and the final product of the mantissas outputted from the final adder 103 is shifted to the right by one bit so as to be normalized. The two stages shown in FIG. 3 indicate an example of a recent typical pipeline process. In FIG. 3, shift (X, n, r) represents that X is shifted to the right by n bits.
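The two stages above can be illustrated by the following minimal sketch. It is not taken from FIG. 3: the function names are hypothetical, the single-precision field widths (bias 127, 23 fraction bits) are an assumption, and signs, rounding (truncation is used instead), and special values are ignored.

```python
BIAS = 127       # assumed: IEEE 754 single-precision bias (the role of Ei in FIG. 3)
FRAC_BITS = 23   # assumed: 23 fraction bits, so mantissas are 24-bit integers with the hidden bit

def stage_x1(ea, fa, eb, fb):
    """Stage X1: add the biased exponents, remove one bias, multiply the mantissas."""
    em = ea + eb - BIAS
    product = fa * fb                 # up to 48 bits wide
    return em, product

def stage_x2(em, product):
    """Stage X2: reduce the product to mantissa width and normalize on carry."""
    fm = product >> FRAC_BITS         # truncation instead of rounding (simplification)
    if fm >> (FRAC_BITS + 1):         # carry: the mantissa product is 2 or more
        fm >>= 1                      # shift(Fm, 1, r)
        em += 1                       # exponent increment
    return em, fm

def to_float(e, f):
    """Interpret a (biased exponent, integer mantissa) pair as a Python float."""
    return f * 2.0 ** (e - BIAS - FRAC_BITS)
```

For example, multiplying 1.5 by 1.5 (exponent 127, mantissa 0xC00000 for both operands) produces a carry, so stage X2 shifts the mantissa right by one bit and increments the exponent, giving a pair that decodes to 2.25.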
The configuration and operation of the addition unit 200 will be described below. The mantissa operating section of the addition unit 200 comprises a shifter 201 for performing alignment of mantissas Fm and Fn of the operands, an adder 202 for calculating a sum of mantissas, a normalizing circuit 203, a preceding 0 detection circuit 204, and registers 205 to 209. The exponent operating section of the addition unit 200 comprises two adders 210 and 211, a selector 212, and registers 213 to 215. In the configuration of the addition unit 200, an addition process is performed according to the steps of: (1) calculation of an alignment shift count; (2) swap; (3) alignment; (4) addition or subtraction; (5) calculation of a normalization number; and (6) normalization.
(1) In the step of calculating an alignment shift count, an alignment shift count representing the number of bits to be shifted to perform alignment of the mantissas Fm and Fn of two operands is calculated. This alignment shift count is calculated as an absolute value of the difference between exponents Em and Ec of the two operands. The calculation of the alignment shift count is executed by the adder 210.
(2) In the swap step, the larger one of the exponents Em and Ec of the two operands is selected as an intermediate value Ed of the addition operation by the selector 212 according to the carry of the addition result of the adder 210. Next, the mantissas Fm and Fn of the two operands are swapped as needed. Here, of the mantissas Fm and Fn of the two operands, the mantissa of the operand having the smaller one of the exponents Em and Ec must be input to the shifter 201 to perform alignment.
(3) In the alignment step, on the basis of a result of the calculation of an alignment shift count step (1), the mantissa of an operand having a small exponent is shifted to the right by a necessary number at the shifter 201, thereby performing alignment.
(4) In the addition or subtraction step, the mantissas of the two operands are added to or subtracted from each other by the adder 202.
(5) In the step of calculating a normalization number, the number of digits which are canceled as a result of the addition or the subtraction in step (4) is detected as a normalization number by counting the number of preceding zeros in the preceding 0 detection circuit 204.
(6) In the normalizing step, normalization is performed by the normalization number calculated in step (5). More specifically, in the exponent operating section, a normalization number N is subtracted from an intermediate value Ed of the exponent of the operand by the adder 211. In the mantissa operating section, the mantissa of an operand outputted from the adder 202 is shifted to the left by the normalization number by the normalizing circuit 203.
FIG. 4 shows a procedure performed when the processes of the addition unit 200 described above are subjected to a pipeline process on two stages (stage A1 and stage A2). The processes (1) to (3) are executed on the stage A1, and the processes (4) to (6) are executed on the stage A2. More specifically, on the stage A1, the absolute value of the difference between the exponents Em and Ec of the two operands is calculated as an alignment shift count S, and the larger one of the exponents Em and Ec of the two operands is selected as the intermediate value Ed of the exponents. Note that in FIG. 4, max(X, Y) represents that the larger one of X and Y is selected. Also on the stage A1, the mantissas of the two operands are swapped as needed, and the mantissa of the operand having the smaller exponent is shifted by a number corresponding to the alignment shift count S. Note that shift (X, n, l) represents that X is shifted to the left by n bits. On the stage A2, mantissas Fss and Fst of the two aligned operands are added or subtracted, and the preceding zeros of the addition result Fs are counted to be set as the normalization number N. Note that PriorityEncode(X) represents that zeros on the MSB side of X are counted. Also on the stage A2, N is subtracted from the intermediate value Ed of the exponents, and the addition result Fs of the mantissas is shifted to the left by N bits. The two stages shown in FIG. 4 indicate an example of a recent typical pipeline process.
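Steps (1) to (6) and their split over the stages A1 and A2 can be sketched as follows. This is an illustrative model only, not the circuit of FIG. 1: the function names are hypothetical, single-precision field widths are assumed, signs and rounding are ignored, and a carry-out of the mantissa adder is treated as a normalization number of -1 (a one-bit right shift), a case the description above does not spell out.

```python
BIAS = 127              # assumed single-precision bias
FRAC_BITS = 23          # assumed number of fraction bits
WIDTH = FRAC_BITS + 1   # 24-bit mantissa including the hidden bit

def stage_a1(em, fm, ec, fc):
    """Stage A1: steps (1) shift count, (2) swap, (3) alignment."""
    s = abs(em - ec)                                  # (1) alignment shift count S
    ed = max(em, ec)                                  # (2) intermediate exponent Ed
    big, small = (fm, fc) if em >= ec else (fc, fm)   # (2) swap as needed
    return ed, big, small >> s                        # (3) shifted-out bits are dropped

def stage_a2(ed, fss, fst, subtract=False):
    """Stage A2: steps (4) add/subtract, (5) normalization number, (6) normalization."""
    fs = abs(fss - fst) if subtract else fss + fst    # (4) magnitude only, sign ignored
    if fs == 0:
        return 0, 0                                   # exact cancellation (zero encoding simplified)
    n = WIDTH - fs.bit_length()                       # (5) leading zeros; -1 on carry-out (assumption)
    fs = fs << n if n >= 0 else fs >> -n              # (6) mantissa normalization
    return ed - n, fs                                 # (6) exponent normalization

def to_float(e, f):
    """Interpret a (biased exponent, integer mantissa) pair as a Python float."""
    return f * 2.0 ** (e - BIAS - FRAC_BITS)
```

As a usage example, subtracting 1.25 from 1.5 cancels two leading digits: `stage_a2(*stage_a1(127, 0xC00000, 127, 0xA00000), subtract=True)` yields a normalization number of 2 internally and returns a pair that decodes to 0.25.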
In order to perform a multiply-add operation by the first configuration shown in FIG. 1, the multiplication unit 100 and the addition unit 200 have to be longitudinally connected to each other. When a multiply-add operation A×B+C (where C is the result of the preceding multiply-add operation) is executed, two operands A and B are input to the multiplication unit 100 to be multiplied. The multiplication results are inputted as one of the operands to the addition unit 200 by using paths indicated by *1 and *2 shown in FIG. 1, respectively. At this time, the operation result C of the preceding multiply-add operation is inputted as the other of the operands to the addition unit 200 by using paths indicated by *3 and *4 shown in FIG. 1. Accumulative addition of the exponent of the multiplication result of the two operands A and B and the exponent of the operation result C of the preceding multiply-add operation, and accumulative addition of the mantissa of the multiplication result of the two operands A and B and the mantissa of the operation result C of the preceding multiply-add operation, are performed, respectively.
FIG. 5 shows an example of the flow of the pipeline processes in FIG. 3 and FIG. 4 in time series. As shown in FIG. 5, in the multiply-add operating device having the first configuration shown in FIG. 1, multiply-add operations such as the operation A×B+C, which have such dependence that a preceding operation result is used in a subsequent operation, cannot be continuously executed. More specifically, a gap is formed in the pipeline processes between the preceding multiply-add operation (n−1) and the subsequent multiply-add operation n, so that the operation cannot be executed at a throughput of “1”. This is because an operation of floating point numbers generally requires a plurality of cycles, for example two cycles in the conventional art described above.
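The gap can be illustrated with a toy scheduling model. The following sketch is not a reproduction of FIG. 5; it assumes one cycle per stage (X1, X2, A1, A2), at most one operation issued per cycle, and that a dependent operation may enter A1 only after the preceding operation has left A2. The function name and parameters are hypothetical.

```python
MUL_STAGES = 2   # X1, X2 (assumed one cycle each)
ADD_STAGES = 2   # A1, A2 (assumed one cycle each)

def schedule(num_ops, dependent):
    """Return, for each multiply-add, the cycle in which it enters stage X1."""
    starts = []
    prev_done = None                  # cycle in which the previous operation occupies A2
    for k in range(num_ops):
        start = 0 if k == 0 else starts[-1] + 1       # at most one issue per cycle
        if dependent and prev_done is not None:
            a1 = start + MUL_STAGES                   # cycle in which this operation would enter A1
            if a1 <= prev_done:                       # accumulated operand not ready yet
                start += prev_done + 1 - a1           # stall (NOP slots)
        starts.append(start)
        prev_done = start + MUL_STAGES + ADD_STAGES - 1
    return starts
```

Under these assumptions, independent operations enter the pipeline every cycle, whereas dependent operations enter only every second cycle, which corresponds to the throughput loss described above.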
However, an inner-product operation frequently used in a graphic process, a numerical operation, or the like is generally realized by continuous multiply-add operations having dependence. In this case, when a NOP (No Operation), i.e., a state wherein no operation is performed, occurs for every multiply-add operation instruction as shown in FIG. 5, operation efficiency is considerably degraded. This NOP can be reduced to some extent by scheduling instructions. However, the scheduling of instructions can be applied only to a case wherein there happen to be instructions which have no dependency and which can fill the NOP slots. The scheduling is not effective for all continuous multiply-add operations.
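The dependence in question can be seen in a plain software rendering of an inner product. The sketch below is only an illustration using ordinary Python floats, and the function name is hypothetical; each iteration is one multiply-add whose addend is the result of the previous one.

```python
def inner_product(a, b):
    """Accumulate a[i] * b[i]; every step depends on the previous accumulator value."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = x * y + acc   # multiply-add A*B+C with C being the preceding result
    return acc
```

Because `acc` is both an input and the output of every iteration, no iteration can start its accumulation before the previous one has finished, unless independent work is available to fill the gap.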
When the throughput of a multiply-add operation is set to be “1” to solve the problem of such an idle state, the processes (1) to (6) must be executed in one cycle. In this case, the critical path consists of all the processes (1) to (6), and, therefore, the time required for one cycle is equal to the time taken to perform all the processes (1) to (6). This is a very long time, and is impractical.
Next, the details of the processes of the second configuration will be described below.
The mantissa operating section of the multiply-add operating device 300 comprises a multiplication tree 301 and a Booth decoder 302 for calculating a product of mantissas Fa and Fb of first and second operands, a bidirectional shifter 303 for performing alignment of a third operand, adders 304 and 305 for calculating a sum (sum of products) of the multiplication result obtained by the multiplication tree 301 and the alignment result, a normalizing circuit 306 for normalizing a multiply-add operation result (i.e., product-sum) output from the adder 305, a preceding 0 detection circuit 307 for calculating a normalization number, and registers 308 to 313. The exponent operating section of the multiply-add operating device 300 comprises an adder 314 for calculating the exponent of the product (i.e., the sum of exponents Ea and Eb) of the first and second operands, an adder 315 for calculating an alignment shift count S (i.e., the difference between the exponent of the third operand and the exponent of the product of the first and second operands), a selector 316 for selecting one of the exponents of the third operand and the product of the first and second operands on the basis of the carry of the operation result of the adder 315, an adder 317 for performing normalization, and registers 319 to 322. The selector 316 selects the larger one of the exponent of the third operand and the exponent of the product of the first and second operands.
The multiply-add operating device 300 having the second configuration calculates sums of products at once. Multiply-add operations are executed in the following manner in the second configuration.
The mantissas Fa and Fb of the first and second operands are inputted to the multiplication tree 301 and the Booth decoder 302 to be multiplied. In parallel with the multiplication process, the third operand is aligned by the bidirectional shifter 303. The direction of the shift and the number of bits by which the third operand is shifted are calculated from the difference between the sum of the exponents of the first and second operands and the exponent of the third operand. This alignment shift count is calculated by the adders 314 and 315. Next, the larger one of the sum of the exponents of the first and second operands and the exponent of the third operand is set as the intermediate value Ed of the exponents. A sum (sum of products) of the product of the first and second operands and the aligned third operand is calculated by the adders 304 and 305. The multiply-add operation result (product-sum) outputted from the adder 305 is normalized by the normalizing circuit 306. The normalization number N, representing the number of bits of shifting in the normalization, is calculated by the preceding 0 detection circuit 307. In the exponent operating section, normalization is performed such that the normalization number N calculated by the preceding 0 detection circuit 307 is added to or subtracted from the intermediate value Ed of the exponents by the adder 317. The series of processes described above is executed by, for example, pipeline processes on two stages. The multiply-add operating device in FIG. 2 has the most typical configuration among multiply-add operating devices having the second configuration. Therefore, in the second configuration shown in FIG. 2, as in the first configuration, the operation result of certain operands cannot be used as an operand in the next operation until two clock intervals have elapsed.
More specifically, multiply-add operations having dependence can be executed only every two clocks, so that the same drawback as in the multiply-add operating device having the first configuration in FIG. 1 is caused.
In this manner, a conventional floating point multiply-add operating device, unlike an integer multiply-add operating device, has a long latency and, in particular, cannot execute multiply-add operations having dependence every clock, so that the operating time is disadvantageously long.
The present invention has been made to solve the above problem. It is an object of the present invention to provide a multiply-add operating device for a floating point number in which, in an accumulative addition process or an accumulative subtraction process in repetitive execution of continuous multiply-add operations, calculation of an alignment shift count required in an alignment process of an accumulation process of a subsequent multiply-add operation is started before a normalization process is finished, and the alignment shift count is calculated simultaneously with the normalization process, so that a critical path of the multiply-add operations is shortened to improve operation efficiency.
According to a certain characteristic feature of the present invention, as shown in FIG. 6, there is provided a device for performing a multiply-add operation in which multiplication of floating point numbers consisting of mantissas and exponents and accumulative addition or accumulative subtraction of results of the multiplication are performed, comprising:
an exponent operating section for comparing an exponent of a floating point number of an operation result of a preceding multiply-add operation n with an exponent of a multiplication result of a subsequent multiply-add operation (n+1), and calculating an alignment shift count of the multiply-add operation (n+1) from the comparison result, the alignment shift count being obtained by starting calculation of the alignment shift count of the subsequent multiply-add operation (n+1) before normalization of the preceding multiply-add operation n is finished; and
a mantissa operating section for aligning one mantissa of mantissas of two operands according to the alignment shift count inputted from the exponent operating section, calculating a sum of an aligned mantissa of the operand and the mantissa of the other operand, and normalizing a calculated addition result of the calculated mantissas as needed, thereby calculating a mantissa of the multiply-add operation (n+1).
This exponent operating section performs calculation in two phases. More specifically, the exponent operating section includes:
a first alignment controller for calculating a difference between an intermediate value of an exponent of an operation result of the preceding multiply-add operation n and an exponent Em(n+1) of a multiplication result of the subsequent multiply-add operation (n+1); and
a second alignment controller for subtracting a normalization number N of the multiply-add operation (n+1) from a calculation result in the first alignment controller.
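The reason the two-phase calculation can overlap with normalization is plain integer arithmetic: (Ed − N) − Em equals (Ed − Em) − N, and the first subtraction does not need N. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the conventional path is modeled as first forming the normalized exponent Ed − N and then comparing it with Em(n+1), and the result is kept as a signed difference whose sign would select which mantissa is shifted.

```python
def conventional_shift(ed_prev, n, em_next):
    """Wait for normalization, then compare exponents (assumed conventional ordering)."""
    ec = ed_prev - n             # exponent of the accumulated result after normalization
    return ec - em_next          # signed exponent difference used for alignment

def first_alignment_controller(ed_prev, em_next):
    """Phase 1: needs no normalization number, so it can run while N is still being counted."""
    return ed_prev - em_next

def second_alignment_controller(diff, n):
    """Phase 2: correct the phase-1 difference by the normalization number N."""
    return diff - n

def two_phase_shift(ed_prev, n, em_next):
    """Combine both phases; equal to conventional_shift for all integer inputs."""
    return second_alignment_controller(first_alignment_controller(ed_prev, em_next), n)
```

In this model only the final subtraction of N remains after the normalization number becomes available, instead of a full exponent comparison.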
In one configuration of the floating point multiply-add operating device, the device for performing a multiply-add operation in which multiplication of floating point numbers consisting of mantissas and exponents and accumulative addition or accumulative subtraction of results of the multiplication are performed comprises:
a multiplication unit for performing multiplication of two operands consisting of mantissas and exponents; and
an addition unit, constituted by an exponent operating section and a mantissa operating section, for performing an accumulating process of multiplication results of the multiplication unit;
wherein:
the exponent operating section includes:
a first alignment controller for calculating a difference between an intermediate value of an exponent of an operation result of the preceding multiply-add operation n and an exponent Em(n+1) of a multiplication result of the subsequent multiply-add operation (n+1); and
a second alignment controller for subtracting a normalization number N of the multiply-add operation (n+1) from a calculation result in the first alignment controller so as to calculate an alignment shift count.
The exponent operating section may further include:
a selector for selecting a large one of the intermediate value and the Em(n+1) as an intermediate value Ed of the accumulating process of the multiply-add operations on the basis of control of the second alignment controller; and
an exponent normalizing section for subtracting the normalization number N from the intermediate value Ed selected by the selector so as to calculate the exponents of the operands of the multiply-add operations.
The mantissa operating section may include:
a shifter for performing alignment of one of a mantissa of the operand of the operation result of the preceding multiply-add operation n and a mantissa of an output from the multiplication unit of the subsequent multiply-add operation (n+1) according to the alignment shift count output from the second alignment controller;
an accumulating processing section for performing an accumulating process of a mantissa of an operand aligned by the shifter and a mantissa of the other operand;
a normalization number calculating section for detecting a number of canceled digits of a process result outputted from the accumulating processing section so as to calculate the normalization number N; and
a normalizing unit for normalizing the process result outputted from the accumulating processing section according to the normalization number N outputted from the normalization number calculating section.
Other features and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings.