1. Field of the Invention
The present invention relates to the technology for use in an arithmetic unit for performing an arithmetic operation on digital data, and more specifically to the technology for use in an arithmetic unit for performing a product-sum operation.
2. Description of the Related Art
First, the representation form of a floating point value in the standard (IEEE-754) for a binary floating point arithmetic of the Institute of Electrical and Electronics Engineers, Inc. (IEEE) is described below by referring to FIGS. 1A and 1B.
As shown in FIG. 1A, a floating point value is formed by three fields of a sign bit S, an exponent part E, and a fixed-point part F.
The sign bit S is always 1-bit data indicating the plus/minus sign. ‘0’ indicates a positive number, and ‘1’ indicates a negative number.
The fixed-point part F indicates a value (normalized number) equal to or larger than 1.0 and smaller than 2.0. Each bit is given a weight that is a negative power of 2, and the value is represented by these weights. For example, if the first bit of the fixed-point part F is ‘1’, the value of $2^{-1}$, that is, 0.5, is represented. If the second bit is ‘1’, then the value of $2^{-2}$, that is, 0.25, is represented. The value obtained by adding 1.0 to the sum of the values represented by these bits is defined as the value of the fixed-point part. The added value of 1.0 corresponds to the value of $2^{0}$ when ‘1’ is assumed as the 0-th bit in the fixed-point part. Since this bit is always ‘1’ for a normalized number, it is not actually stored in the field of the fixed-point part F. However, it is processed as if it were stored in the field. This bit is also called an ‘implicit 1’.
The exponent part E indicates an integer of the power of 2. In the exponent part E, a biased representation is used to realize the representation of a negative value. The biased value is predetermined based on the precision of a represented floating point value.
Assuming that the biased value given by the exponent part E is B, the floating point value of X represented by S, E, F, and B is obtained by the following equation.

$$X = (-1)^{S} \times 2^{E-B} \times (1.0 + F)$$
FIG. 1B is a table indicating, for each precision of the represented floating point value, the number of bits assigned to each field shown in FIG. 1A and the value of the bias B.
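The representation form described above can be illustrated with a minimal sketch. It assumes the single precision parameters of IEEE-754 (an 8-bit exponent part, a 23-bit fixed-point part, and a bias of 127) and handles normalized numbers only. The function names `decode_normalized` and `float_to_bits` are hypothetical and are used here only for illustration.

```python
import struct

# Assumed single precision parameters: 1 sign bit, 8 exponent bits,
# 23 fixed-point (fraction) bits, bias 127.
EXP_BITS, FRAC_BITS, BIAS = 8, 23, 127

def float_to_bits(x: float) -> int:
    """Return the 32-bit single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_normalized(bits: int) -> float:
    """Evaluate X = (-1)^S * 2^(E-B) * (1.0 + F) for a normalized number."""
    s = bits >> (EXP_BITS + FRAC_BITS)                 # sign bit S
    e = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)    # biased exponent E
    frac = bits & ((1 << FRAC_BITS) - 1)               # stored bits of F
    # The i-th stored bit (counted from the most significant) weighs 2^-i,
    # so the field as a whole equals frac / 2^FRAC_BITS.
    f = frac / (1 << FRAC_BITS)
    return (-1) ** s * 2.0 ** (e - BIAS) * (1.0 + f)   # 1.0 is the implicit 1
```

For example, `decode_normalized(float_to_bits(1.5))` evaluates the equation with S = 0, E = 127, and only the first bit of F set, which gives 1.5.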
Consider the case in which the product-sum operation A×B+C on three floating point values A, B, and C, whose exponent parts are assigned N bits and whose fixed-point parts are assigned M bits in accordance with the above mentioned IEEE Standard, is performed with correct intermediate results computed.
FIG. 2 shows an example of a configuration of the conventional product-sum operation unit capable of performing the above mentioned arithmetic operation.
In FIG. 2, an adder 1001 and a fixed-point part multiplier 1002 perform the multiplication of A and B, and other circuits perform the addition of A×B and C. In this example, the signs of A, B, and C are not processed.
The adder 1001 adds the value of the exponent part (exponent value) of A to the exponent value of B. The bit width of N bits corresponding to the bit width assigned to the exponent part in the representation of the values A and B is used as input into the adder 1001, while the bit width of (N+1) bits, which causes no loss of significant digits in the addition, is used as output from the adder 1001.
The fixed-point part multiplier 1002 multiplies the value of the fixed-point part (fixed-point value) of A by the fixed-point value of B. The bit width of (M+1) bits, obtained by adding 1 bit for the above mentioned implicit 1 to the bit width assigned to the fixed-point part in the representation of the values of A and B, is used as input into the fixed-point part multiplier 1002, while the bit width of (2M+2) bits, which causes no loss of significant digits in the multiplication, is used as output from the fixed-point part multiplier 1002.
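The bit widths stated for the multiplication stage can be checked with a short sketch. The widths N = 8 and M = 23 are assumed for illustration only, and the function name `multiply_parts` is hypothetical. The sketch treats the exponent values and fixed-point values as plain integers and does not model any handling of the bias, which the description above does not specify.

```python
N, M = 8, 23  # assumed widths (single precision), for illustration only

def multiply_parts(exp_a: int, sig_a: int, exp_b: int, sig_b: int):
    """exp_*: N-bit exponent values; sig_*: (M+1)-bit fixed-point values
    including the implicit 1 as the most significant bit."""
    exp_sum = exp_a + exp_b   # role of the adder 1001
    product = sig_a * sig_b   # role of the fixed-point part multiplier 1002
    return exp_sum, product

# Worst case: every input bit is 1.
max_exp = (1 << N) - 1
max_sig = (1 << (M + 1)) - 1
exp_sum, product = multiply_parts(max_exp, max_sig, max_exp, max_sig)

# The sum of two N-bit values fits in N+1 bits, and the product of two
# (M+1)-bit values fits in 2M+2 bits.
print(exp_sum.bit_length(), product.bit_length())
```

With these worst-case inputs, the exponent sum occupies exactly N+1 bits and the product occupies exactly 2M+2 bits, so narrower outputs would lose digits.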
When the arithmetic result of A×B is added to C, and when the exponent values do not match, it is necessary to first align the digits, transfer the point in one of the fixed-point values to allow the exponent values to match each other, and then perform the addition of the fixed-point values. These processes are performed by a subtracter 1003, a fixed-point part selector 1004, and an alignment circuit 1005.
The subtracter 1003 determines which is a larger exponent value, the arithmetic result of A×B or C, and obtains the amount of the transfer of the point of one of the fixed-point values by computing the difference between the values.
Based on the select signal output from the subtracter 1003, that is, the signal indicating a larger exponent value between the arithmetic result A×B and C, the fixed-point part selector 1004 outputs the fixed-point value having the larger exponent value into one input of an absolute value adder 1006, and outputs the fixed-point value having the smaller exponent value into the alignment circuit 1005. Since the fixed-point value of the arithmetic result of A×B transmitted from the fixed-point part multiplier 1002 is input into one input terminal of the fixed-point part selector 1004, the bit width (2M+2) is prepared for one input terminal of the fixed-point part selector 1004, and the bit width of the corresponding (M+1) bits obtained by adding 1 bit for implicit 1 to the bit width assigned to the fixed-point part in the representation of the value of C is prepared for the other input terminal. Since the fixed-point part of the arithmetic result of A×B can be output as is to the two output terminals of the fixed-point part selector 1004, the bit width of (2M+2) bits is prepared for the two output terminals of the fixed-point part selector 1004.
The alignment circuit 1005 transfers the point of the fixed-point value given from the fixed-point part selector 1004 according to the shift amount information output from the subtracter 1003, that is, the information about the amount of the transfer when the point of the fixed-point value of the smaller exponent value between the arithmetic result of A×B and C is transferred for alignment, and the fixed-point value after the transfer is output to the other input terminal of the absolute value adder 1006. The bit width of (2M+2) bits is prepared for the input and output of the alignment circuit 1005.
The absolute value adder 1006 performs an adding operation with the bit width of (2M+2) bits of the fixed-point values of the arithmetic result of A×B and C which are aligned.
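The cooperation of the subtracter, the fixed-point part selector, the alignment circuit, and the absolute value adder can be sketched as follows. This is a simplified model under stated assumptions: both fixed-point values are given as integers on the same (2M+2)-bit scale, signs are not processed (as in FIG. 2), and bits shifted out during alignment are simply discarded, whereas an actual circuit may retain information about them for rounding. The function name `align_and_add` is hypothetical.

```python
def align_and_add(exp_p: int, sig_p: int, exp_c: int, sig_c: int):
    """Add the magnitudes sig_p * 2^exp_p and sig_c * 2^exp_c.

    Returns (reference exponent, sum of the aligned fixed-point values).
    """
    diff = exp_p - exp_c                      # role of the subtracter 1003
    if diff >= 0:                             # select signal: A*B is larger
        ref_exp, large, small, shift = exp_p, sig_p, sig_c, diff
    else:                                     # select signal: C is larger
        ref_exp, large, small, shift = exp_c, sig_c, sig_p, -diff
    aligned = small >> shift                  # role of the alignment circuit 1005
    return ref_exp, large + aligned           # role of the absolute value adder 1006
```

For example, `align_and_add(3, 0b1100, 1, 0b1000)` shifts the second value right by two digits and returns the exponent 3 with the sum `0b1110`.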
The addition result of the fixed-point parts of A×B and C obtained by the absolute value adder 1006 can be out of bounds of the above mentioned normalized number. A normalization circuit 1007 normalizes the addition result, and the amount of transfer of the point of the fixed-point value performed for the normalization is transmitted as shift amount information to an exponent part amendment circuit 1010. The bit width of (2M+2) bits is also prepared for the input and output of the normalization circuit 1007.
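The normalization step can be illustrated by a minimal sketch. It assumes that a normalized value has its leading ‘1’ at the most significant position of a field of `width` bits; this convention and the function name `normalize` are assumptions made for illustration and are not taken from FIG. 2.

```python
def normalize(sig: int, width: int):
    """Shift sig so that its leading 1 sits at bit position (width - 1).

    Returns (normalized value, shift amount). A positive shift amount
    means the point was moved because of a carry-out (right shift); a
    negative amount means leading zeros were removed (left shift).
    """
    if sig == 0:
        return 0, 0                       # zero has no leading 1 to align
    shift = sig.bit_length() - width
    if shift > 0:
        return sig >> shift, shift        # carry-out: low bits are dropped here
    return sig << -shift, shift
```

The returned shift amount plays the role of the shift amount information: adding it to the reference exponent value keeps the represented value unchanged, apart from any bits dropped by a right shift.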
A rounding circuit 1008 rounds the (2M+2)-bit fixed-point part output from the normalization circuit 1007 to the number of digits having valid precision. In this example, that is M bits, obtained by subtracting 1 bit for the implicit 1 from the (M+1) bits of the fixed-point parts of the original A, B, and C. The rounding circuit 1008 outputs the result as the fixed-point part of the product-sum operation of A×B+C.
Rounding is explained below. The following rounding methods can be used.
(1) Round-down: In the arithmetic result, the bits less significant than the number of bits assigned to the fixed-point part in a predetermined numeric representation form are discarded.
(2) Round-up: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and whose absolute value is larger than and closest to that of the arithmetic result.
(3) Positive direction round-up: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and that is larger than and closest to the arithmetic result.
(4) Negative direction round-up: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and that is smaller than and closest to the arithmetic result.
(5) Average value 1: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and that is closest to the arithmetic result. If the arithmetic result does not determine a single closest value, that is, if the first significant bit excluding the fixed-point part is ‘1’ and the less significant bits are all ‘0’, then the value having 0 (or 1) as the least significant bit in the fixed-point part is selected from the two closest values. The first significant bit excluding the fixed-point part refers to the bit less significant by one bit than the least significant bit in the fixed-point part assigned in a predetermined numeric representation form, which is shown in FIG. 3.
(6) Average value 2: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and that is closest to the arithmetic result. If the arithmetic result does not determine a single closest value, that is, if the first significant bit excluding the fixed-point part is ‘1’ and the less significant bits are all ‘0’, then the value whose absolute value is larger (or smaller) is selected from the two closest values.
(7) Average value 3: The result is the value that can be represented by the number of bits assigned to the fixed-point part in a predetermined numeric representation form, and that is closest to the arithmetic result. If the arithmetic result does not determine a single closest value, that is, if the first significant bit excluding the fixed-point part is ‘1’ and the less significant bits are all ‘0’, then the larger (or smaller) value is selected from the two closest values.
As described above, the rounding operation can be performed in a number of methods, and can be selected based on the use of an arithmetic result.
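Several of the rounding methods listed above can be sketched on a sign-and-magnitude value as follows. The function name `round_bits` and the mode names are hypothetical. The sketch covers methods (1) to (4) and method (5) with 0 selected as the least significant bit; methods (6) and (7) differ only in how the tie is resolved and are omitted. It assumes that at least one bit is dropped and does not renormalize a result whose increment carries out of the kept bits.

```python
def round_bits(sign: int, mag: int, drop: int, mode: str) -> int:
    """Round the magnitude mag by removing its `drop` low bits (drop >= 1).

    sign: 0 for a positive number, 1 for a negative number.
    Returns the rounded magnitude.
    """
    kept = mag >> drop                     # bits that remain in the fixed-point part
    rest = mag & ((1 << drop) - 1)         # bits below the fixed-point part
    half = 1 << (drop - 1)                 # first bit excluded is 1, others 0
    if mode == "down":                     # (1) discard the low bits
        inc = False
    elif mode == "up":                     # (2) larger absolute value
        inc = rest != 0
    elif mode == "positive":               # (3) toward the larger value
        inc = rest != 0 and sign == 0
    elif mode == "negative":               # (4) toward the smaller value
        inc = rest != 0 and sign == 1
    elif mode == "nearest_even":           # (5) closest; tie selects LSB 0
        inc = rest > half or (rest == half and (kept & 1) == 1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return kept + 1 if inc else kept
```

For example, rounding the magnitude `0b10110` by two bits is a tie between `0b101` and `0b110`; the `"nearest_even"` mode selects `0b110`, whose least significant bit is 0.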
Back in FIG. 2, a selector 1009 selects a larger exponent value between the arithmetic result of A×B and C, that is, the reference exponent value in the addition of fixed-point values performed by the absolute value adder 1006, according to the select signal output from the subtracter 1003.
The exponent part amendment circuit 1010 amends the exponent value selected by the selector 1009 according to the shift amount information transmitted from the normalization circuit 1007, transfers it into a value of N bits assigned for the exponent part in the numeric representation form, and outputs the result as an exponent value of the product-sum operation result of A×B+C.
The product-sum operation unit shown in FIG. 2 performs the product-sum operation of A×B+C as described above.
As described above, to perform the above mentioned product-sum operation on A×B+C without limiting the probable exponent value or fixed-point value for each of A, B, and C, the precision of at least (2M+2) bits is required for a fixed-point part, and the precision of at least (N+1) bits is required for an exponent part for use in a multiplication result. Furthermore, the multiplication result A×B is to be used as an operand of the next addition. Therefore, if a general purpose arithmetic unit is able to perform the product-sum operation, it is necessary for the unit to be provided with the (N+1) bit exponent part subtracter 1003, the exponent part amendment circuit 1010 from (N+1) bits to N bits, the (2M+2) bit fixed-point part selector 1004, the (2M+2) bit alignment circuit 1005, the (2M+2) bit absolute value adder 1006, the (2M+2) bit normalization circuit 1007, and the rounding circuit 1008 only for the above mentioned purpose as shown in FIG. 2, thereby giving a heavy load on the circuit implementation.
Other technologies of performing a product-sum operation using an existing arithmetic unit have been disclosed (for example, Japanese Patent Publication No. 10-207693). However, in these technologies, the case in which an arithmetic result requires normalization, the case in which a carry-out from a fixed-point part occurs in the addition of the multiplication result of A×B to C, etc. are considered special cases, and special processes are performed to handle them. Since performing the special processes generates a long latency in an arithmetic operation, these technologies are unsuitable for some arithmetic operations. For example, when remainders are obtained repeatedly by dividing a dividend by a divisor, an integer part Z of the quotient obtained by dividing a dividend X by a divisor Y is first obtained, and then the computation of X−Z×Y is performed to obtain the remainder. In this arithmetic operation, since a normalizing process is performed with a high probability, especially after a dividing operation, most of these computations are handled as exception processes, thereby prolonging the latency in arithmetic operations.
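The remainder computation mentioned above can be illustrated by a short sketch. It computes Z as the integer part of X/Y and then evaluates X−Z×Y exactly before rounding once, which models a product-sum operation performed with correct intermediate results. Exact rational arithmetic is used here only as a stand-in for such an operation, and the function name `remainder_via_product_sum` is hypothetical.

```python
import math
from fractions import Fraction

def remainder_via_product_sum(x: float, y: float) -> float:
    """Return X - Z*Y, where Z is the integer part of the quotient X / Y."""
    z = float(math.trunc(x / y))                      # integer part Z
    # Evaluate X - Z*Y without intermediate rounding, then round once.
    exact = Fraction(x) - Fraction(z) * Fraction(y)
    return float(exact)
```

For example, with X = 7.5 and Y = 2.0, Z is 3 and the remainder is 1.5. The subtraction leaves a result much smaller than X, so its fixed-point part generally needs normalization, which is the situation the text identifies as frequent.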