Conventionally, when floating point numbers are added or subtracted, or multiplied and accumulated, the mantissas of the operands to be computed are aligned, and the computation result is normalized and rounded. More specifically, for example, two operands OP1 and OP2 are aligned by shifting one of them to the right, and after the computation, the result is shifted to the left to remove the run of leading zeros at the head of the computation result. The computation result normalized by the left shift is then rounded to the number of digits specified by, for example, the Institute of Electrical and Electronics Engineers (IEEE) 754 standard.
The left shift normalization thus removes the leading zeros. However, if the shift amount is determined only after the computation result is obtained, the processing delay becomes large. Accordingly, in recent years, as disclosed in Japanese Laid-open Patent Publication No. 10-40078, for example, the shift amount is predicted in parallel with the computation performed by an adder or the like. Consequently, the computation result can be normalized as soon as it is calculated.
FIG. 13 is a block diagram of the structure of a mantissa processing unit, which is a part of a computation processor that predicts the left shift amount as described above. The computation processor depicted in the diagram includes a right shifter 10, an absolute value adder 20, a leading zero predictor (hereinafter, referred to as “LZ predictor”) 30, a left shifter 40, and a rounding unit 50.
When the right shifter 10 receives an operand OP1, the right shifter 10 shifts the operand OP1 to the right so that the operand OP1 is aligned to the operand OP2. The right shifter 10 obtains a shift amount from a processing unit that processes an exponent, which is not depicted.
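The alignment step can be illustrated with a minimal sketch. The function name `align_operand` and its parameters are hypothetical, and the sketch assumes that OP2 has the larger exponent and that shifted-out bits are simply discarded; an actual right shifter would typically retain them for use in rounding.

```python
def align_operand(mantissa_op1: int, exp_op1: int, exp_op2: int) -> int:
    """Right-shift OP1's mantissa so that it lines up with OP2's exponent.

    Assumption of this sketch: exp_op2 >= exp_op1, and the bits shifted
    out on the right are dropped rather than kept for rounding.
    """
    # The shift amount is the exponent difference, which in FIG. 13 would
    # be supplied by the exponent processing unit.
    shift = exp_op2 - exp_op1
    if shift < 0:
        raise ValueError("this sketch assumes OP2 has the larger exponent")
    return mantissa_op1 >> shift
```

For example, a mantissa of binary 1100 with exponent 1 aligned to an operand with exponent 3 is shifted right by two places.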
The absolute value adder 20 adds the absolute values of the operand OP1, shifted to the right by the right shifter 10, and the operand OP2. The LZ predictor 30 predicts the number of consecutive leading zeros at the head of the computation result that the absolute value adder 20 produces. The LZ predictor 30 then outputs the predicted number of digits to the left shifter 40 as a left shift amount.
The left shifter 40 shifts the addition result of the absolute values to the left by the left shift amount output from the LZ predictor 30, and performs normalization by removing the leading zeros so that the first bit of the computation result is 1. In other words, for example, as depicted in FIG. 14, if consecutive zeros appear at the head of the 125-bit computation result, the left shifter 40 shifts the bit string of the computation result to the left, so that the first digit of the computation result is 1.
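The effect of the left shift normalization can be sketched as follows. Note that this sketch counts the leading zeros exactly, after the result is known, which is the slow approach; the LZ predictor 30 instead estimates the count in parallel with the addition, and its prediction logic is not reproduced here. The function names and the 8-bit width in the example are illustrative assumptions.

```python
from typing import Tuple


def count_leading_zeros(value: int, width: int) -> int:
    """Number of zero bits ahead of the first 1 in a width-bit field."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()


def normalize(value: int, width: int) -> Tuple[int, int]:
    """Shift value left until its most significant bit is 1.

    Returns (normalized value, left shift amount). A zero result has no
    leading 1, so this sketch returns it unchanged with a shift of 0.
    """
    if value == 0:
        return 0, 0
    shift = count_leading_zeros(value, width)
    return (value << shift) & ((1 << width) - 1), shift
```

With an 8-bit field, the value 00010110 has three leading zeros and normalizes to 10110000.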
The rounding unit 50 rounds the normalized computation result to the number of digits specified by, for example, the IEEE 754 standard, and outputs the processing result. At this time, for example, in the single-precision floating point format, in which the mantissa is defined to be 23 bits, the rounding unit 50 determines, based on the 24th and subsequent bits of the computation result, whether 1 is to be added to (that is, whether to increment) the 23-bit mantissa portion, and adds 1 as necessary. Consequently, binary floating point numbers are rounded in the same manner as decimal numbers.
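The rounding decision can be sketched as below. The text does not state which rounding mode the rounding unit 50 applies, so this sketch assumes round-to-nearest with ties to even, the default mode of IEEE 754. The function name and the small bit widths in the example are hypothetical.

```python
from typing import Tuple


def round_nearest_even(value: int, width: int, keep: int) -> Tuple[int, bool]:
    """Round a width-bit value to its top `keep` bits.

    Assumption of this sketch: round-to-nearest, ties-to-even.
    Returns (rounded keep-bit value, carry_out), where carry_out is True
    when the increment overflowed out of the kept bits.
    """
    drop = width - keep
    if drop < 1:
        raise ValueError("at least one bit must be dropped")
    kept = value >> drop                 # the bits that survive rounding
    rest = value & ((1 << drop) - 1)     # the bits that decide the increment
    half = 1 << (drop - 1)
    # Increment when above the halfway point, or exactly halfway with an
    # odd kept value (which makes the rounded result even).
    if rest > half or (rest == half and (kept & 1) == 1):
        kept += 1
    carry_out = (kept >> keep) == 1
    return kept & ((1 << keep) - 1), carry_out
```

Rounding the 8-bit value 11111100 to four bits increments 1111, which wraps to 0000 and raises the carry-out.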
When an increment is performed during rounding, a carry-out is generated if every bit of the mantissa is 1, and 1 is therefore added to the exponent. In other words, in the rounding unit 50 depicted in FIG. 13, adding 1 to the mantissa of the processing result generates a carry-out if every bit of the mantissa is 1. Accordingly, 1 needs to be added to the computation result of the exponent, although this addition is not depicted.
Consequently, the computation result of the exponent cannot be obtained until the rounding unit 50 completes the rounding, which limits the speed of computation. In particular, in IEEE 754, the mantissa is 23 bits in the single-precision floating point format and 52 bits in the double-precision floating point format. Accordingly, the rounding unit 50 performs an increment across the full bit width of the mantissa, and the computation result of the exponent cannot be obtained until that result is calculated. More specifically, the increment in the rounding unit 50 generates a carry-out when each of the 23 bits or the 52 bits is 1. Consequently, a delay caused by an AND operation on the 23 bits or the 52 bits is incurred after the normalization.
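The dependency of the exponent on the mantissa bits can be sketched as follows. The function names are hypothetical, and the bit-by-bit loop is only a software stand-in for the wide AND operation; it is meant to show that the final exponent cannot be selected until every normalized mantissa bit has been examined.

```python
def all_ones(mantissa: int, width: int) -> bool:
    """AND-reduce the width mantissa bits, one bit at a time."""
    result = True
    for i in range(width):
        result = result and bool((mantissa >> i) & 1)
    return result


def exponent_after_rounding(exponent: int, mantissa: int,
                            width: int, increment: bool) -> int:
    """Return the exponent once the rounding carry-out is known.

    A carry-out occurs only when rounding increments the mantissa and
    every one of its width bits (23 or 52 in IEEE 754) is already 1.
    """
    carry_out = increment and all_ones(mantissa, width)
    return exponent + 1 if carry_out else exponent
```

For a 23-bit mantissa of all ones and an exponent of 127, an increment yields an exponent of 128; if any mantissa bit is 0, the exponent stays at 127.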
When parity bits for error correction are calculated for each predetermined bit width of floating point format data, for example, a delay in obtaining the computation result of the exponent delays not only the calculation of the parity bits of the exponent but also the calculation of the parity bits of any portion that spans the exponent and the mantissa. Accordingly, the total processing delay is further increased. Such a problem occurs not only in a computation processor for addition or subtraction as depicted in FIG. 13, but also in any computation processor in which normalization and rounding are performed, such as a computation processor for multiply-accumulate operations.
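The way a late exponent holds up parity calculation can be illustrated with a small sketch. The text does not specify the parity scheme, so this sketch assumes one even-parity bit per 8-bit byte of a 32-bit single-precision word; the function name is hypothetical. In the IEEE 754 single-precision layout, bit 31 is the sign, bits 30 to 23 are the exponent, and bits 22 to 0 are the mantissa, so the second byte (bits 23 to 16) holds the lowest exponent bit together with the upper seven mantissa bits.

```python
from typing import List


def byte_parities(word: int) -> List[int]:
    """Parity bit (XOR of all bits) for each byte of a 32-bit word.

    Assumption of this sketch: byte granularity, most significant byte
    first. Byte 0 covers the sign and upper exponent bits; byte 1 covers
    the lowest exponent bit plus the upper seven mantissa bits.
    """
    parities = []
    for shift in (24, 16, 8, 0):
        byte = (word >> shift) & 0xFF
        parities.append(bin(byte).count("1") & 1)
    return parities
```

Comparing the words 0x3F800000 and 0x40000000, which differ only in their exponent fields (127 and 128), the parities of both of the first two bytes change, so neither can be finalized before the exponent is known.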