1. Field of the Invention
The present invention relates to an apparatus for floating-point arithmetic processing, and in particular to a floating-point multiplication apparatus or addition and subtraction apparatus in which the state of a specific bit, referred to as a sticky bit, depends upon the logical sum of a set of low-significance bits which are truncated during a mantissa processing operation, and in which that specific bit is used in determining a subsequent mantissa round-off operation that is executed on the result of that mantissa processing operation.
2. Description of the Related Art
In a floating-point multiplication operation, it is necessary to multiply the mantissa of one operand by the mantissa of the other (each mantissa consisting of a standard fixed number of bits). The result of the mantissa multiplication (referred to in the following as the mantissa intermediate product) will contain approximately twice as many bits as each mantissa of the original operands. Thus it is necessary to truncate the excess low-significance bits. However, to achieve maximum accuracy in the round-off operation that is executed to obtain the mantissa of the final product, three bits which are obtained as a result of that mantissa multiplication and are of lower significance than the standard mantissa LSB position are used in that round-off operation. These additional bits are designated, in descending order of significance following the standard mantissa LSB position, as a guard bit, a round bit and a sticky bit, and appear in the mantissa intermediate product after the excess bits have been truncated. The "1" or "0" state of the sticky bit is determined by an OR logic operation applied to all of the truncated low-significance bits, that is to say, if any of the truncated bits is a "1" bit then the sticky bit is set to "1", and otherwise to "0". A predetermined round-off operation is then applied to the resultant mantissa intermediate product, with that round-off operation being executed based on a combination of bit states including those of the guard bit, round bit and sticky bit. With one prior art floating-point multiplication apparatus, as described in detail hereinafter, the truncated bits of the mantissa intermediate product are successively examined by using a shifting circuit, in order to apply the logical OR operation to them, i.e., to detect the presence of a "1" state bit. However, such a successive shifting operation is an obstacle to achieving a high speed of processing.
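The truncation described above can be illustrated with a minimal software sketch. It is not the circuit of any particular apparatus: the function names and the parameters product_bits and mantissa_bits are hypothetical, the mantissas are treated as plain unsigned integers, and normalization of the product is ignored. The first function forms the sticky bit by examining the truncated bits one at a time, as a shifting circuit would. The second splits the mantissa intermediate product into the retained mantissa, the guard bit, the round bit and the sticky bit.

```python
def sticky_by_shifting(product: int, truncated_bits: int) -> int:
    """OR the truncated bits together one at a time (serial examination)."""
    sticky = 0
    for _ in range(truncated_bits):
        sticky |= product & 1   # accumulate the current lowest bit
        product >>= 1           # shift the next bit into the lowest position
    return sticky


def split_intermediate_product(product: int, product_bits: int, mantissa_bits: int):
    """Return (mantissa, guard, round, sticky) for an unnormalized product.

    Assumption: the product is product_bits wide and the retained mantissa
    is its top mantissa_bits bits, followed by the guard and round bits.
    """
    truncated_bits = product_bits - mantissa_bits - 2  # bits below the round bit
    if truncated_bits < 0:
        raise ValueError("product too narrow for guard and round bits")
    sticky = sticky_by_shifting(product, truncated_bits)
    kept = product >> truncated_bits     # mantissa followed by guard and round
    round_bit = kept & 1
    guard_bit = (kept >> 1) & 1
    mantissa = kept >> 2
    return mantissa, guard_bit, round_bit, sticky


# Two 4-bit mantissas give an 8-bit intermediate product.
print(split_intermediate_product(0b1011 * 0b1101, 8, 4))
```

The loop in sticky_by_shifting takes one iteration per truncated bit, which is the software analogue of the delay attributed to the successive shifting operation.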
In the case of a floating-point addition and subtraction apparatus, before adding together two floating-point operands, or subtracting one from the other, it is necessary to execute shifting of the mantissa bits of one of the operands if the operands have different exponents. In practice, the mantissa bits of the operand having the smaller exponent are right-shifted to bring about equality of the exponents (i.e., shifted by a number of bit positions that is equal to the difference between the two exponents), thus enabling addition or subtraction of the result of the right-shifting to or from the other mantissa. As a result of that right-shift operation, some of the low-significance bits of the mantissa of the operand having the smaller exponent will overflow beyond the mantissa LSB position. In order to achieve maximum accuracy, it is necessary to take these overflowed bits into account in the subsequent arithmetic processing. This is done by attaching to the result of that right-shifting operation, at a lowest significance position (specifically, two bit positions below the standard mantissa LSB position), a bit referred to in the following as a spilled bit, whose state is determined by the logical sum (i.e., OR function) of all of the bits which have overflowed. That is to say, if at least one of these overflow bits has the value "1", then the spilled bit is set to "1", and otherwise it is set to "0". After the addition or subtraction of the mantissas has been executed and the absolute value of the result obtained, a round-off operation is executed on the result, using the states of bits at positions corresponding to the aforementioned guard bit, round bit and sticky bit to determine the type of round-off operation. The state of the sticky bit is determined by that of the spilled bit and by whether a mantissa addition or subtraction operation was executed.
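The alignment step and the formation of the spilled bit can be sketched as follows. This is only an illustration under simplifying assumptions: the function name align_operands is hypothetical, mantissas and exponents are plain non-negative integers, and the spilled bit is returned as a separate value rather than being attached at a specific bit position below the mantissa LSB.

```python
def align_operands(mant_a: int, exp_a: int, mant_b: int, exp_b: int):
    """Right-shift the mantissa of the operand with the smaller exponent.

    Returns (larger_exp_mantissa, shifted_mantissa, spilled_bit, common_exp).
    """
    if exp_a >= exp_b:
        large, small, common_exp = mant_a, mant_b, exp_a
        shift = exp_a - exp_b
    else:
        large, small, common_exp = mant_b, mant_a, exp_b
        shift = exp_b - exp_a
    overflow_mask = (1 << shift) - 1            # selects the bits shifted out
    spilled = 1 if small & overflow_mask else 0  # logical sum of those bits
    return large, small >> shift, spilled, common_exp


# Exponents differ by 2, so the two lowest bits of 0b1011 overflow.
print(align_operands(0b1100, 5, 0b1011, 3))
```

When the exponents are equal the shift amount is zero, no bits overflow, and the spilled bit is "0".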
One prior art method that has been proposed for determining the state of the spilled bit in this case is to successively shift each of the overflow bits through a circuit which detects the presence of any "1" state bit, and to set the contents of a 1-bit register to the "1" state when such detection occurs. However, such a method has the disadvantage that time is required for shifting these successive bits in order to determine the state of the spilled bit. If such a successive shifting operation were not necessary, it would be possible to execute a parallel shifting operation on the mantissa of the operand having the smaller exponent. In addition, the amount of time required to determine the spilled bit is not fixed, but varies in accordance with the difference between the exponents of the two operands and the position of the lowest-significance "1" state bit in the operand which has the smaller exponent. Such variations in timing complicate circuit control, making it difficult to implement a practical floating-point addition and subtraction apparatus by such a method. For that reason, a floating-point addition and subtraction apparatus has been proposed, as described in greater detail hereinafter, in which the state of the spilled bit is established based upon the difference between the exponents of the two operands and the position of the lowest-significance "1" state bit in the operand which has the smaller exponent. However, it is necessary to first determine which of the operands has the smaller exponent before detecting the position of the lowest-significance "1" state bit and then determining the state of the spilled bit, so that the greatest possible processing speed cannot be attained.
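The relation between the two approaches described above can be illustrated in software. The sketch below rests on one observation: a "1" bit overflows exactly when the position of the lowest-significance "1" bit (counted from zero at the mantissa LSB) is less than the shift amount. The function names are hypothetical, and the code models only the logic, not the timing or structure of either apparatus.

```python
def lowest_one_position(mantissa: int):
    """Index of the lowest-significance "1" bit, or None for a zero mantissa."""
    if mantissa == 0:
        return None
    # mantissa & -mantissa isolates the lowest set bit.
    return (mantissa & -mantissa).bit_length() - 1


def spilled_by_comparison(mantissa: int, shift: int) -> int:
    """Spilled bit from the exponent difference and the lowest "1" position."""
    position = lowest_one_position(mantissa)
    return 1 if position is not None and position < shift else 0


def spilled_by_serial_shift(mantissa: int, shift: int) -> int:
    """Spilled bit formed by examining the overflow bits one at a time."""
    flag = 0
    for _ in range(shift):
        flag |= mantissa & 1   # latch a "1" once any overflow bit is set
        mantissa >>= 1
    return flag


# The lowest "1" bit of 0b101000 is at position 3.
print(spilled_by_comparison(0b101000, 3), spilled_by_comparison(0b101000, 4))
```

Both functions take the mantissa of the operand with the smaller exponent as input, so in either case that operand must be identified first, which is the ordering constraint noted above.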