In computers and arithmetic hardware, integer and floating-point formats are widely used to represent numerical values. Hardware for integer arithmetic is simple to build, but the integer format can represent only a small range of values, so an algorithm must be scaled so that overflow does not occur. The floating-point format can represent a wide range of values, from extremely large to extremely small, and the algorithm does not need to be scaled to prevent overflow.
The IEEE-754 format is the most commonly used floating-point representation. FIG. 1 shows an example of a conventional floating-point format, which consists of a 1-bit sign, a 7-bit exponent and an 8-bit mantissa. Addition of values in the floating-point format is complicated, and both the circuit size and the logic delay are large. However, because no numerical format has so far been able to replace the floating-point format, much research has been devoted to calculation in this format.
FIG. 2 illustrates a conventional floating-point adder. First, the absolute values of two inputs X and Y are compared in a swapping unit 201. The input with the larger absolute value is selected as A and the other is selected as B. Here, E(x) represents the exponent and M(x) represents the mantissa of a floating-point number x. If |X|>|Y|, E(X) and M(X) are output as E(A) and M(A), respectively, and E(Y) and M(Y) are output as E(B) and M(B).
In a barrel shifter A 203, the mantissa of B, M(B), is shifted right by the difference between E(A) and E(B) calculated by a subtracter 202. This is referred to as alignment. Then, in a fixed point main adder 204, the shifted M(B) is added to or subtracted from the mantissa of A, M(A), according to the sign bits of X and Y.
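The swapping, alignment and main addition steps described above can be sketched in software as follows. This is a minimal illustration, not the circuit of FIG. 2: the tuple layout (sign, exponent, mantissa), the function names, and the hidden-bit value H = 2^8 (chosen to match the 8-bit mantissa of FIG. 1) are assumptions made for the example, and bits shifted out during alignment are simply dropped.

```python
H = 1 << 8  # assumed hidden-bit weight for an 8-bit stored mantissa


def swap(x, y):
    """Return (A, B) with |A| >= |B|; x and y are (sign, exponent, mantissa).

    Assumes both inputs are normalized, so magnitude order equals the
    order of the (exponent, mantissa) pairs.
    """
    return (x, y) if (x[1], x[2]) >= (y[1], y[2]) else (y, x)


def align_and_add(x, y):
    """Model swapping unit 201, subtracter 202, shifter A 203 and adder 204."""
    a, b = swap(x, y)
    shift = a[1] - b[1]            # exponent difference E(A) - E(B)
    mb = (b[2] + H) >> shift       # alignment: shift M(B) right
    ma = a[2] + H
    # Add when the signs agree, subtract when they differ.
    total = ma + mb if a[0] == b[0] else ma - mb
    return a[0], a[1], total       # sign of A, E(A), unnormalized sum


print(align_and_add((0, 64, 0), (0, 63, 0)))  # (0, 64, 384)
print(align_and_add((0, 64, 0), (1, 63, 0)))  # (0, 64, 128)
```

In the second call the signs differ, so the result 128 is smaller than H and has a leading zero, which is the situation handled by the leading zero counter and normalization stages.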
The calculation result from the fixed point main adder 204 is provided to a leading zero counter 205, which counts the number of consecutive zeros from the most significant bit (MSB) and outputs the leading zero count to a barrel shifter B 206. The leading zero count is also used to adjust the exponent.
When a subtraction is performed and the result becomes smaller, the calculation result of the fixed point main adder 204 is shifted left by the leading zero count in the barrel shifter B 206, which is referred to as normalization. The barrel shifter B 206 outputs the shifted result to a rounding unit 208. A subtracter 207 subtracts the leading zero count from the exponent E(A) and outputs the subtraction result to the rounding unit 208. The rounding unit 208 executes rounding and outputs E(Z) and M(Z) as the final calculation result. In FIG. 2, the dotted line shows the critical path in the floating-point adder.
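The leading zero count and normalization steps can be illustrated with the short sketch below. It assumes a 9-bit adder result (one hidden bit plus the 8-bit mantissa of FIG. 1), covers only the subtraction case described above, and omits rounding and exponent underflow handling. The names `leading_zero_count` and `normalize` are hypothetical and chosen for this example only.

```python
MANT_BITS = 8          # stored mantissa width, as in FIG. 1
WIDTH = MANT_BITS + 1  # assumed adder result width: hidden bit + mantissa


def leading_zero_count(value, width=WIDTH):
    """Count consecutive zeros from the MSB of a width-bit value."""
    return width - value.bit_length()


def normalize(exp_a, total):
    """Model leading zero counter 205, shifter B 206 and subtracter 207.

    Returns (E(Z), M(Z)) before rounding. Only results that fit in
    WIDTH bits are handled; a carry out of the adder is not modeled.
    """
    if total >> WIDTH:
        raise ValueError("carry-out case is outside this sketch")
    if total == 0:
        return 0, 0                      # zero treated as a special case
    lzc = leading_zero_count(total)
    mant = total << lzc                  # shift left until the hidden bit is set
    exp = exp_a - lzc                    # lower the exponent by the same count
    return exp, mant - (1 << MANT_BITS)  # strip the hidden bit to get M(Z)


print(normalize(64, 128))  # (63, 0)
print(normalize(64, 3))    # (57, 128)
```

Because the exponent adjustment depends on the leading zero count, which in turn depends on the adder output, these steps run one after another, which is consistent with the long critical path indicated in FIG. 2.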
The value of a floating-point number X in the conventional floating-point format is expressed using the integer values of the exponent E(X), the mantissa M(X) and the sign S(X) by the following formula (1).

$$X = (-1)^{S(X)} \, (M(X) + h) \, 2^{E(X) + q} \tag{1}$$
Herein q is an integer constant for the offset, and h is an integer constant representing the hidden mantissa.
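As a worked illustration of formula (1), the sketch below evaluates X from the three integer fields. The constants are assumptions chosen only to make the example concrete: h = 2^8 corresponds to a hidden bit above the 8-bit mantissa of FIG. 1, and q = -71 is a hypothetical offset (an exponent bias of 63 combined with the 8 fractional mantissa bits). The source does not state specific values for h or q.

```python
def decode(sign, exponent, mantissa, h=1 << 8, q=-71):
    """Evaluate formula (1): X = (-1)^S(X) * (M(X) + h) * 2^(E(X) + q).

    h and q are assumed example constants, not values from the source.
    """
    return (-1) ** sign * (mantissa + h) * 2.0 ** (exponent + q)


# S=0, E=63, M=0: (0 + 256) * 2^(63 - 71) = 1.0
print(decode(0, 63, 0))    # 1.0
# S=1, E=64, M=128: -(128 + 256) * 2^(64 - 71) = -3.0
print(decode(1, 64, 128))  # -3.0
```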
Even with a floating-point adder of the type shown above, the implementation is complicated and the adder is several times larger than an integer adder, because the barrel shifters and the subtracters require relatively large logic circuits, which lengthen the critical path. In addition, the logic delay of the adder is large, which limits the operating clock frequency and sometimes requires additional pipeline stages, which in turn increase the hardware size.