1. Field of the Invention
The present invention relates to computer systems. More particularly, the present invention relates to computer processors.
2. Description of Related Art
In the computation of the multiply-add operation A*B+C, where A, B, and C are floating point numbers, rounding is accomplished utilizing one of two techniques. The first technique is termed fused multiply-add rounding, and the second technique is termed unfused multiply-add rounding.
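The practical difference between the two techniques can be illustrated with a short sketch. It assumes IEEE 754 double precision with round-to-nearest-even, which the text above does not specify, and the function names are hypothetical. The fused variant is emulated with exact rational arithmetic followed by a single rounding; the unfused variant rounds the product first and the sum second.

```python
from fractions import Fraction

def unfused_multiply_add(a, b, c):
    """Round the product A*B first, then round the sum (two roundings)."""
    product = a * b          # first rounding
    return product + c       # second rounding

def fused_multiply_add(a, b, c):
    """Emulate a single rounding: form A*B+C exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding step

# Operands chosen so that the exact product 1 - 2**-54 is not representable.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(unfused_multiply_add(a, b, c))  # the product rounds to 1.0 before C is added
print(fused_multiply_add(a, b, c))    # the small residual term is preserved
```

In this example the intermediate rounding of the unfused path discards the low-order term of the product, while the fused path retains it.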
FIG. 1 illustrates a conventional floating point multiply-add (FMA) module 100 utilizing conventional fused multiply-add rounding. In FIG. 1, a mantissa of an operand A is input to a carry save adder (CSA) 104 at an input 104_1, and a mantissa of an operand B is input to CSA 104 at an input 104_2. The partial products of the operation A*B are formed and reduced in CSA 104 until two partial products, term S and term T, remain. In the present example, term S is output from CSA 104 at output 104_3, and term T is output from CSA 104 at output 104_4.
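The reduction of partial products to two terms can be sketched as follows. This is a behavioral illustration only, assuming unsigned integer mantissas and a simple linear chain of 3:2 compressors; the actual arrangement of compressors inside CSA 104 is not given in the text, and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: turn three addends into a sum word and a carry word."""
    sum_bits = x ^ y ^ z
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1
    return sum_bits, carry_bits

def reduce_partial_products(mant_a, mant_b):
    """Form the partial products of mant_a * mant_b and reduce them to two terms."""
    # One shifted copy of mant_a for every set bit of mant_b.
    partial = [mant_a << i for i in range(mant_b.bit_length()) if (mant_b >> i) & 1]
    # Each pass replaces three terms with two, so the list shrinks by one.
    while len(partial) > 2:
        s, c = carry_save_add(partial[0], partial[1], partial[2])
        partial = partial[3:] + [s, c]
    partial += [0] * (2 - len(partial))   # pad when fewer than two terms exist
    return partial[0], partial[1]         # term S and term T

term_s, term_t = reduce_partial_products(0b1101, 0b1011)
print(term_s + term_t == 0b1101 * 0b1011)
```

No carry is propagated during the reduction; the two remaining terms only sum to the product once a carry-propagating adder combines them.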
In parallel with the operation of CSA 104, a mantissa of an operand C is input to an alignment module 102 at an input 102_1, and the binary point of the mantissa of operand C is aligned with a position of a binary point of the product of A*B. The resultant aligned C term is output from alignment module 102 at output 102_2.
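A minimal sketch of the alignment step follows. It assumes each operand is represented as an integer mantissa m and an exponent e with value m * 2**e, so that the product of A*B has exponent e_a + e_b. The sticky bit that records discarded low-order bits is a common technique and is an assumption here, not something stated above; the function and parameter names are hypothetical.

```python
def align_c(mant_c, exp_c, exp_a, exp_b):
    """Shift the mantissa of C onto the binary point of the product A*B.

    Returns (aligned_mantissa, sticky), where sticky is 1 if any nonzero
    bits were shifted out to the right.
    """
    exp_product = exp_a + exp_b
    shift = exp_c - exp_product
    if shift >= 0:
        # C is at least as significant as the product grid: shift left, nothing lost.
        return mant_c << shift, 0
    # C extends below the product grid: shift right and remember what was dropped.
    dropped = mant_c & ((1 << -shift) - 1)
    return mant_c >> -shift, int(dropped != 0)

print(align_c(0b101, 3, 1, 0))
print(align_c(0b101, 0, 1, 1))
```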
Term S, term T, and the portion of the aligned C term that occupies bit positions no higher than those of the product of A*B are input to a full adders (FA) module 106 at inputs 106_2, 106_3, and 106_1, respectively, and are combined in the full adders of FA module 106 to produce two new terms, term X and term Y. Term X is output from FA module 106 at output 106_4, and term Y is output from FA module 106 at output 106_5.
Term X and term Y are next input to a carry lookahead adder (CLA) 108 at inputs 108_1 and 108_2, respectively. Term X and term Y are added in CLA 108 to produce two resultant sums, a first sum for a carry-in of zero, herein termed Sum C0, and a second sum for a carry-in of 1, herein termed Sum C1. Sum C0 is output from CLA 108 at output 108_4 and Sum C1 is output from CLA 108 at output 108_3.
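The full adder stage and the dual-sum addition can be sketched behaviorally as shown below. The sketch assumes unsigned integers and a fixed low-order width; it models only the arithmetic result, not the lookahead logic of CLA 108, and the function names and the width parameter are hypothetical.

```python
def full_adders(term_s, term_t, c_low):
    """Combine S, T, and the low-order aligned C bits into terms X and Y."""
    term_x = term_s ^ term_t ^ c_low
    term_y = ((term_s & term_t) | (term_s & c_low) | (term_t & c_low)) << 1
    return term_x, term_y

def dual_sum(term_x, term_y, width):
    """Return (Sum C0, Sum C1), each as a (low_bits, carry_out) pair."""
    mask = (1 << width) - 1
    total_c0 = term_x + term_y        # carry-in of zero
    total_c1 = term_x + term_y + 1    # carry-in of one
    return ((total_c0 & mask, total_c0 >> width),
            (total_c1 & mask, total_c1 >> width))

x, y = full_adders(0b0101, 0b0110, 0b1101)
sum_c0, sum_c1 = dual_sum(x, y, width=4)
print(sum_c0, sum_c1)
```

Computing both sums up front means that a carry-in that arrives late only has to drive a selection, not a second addition.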
The portion of the aligned C mantissa that occupies bit positions higher than those of the product of A*B, output from alignment module 102 at output 102_2, is input to an increment module 110 at input 110_1 and incremented in increment module 110. The incremented term output from increment module 110 at output 110_2 is input to mux 114 at input 114_1, and the unincremented aligned C term is input to mux 114 at input 114_2.
The Sum C0 term output from CLA 108 is input to mux 112 at input 112_2 together with the Sum C1 term input at input 112_1. Initially, the value of zero is used as input at input 112_3. The resultant carry out of mux 112 at output 112_4 is then input to mux 114 at input 114_3 and is used to select the incremented or unincremented high order bits, i.e., the bits that are in positions larger than the positions for the product of A and B, in mux 114. The initially selected high order bits are then output from mux 114 at output 114_4.
The resultant carry out from mux 114 is termed the end around carry. The end around carry is then used as the carry in to CLA 108, which is accomplished by replacing the initial input of zero at input 112_3 to mux 112 with the end around carry value. After this replacement, the output from mux 112 at output 112_4 becomes the input to normalizer module 116 at input 116_2. The carry out from mux 112 at output 112_4 is again input to mux 114 at input 114_3 and used to select the incremented or unincremented high order bits.
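One way to read the two-pass selection described above is sketched below. This is an interpretation of the text alone, without reference to FIG. 1, and it models behavior rather than circuitry. It assumes that Sum C0 and Sum C1 are each available as a (low_bits, carry_out) pair and that the incrementer wraps at a fixed width; the function and parameter names are hypothetical.

```python
def select_with_end_around_carry(sum_c0, sum_c1, high_bits, high_width):
    """Select low and high order bits using an end around carry.

    sum_c0 and sum_c1 are (low_bits, carry_out) pairs for carry-in 0 and 1.
    high_bits are the aligned C bits above the product positions.
    """
    high_mask = (1 << high_width) - 1
    unincremented = (high_bits, 0)
    incremented = ((high_bits + 1) & high_mask, (high_bits + 1) >> high_width)

    carry_in = 0  # initial select value, replaced by the end around carry
    for _ in range(2):
        # First selection (mux 112 in the text): pick the sum for the current carry-in.
        low, low_carry = sum_c1 if carry_in else sum_c0
        # Second selection (mux 114 in the text): pick the high order bits.
        high, high_carry = incremented if low_carry else unincremented
        # The carry out of the high order selection is the end around carry.
        carry_in = high_carry
    return high, low

print(select_with_end_around_carry((8, 1), (9, 1), 0b111, 3))
print(select_with_end_around_carry((8, 1), (9, 1), 0b010, 3))
```

Under these assumptions the second pass changes the result only when the first pass produces a carry out of the high order bits.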
The selected high order bits output from mux 114 are then input to normalizer 116 at input 116_1 together with the resultant carry out of mux 112 input to normalizer 116 at input 116_2.
Normalizer 116 normalizes the values and outputs the normalized value at output 116_3. The normalized value is input to a rounding module 118 at input 118_1, where the normalized value is rounded, and the fused multiply-add rounding result is output from rounding module 118 at output 118_2. The above fused multiply-add rounding method is well known to those of skill in the art and is not further described herein in detail to avoid detracting from the principles of the invention.
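The normalization and rounding steps can be sketched together as follows. The sketch assumes an unsigned integer significand with value significand * 2**exponent and round-to-nearest-even; the text above does not state a rounding mode, so that choice is an assumption, and the function and parameter names are hypothetical.

```python
def normalize_and_round(significand, exponent, precision):
    """Normalize to `precision` bits and round to nearest, ties to even.

    Returns (rounded_significand, adjusted_exponent).
    """
    if significand == 0:
        return 0, exponent
    shift = significand.bit_length() - precision
    if shift <= 0:
        # Fewer bits than the target precision: shift left, nothing to round.
        return significand << -shift, exponent + shift
    kept = significand >> shift
    remainder = significand & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
        if kept.bit_length() > precision:
            # Rounding overflowed into a new leading bit: renormalize.
            kept >>= 1
            shift += 1
    return kept, exponent + shift

print(normalize_and_round(0b10111, 0, 3))
print(normalize_and_round(0b11111, 0, 3))
```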