A multiply/accumulator (MAC) is a circuit that accepts three numerical input values A, B and C, and produces therefrom the arithmetic result A*B±C. Often, A, B and C are signed floating point binary numbers expressed in the format described by ANSI/IEEE 754-1985, or perhaps some compatible extension thereof, in which case the circuit is a Floating Point MAC, or FMAC. A modern binary FMAC is, as things go, a rather large circuit, and is usually one of the main components in an FPU (Floating Point Unit), which is itself a major functional portion of many microprocessors. An FMAC will produce A*B+C in response to the execution by its host environment (generally an FPU) of an FMA instruction (Floating Point Multiply Add). There is usually also an FMS instruction (Floating Point Multiply Subtract), that produces A*B−C. We will be interested in the internal operation of an FMAC for both FMA and FMS instructions.
Another aspect of FMAC operation is also of considerable interest. It is desirable that they run as fast as possible. A slight reduction in the time required to accomplish an operation can sometimes produce in a computationally intensive process an overall savings in time that is very significant. It is common for an FMAC to incorporate parallel paths of data manipulation for different components of the input numerical values. For example, floating point numbers (for which there are several standard formats) have significands (sometimes also called mantissas) and exponents. When two floating point numbers are to be added, their significands must first be aligned if they have unequal exponents. This is sometimes called “de-normalization”. So, in the case of an FMAC the significand bits of C will be shifted by an amount related to the difference between its exponent and the exponent that results from multiplying A and B. That difference can be found while the multiplication is proceeding, so that the shifting of C can be completed as soon as possible, and hopefully before the completion of the multiply operation.
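The alignment step described above can be illustrated with a minimal sketch. It assumes each operand is modeled as an integer significand times a power of two, and it uses exact rational arithmetic so that no bits fall off the end of the shift; a real FMAC uses a fixed-width shifter instead. The function names `c_alignment_shift`, `shift_c` and `multiply_accumulate` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def c_alignment_shift(exp_a, exp_b, exp_c):
    """Alignment distance for C, in bit positions.

    Only the exponents are needed, so this can be computed
    while the significand multiplication is still in progress.
    """
    return (exp_a + exp_b) - exp_c


def shift_c(sig_c, shift):
    """Express C's significand in units of the product's exponent.

    A positive shift moves C toward less significant positions,
    a negative shift moves it toward more significant positions.
    """
    return Fraction(sig_c) * Fraction(2) ** (-shift)


def multiply_accumulate(sig_a, exp_a, sig_b, exp_b, sig_c, exp_c):
    """Return the exact value of A*B + C for non-negative operands."""
    shift = c_alignment_shift(exp_a, exp_b, exp_c)  # exponent path
    aligned_c = shift_c(sig_c, shift)               # shifter path
    product = sig_a * sig_b                         # multiplier path
    exp_p = exp_a + exp_b
    return (product + aligned_c) * Fraction(2) ** exp_p
```

For example, with A = 3×2^1, B = 5×2^2 and C = 7×2^0, the product has exponent 3, so C is shifted three places relative to the product before the two are added.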
An FMAC must also be able to compute with signed numbers. In one conventional FMAC the significands for A, B and C are converted from the sign magnitude format of IEEE 754 to two's complement, and the multiplier itself produces a correctly signed two's complement result for use in the ensuing accumulation. The C input value is shifted as needed to produce bit alignment of the significands, but is itself never complemented. The result of the accumulation is again complemented or not, in view of the signs of the operands and whether the accumulation is addition or subtraction. It is not so much that this does not work; it does. But it increases the execution time of the multiplier if it is to provide a correctly signed (potentially complemented) output; forming and selecting such a two's complement is of necessity a separate serial step in the sequence of operations within the FMAC, having its own price expressible in gate delays. (In connection with this topic it may be useful to refer to U.S. Pat. No. 5,677,863 entitled METHOD OF PERFORMING OPERAND INCREMENT IN BOOTH RECODED MULTIPLY ARRAY, issued on Oct. 14, 1997 to Samuel D. Naffziger and assigned to Hewlett-Packard Co.) It would be desirable if the expense of that extra time could be avoided. What to do?
An FMAC can be made to run faster if the multiplier is allowed to assume that its A and B inputs are always positive, so that it never has to provide a complemented output, and if the C input for the accumulation with the product is also assumed to be positive. In connection with this, the sign magnitude notation of the IEEE 754 format of the mantissas (significands) is temporarily exchanged for two's complement notation of the assumed positive values, which is to say that they are expressed as ordinary binary numbers in a field large enough to be two's complemented, if needed. Since IEEE 754 is already binary for that part of the number, no extensive “conversion” is required. To be sure, notice is taken of the actual signs, and when there is a difference to be formed, either because of addition between numbers having opposite signs, or because of a subtraction between numbers having the same sign, one of the numbers needs to be negated (complemented) prior to the addition of C and the product AB. That number can always be C, provided that correct compensatory negation is available after the addition. Such negations are accomplished by performing a two's complement. The plan is to take advantage of the ability to recognize early on that a two's complement for C is needed and select a one's complement of the (shifted) C input, instead of doing a two's complement to the product. Both the complemented and the non-complemented (shifted) C values are readily available. Their production and the selection of one or the other are operations that overlap the execution of the multiply, and preferably are done in a way that does not increase the path delay through the shifter. Whenever the one's complement of the shifted C is selected, a carry-in is also applied to the subsequent adder that forms the accumulation, and thus completes the required two's complement operation.
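The equivalence relied upon here, namely that a one's complement followed by a carry-in of one is a two's complement, can be checked with a short sketch. It assumes a hypothetical 16-bit datapath (`WIDTH`) and hypothetical function names `select_c` and `accumulate`; the actual field widths and circuit structure of an FMAC are not modeled.

```python
WIDTH = 16                 # hypothetical datapath width, for illustration only
MASK = (1 << WIDTH) - 1


def select_c(shifted_c, need_complement):
    """Choose between the shifted C and its one's complement.

    Returns the selected operand together with the carry-in that
    must accompany it into the adder.
    """
    if need_complement:
        return (~shifted_c) & MASK, 1   # one's complement, carry-in set
    return shifted_c & MASK, 0          # unmodified, no carry-in


def accumulate(product, shifted_c, need_complement):
    """Add the (always positive) product to the selected C operand.

    With need_complement set, the result is product - shifted_c
    in two's complement, modulo 2**WIDTH.
    """
    operand, carry_in = select_c(shifted_c, need_complement)
    return (product + operand + carry_in) & MASK
```

The product itself is never complemented in this sketch; only C is, and the increment that would otherwise be a separate step is absorbed into the adder as a carry-in.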
The accumulated result will typically need to be normalized (shifted), after which it may need a final complement operation to adjust its sign, in accordance with the original signs and whether the accumulation was addition or subtraction. The result may be converted back to the IEEE 754 format in due course.
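As a rough illustration of the sign bookkeeping, the following sketch decides whether a difference must be formed and then recovers a sign and magnitude from the adder output. It is a sketch under stated assumptions: the names `effective_subtract` and `to_sign_magnitude` are hypothetical, sign bits use 0 for positive and 1 for negative, normalization and rounding are omitted, and the sign of an exactly zero result (which IEEE 754 ties to the rounding mode) is not treated specially.

```python
def effective_subtract(sign_a, sign_b, sign_c, is_fms):
    """Return 1 when a difference of magnitudes must be formed.

    The product's sign is sign_a XOR sign_b. An FMS instruction
    flips the sense of C. A difference is needed when the product
    and the effective C end up with opposite signs.
    """
    return sign_a ^ sign_b ^ sign_c ^ is_fms


def to_sign_magnitude(raw_sum, width, product_sign, complemented_c):
    """Convert the adder output back to a (sign, magnitude) pair.

    raw_sum is |A*B| + |C|, or |A*B| - |C| in two's complement when
    C was complemented. A negative difference needs the final
    complement, and the result then takes the opposite of the
    product's sign.
    """
    mask = (1 << width) - 1
    raw_sum &= mask
    is_negative = bool(complemented_c) and bool(raw_sum >> (width - 1))
    if is_negative:
        return product_sign ^ 1, (-raw_sum) & mask  # final two's complement
    return product_sign, raw_sum
```

For instance, a 16-bit adder output of 65466 obtained with C complemented and a positive product represents 30 − 100, so the sketch yields a negative sign and a magnitude of 70.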