This background is presented to provide a meaningful context for the disclosure. This background may include descriptions of problems and subject matter that do not constitute prior art, or which include elements that are not part of the prior art. Therefore, nothing in this background should be regarded as prior art unless it is self-evidently and verifiably prior art.
Floating point multiply accumulate (FMA) logic is a critical component of modern computer processors. In an FMA operation, execution logic performs an operation representable by ±A*±B±C, where A, B and C are each floating point numbers representing a multiplier, a multiplicand, and an accumulator, respectively.
One of the goals of FMA design is to reduce critical timing paths. This has led some FMA designers to analyze different categories of FMA calculations and design logic optimized for certain types of FMA calculations. For example, US 2012/072703 to Srinivasan describes a monolithic multiply accumulate unit with near and far datapaths for accumulating the accumulator operand. The near path handles cases in which the difference between the sum of the multiplier and multiplicand exponents and the accumulator exponent, hereinafter referred to as ExpDelta, is within a threshold range (−2, −1, 0, 1). The far path handles the other cases.
The '817 application describes logic in which the accumulator is accumulated along with the partial products for a far larger set of operand inputs than in Srinivasan. In a nutshell, the accumulator can be injected into the partial product adder if the accumulator magnitude is small enough, relative to the product magnitude, that it does not require an exponent-aligned left-shift beyond what the datapath can accommodate. In a case in which the width of the partial product adder datapath is equal to one plus twice the significand width, for example, the accumulator is accumulated with the partial products in cases where ExpDelta ≥ −1. If the operation would result in an effective subtraction, the accumulation is also done in the partial product adder for cases in which ExpDelta = −2. But for other cases, the accumulator is accumulated separately, after a normalized, nonredundant sum of the partial products has been generated. As used herein, ExpDelta refers to the sum of the multiplicand and multiplier exponents minus the accumulator exponent.
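The selection rule described above can be illustrated with a minimal software sketch. It is not the logic of either reference. It assumes IEEE 754 binary64 operands that are all normal, an operation of the form A*B+C, and the example datapath width of one plus twice the significand width. The function names `fields`, `exp_delta`, and `accumulate_with_partial_products` are hypothetical and introduced only for illustration.

```python
import struct

BIAS = 1023  # IEEE 754 binary64 exponent bias


def fields(x):
    """Return (sign, biased exponent field, fraction field) of a binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def exp_delta(a, b, c):
    """ExpDelta = exp(a) + exp(b) - exp(c), using unbiased exponents.

    Assumption: all three operands are normal, so each exponent field
    reflects the true exponent.
    """
    ea, eb, ec = fields(a)[1], fields(b)[1], fields(c)[1]
    return (ea - BIAS) + (eb - BIAS) - (ec - BIAS)


def accumulate_with_partial_products(a, b, c):
    """Return True if C would be accumulated in the partial product adder.

    Rule taken from the text: ExpDelta >= -1, or ExpDelta == -2 when the
    operation is an effective subtraction (the product sign differs from
    the accumulator sign, assuming the operation A*B+C).
    """
    sa, sb, sc = fields(a)[0], fields(b)[0], fields(c)[0]
    effective_subtraction = (sa ^ sb) != sc
    d = exp_delta(a, b, c)
    return d >= -1 or (effective_subtraction and d == -2)


if __name__ == "__main__":
    for c in (8.0, 16.0, 32.0, -32.0, -64.0):
        print(c, exp_delta(2.0, 4.0, c), accumulate_with_partial_products(2.0, 4.0, c))
```

With A = 2.0 and B = 4.0, the sum of the operand exponents is 3, so an accumulator of 32.0 (exponent 5) yields ExpDelta = −2 and is accumulated with the partial products only when its sign makes the operation an effective subtraction.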
With both of these references, the path or accumulation stage is determined based on the operand exponent values. But with denormal operands, the true exponent value (i.e., the value that the exponent would have if the denormal value were normalized using an exponent of unlimited range) is not immediately known. Consequently, an ExpDelta calculation on one or more denormal operands may not accurately reflect whether an accumulator operand can be aligned within the partial product adder. Moreover, denormal inputs can create complications in the design of a split-unit FMA, because the true magnitude of a denormal accumulator is not immediately known. It may be smaller or larger than an underflow product. This introduces challenges in aligning it properly for accumulation.
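The gap between the exponent read from a denormal operand and its true exponent can be seen in a short sketch, again assuming IEEE 754 binary64 and nonzero inputs. The helper names `exponent_field`, `nominal_exponent`, and `true_exponent` are hypothetical. The true exponent is obtained here with `math.frexp`, which is a software convenience and not a model of any hardware path.

```python
import math
import struct


def exponent_field(x):
    """Return the raw 11-bit biased exponent field of a binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def nominal_exponent(x):
    """Exponent as read from the field alone.

    A denormal has a field of 0 and is interpreted with the minimum
    exponent, -1022, regardless of how many leading zeros its
    significand contains.
    """
    e = exponent_field(x)
    return (e if e != 0 else 1) - 1023


def true_exponent(x):
    """Exponent the nonzero value would have if normalized without an exponent limit."""
    # math.frexp returns a mantissa in [0.5, 1), so subtract 1 to get
    # the exponent for a significand in [1, 2).
    return math.frexp(x)[1] - 1


if __name__ == "__main__":
    x = math.ldexp(1.0, -1060)  # a denormal binary64 value, 2**-1060
    print(nominal_exponent(x), true_exponent(x))
```

For the denormal value 2^−1060, the field-based exponent is −1022 while the true exponent is −1060, so an ExpDelta computed from the field alone would be off by 38.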
The typical prior art response to the problem of denormal inputs is to prenormalize the inputs. This is accomplished by having the FMA logic count the number of leading zeroes for each operand on the front end of an FMA unit. Unfortunately, this initial leading zero determination becomes a part of the critical path, slowing down the execution speed of the FMA logic. An additional detriment is that prenormalization can contribute latency to the instruction. Some designs consume several processor cycles to accommodate prenormalization. This can create situations in which an instruction needs to be replayed, and any instruction dependent on it has to be delayed.
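The prenormalization step described above can be sketched in software as a leading zero count followed by a left shift and an exponent adjustment. This is only an illustration, assuming a binary64-style format with a 52-bit fraction field and a minimum exponent of −1022. The function name `prenormalize` and its parameters are hypothetical and do not correspond to any particular design.

```python
def prenormalize(fraction, width=52, emin=-1022):
    """Normalize the nonzero fraction field of a denormal operand.

    Returns (significand, exponent), where the significand has its
    leading 1 at bit position `width` (the implicit-bit position of a
    normal number) and the exponent is the true, unbounded exponent.
    """
    if fraction <= 0 or fraction >= (1 << width):
        raise ValueError("fraction must be a nonzero value that fits the fraction field")
    # Leading zeros within the fraction field. In hardware this count
    # sits at the front of the unit, which is why it lengthens the
    # critical path.
    leading_zeros = width - fraction.bit_length()
    shift = leading_zeros + 1  # move the leading 1 into the implicit-bit position
    return fraction << shift, emin - shift


if __name__ == "__main__":
    # Smallest denormal: only the least significant fraction bit is set.
    print(prenormalize(1))
    # Largest power-of-two denormal: only the top fraction bit is set.
    print(prenormalize(1 << 51))
```

For the smallest denormal, the count is 51 leading zeros, the shift is 52 positions, and the resulting exponent is −1074, which matches the value 2^−1074 of that operand.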