Processing units such as central processing units (CPUs) and graphics processing units (GPUs) are designed to perform arithmetic operations that conform to a specified numeric representation. One common numeric representation is a floating-point number, which typically includes a mantissa field, an exponent field, and a sign field. For example, a floating-point number format specified by the Institute of Electrical and Electronics Engineers (IEEE) is thirty-two bits in size and includes twenty-three mantissa bits, eight exponent bits, and one sign bit. Other standard floating-point formats are defined with sizes of up to one hundred twenty-eight bits. Floating-point arithmetic circuits configured to implement arithmetic operations on floating-point numbers must properly process one or more input floating-point numbers and generate an arithmetically correct floating-point result.
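The thirty-two bit layout described above can be illustrated with a short Python sketch. It is only an illustration: the function name decode_binary32 is hypothetical, and the sketch assumes the common ordering in which the sign bit occupies the most significant position, followed by the eight exponent bits and the twenty-three mantissa bits.

```python
import struct


def decode_binary32(value):
    """Round value to the thirty-two bit format and split it into its fields.

    Returns (sign, exponent, mantissa) as unsigned integers. The exponent is
    the raw stored (biased) field, not the unbiased power of two.
    """
    # Pack as a big-endian 32-bit float, then reinterpret the same four
    # bytes as an unsigned 32-bit integer so the bits can be masked.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    return sign, exponent, mantissa


# -6.5 is -1.625 * 2**2, so the stored exponent is 2 + 127 = 129 and the
# mantissa field holds the fraction 0.625 (binary 0.101).
sign, exponent, mantissa = decode_binary32(-6.5)
print(sign, exponent, hex(mantissa))
```

The stored exponent carries a bias of 127 in this format, which is why the field reads 129 for a value scaled by two squared.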
Conventional IEEE format floating-point multiply/add units include a large multiplier followed by a wide adder. The adder receives the product generated by the multiplier and a shifted version of an addend, and combines them to produce the value of a*b+c. This value is then inspected and renormalized to return a value conforming to the IEEE floating-point format specification. To conform to the IEEE standard, an implementation of a floating-point multiply/add unit maintains complete internal precision for the product of the multiplier and multiplicand (a and b) and for the c addend throughout the computation to the output. Maintaining the internal precision necessitates a large logic circuit that expends both static and dynamic power.
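The effect of maintaining complete internal precision can be seen in a small numerical sketch. The Python below is not a model of any circuit. It assumes Python's double-precision floats rather than the thirty-two bit format, uses exact rational arithmetic in place of the wide adder, and the function names fused_multiply_add and unfused_multiply_add are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b+c exactly, then round once to a double."""
    # Fraction(float) is exact, so the product and sum carry no rounding
    # error; the only rounding happens in the final float() conversion.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a, b, c):
    """Round the product to a double first, then add and round again."""
    product = a * b
    return product + c


# Inputs chosen so that the exact product, 1 - 2**-60, needs more bits
# than a double can hold and therefore rounds to 1.0 when stored.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(fused_multiply_add(a, b, c))    # small negative value, -2**-60
print(unfused_multiply_add(a, b, c))  # 0.0, the low-order bits were lost
```

When the product is rounded before the addition, the low-order bits that survive the cancellation with c are discarded and the result collapses to zero. Retaining the full product width preserves them, and this is the internal precision that the wide adder exists to provide.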
Thus, there is a need for reducing the amount of power consumed by floating-point arithmetic circuits and/or addressing other issues associated with the prior art.