The present inventive concepts relate to digital circuits, and more particularly, to a fast close path solution for a three-path fused multiply-adder circuit.
Floating-point circuits are designed to perform various mathematical operations on floating-point numbers. Specialized hardware can be used to enhance the speed of such circuits and to implement certain floating-point functions. For example, a fused multiply-adder circuit can be implemented within a floating-point circuit to perform multiply-accumulate functions that are commonly used in digital signal processing operations.
At a high level, a fused multiply-adder circuit combines a multiplication operation with an add operation so that the equation (A×B)+C is executed as a single instruction. Within a fused multiply-adder circuit, a multiplicand and a multiplier are initially multiplied via a partial product generation module. The partial products are then added by a partial product reduction module that reduces the partial products to a sum and a carry in redundant form. The redundant sum and carry are further added to an addend via a carry-save adder to form a second redundant sum and a second redundant carry. The second redundant sum and the second redundant carry are subsequently added within a carry-propagate adder to yield the final result.
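By way of illustration only, the datapath described above can be sketched as a small integer model. The sketch below is not taken from any particular implementation: the function names, the 8-bit default width, and the simple serial ordering of the 3:2 compressors are all assumptions chosen for readability, and exponent handling, sign handling, and rounding are omitted entirely.

```python
def partial_products(a, b, width):
    """Partial product generation: one term per multiplier bit.

    Term i is the multiplicand shifted left by i when bit i of the
    multiplier is set, and zero otherwise.
    """
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def carry_save_add(x, y, z):
    """3:2 compressor: returns (s, c) such that x + y + z == s + c.

    No carry propagates between bit positions; the carries are simply
    collected into a second word shifted left by one.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_partial_products(terms):
    """Partial product reduction to a redundant (sum, carry) pair.

    Hypothetical serial ordering; real hardware would arrange the
    compressors as a tree to limit logic depth.
    """
    terms = list(terms)
    while len(terms) > 2:
        s, c = carry_save_add(terms[0], terms[1], terms[2])
        terms = terms[3:] + [s, c]
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def fused_multiply_add_int(a, b, c, width=8):
    """Unsigned integer model of (a * b) + c; b must fit in `width` bits."""
    s1, c1 = reduce_partial_products(partial_products(a, b, width))
    # The addend joins the redundant pair through one more carry-save adder.
    s2, c2 = carry_save_add(s1, c1, c)
    # The only carry-propagate addition in the whole datapath.
    return s2 + c2
```

In this model, `fused_multiply_add_int(13, 11, 7)` evaluates 13×11+7. The point of the sketch is that the addend is absorbed while the product is still in redundant form, so only one carry-propagate addition is needed.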
Since the early 1990s, a plethora of algorithms that utilize the (A×B)+C single-instruction equation have been introduced for applications in digital signal processing and graphics processing. To complement the ever-increasing usage of the fused multiply-add instruction, the floating-point adder (FPA) and floating-point multiplier (FPM) of some chips are entirely replaced with a fused multiply-adder by using constants, such as (A×B)+0.0 for single multiplies and (A×1.0)+C for single adds. The combination of industrial implementation and growing algorithmic use has prompted the IEEE 754R committee to consider the inclusion of the fused multiply-add instruction in the IEEE standard for floating-point arithmetic.
However, conventional fused multiply-adder circuits include a critical close path having a serial event chain that flows from a leading zero anticipator (LZA) stage, to a priority encoder (PENC) stage, to a normalizing shift stage, and finally to a full add/round stage. Because each stage must wait for the result of the stage before it, the close path in conventional approaches has excessive logic depth and therefore high latency. Embodiments of the present inventive concepts address these and other limitations in the prior art.
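By way of illustration only, the serial dependency in the conventional close path can be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular circuit: the function names, the 16-bit internal width, and the 8-bit output width are hypothetical, and the leading zero anticipator is replaced by an exact subtraction, whereas a real LZA predicts the leading-one position from the operands in parallel with the adder.

```python
def priority_encode(vector, width):
    """PENC stage: number of leading zeros in a width-bit vector.

    Returns `width` when the vector is all zeros.
    """
    for i in range(width):
        if (vector >> (width - 1 - i)) & 1:
            return i
    return width


def normalize(value, shift, width):
    """Normalizing shift stage: left shift within a width-bit word."""
    return (value << shift) & ((1 << width) - 1)


def round_half_even(value, drop):
    """Add/round stage: discard `drop` low bits, round to nearest even."""
    if drop == 0:
        return value
    kept = value >> drop
    rem = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and (kept & 1)):
        kept += 1
    return kept


def close_path(x, y, width=16, out_bits=8):
    """Serial chain for a close-path subtraction x - y, with x >= y.

    Returns (significand, shift). Each step consumes the result of the
    step before it, which is the source of the logic depth.
    """
    assert 0 <= y <= x < (1 << width)
    diff = x - y                                # assumption: stands in for the LZA and adder
    shift = priority_encode(diff, width)        # cannot start until diff is known
    norm = normalize(diff, shift, width)        # cannot start until shift is known
    sig = round_half_even(norm, width - out_bits)  # cannot start until norm is known
    if sig >> out_bits:                         # rounding overflowed the significand
        sig >>= 1
        shift -= 1
    return sig, shift
```

For example, `close_path(0x8000, 0x7FFD)` models a subtraction with massive cancellation: the difference is 3, fourteen leading zeros must be shifted out, and only then can the rounding step run.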