(1) Field of the Invention
The present invention is directed to a floating point multiply-add-subtract implementation for digital circuitry.
(2) Description of the Prior Art
In digital computer processing, signed floating point numbers can be represented as a mantissa multiplied by a base raised to an exponent. Semiconductor floating point units or processors carry out mathematical functions on these numbers in binary format. A floating point unit performs addition, subtraction, multiplication, and division on floating point numbers. In many implementations the exponent is biased, which means that a fixed number called the bias is subtracted from the written (stored) exponent to give the actual exponent used in computation. This allows an implementation to store a negative exponent as a non-negative value: whenever the written exponent is less than the bias, the written exponent minus the bias is negative. The examples assume a normalized format, which means that the first bit of the mantissa is ‘1’.
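As a worked illustration of the biased-exponent convention, a normalized binary floating point number can be written as shown below. The symbols s (sign bit), f (mantissa bits following the leading ‘1’), E (written exponent), and b (bias) are labels chosen here for illustration only; the example uses a bias of 15, the bias of the IEEE-754 16-bit format.

```latex
x = (-1)^{s} \times (1.f)_2 \times 2^{E - b}

% Example: s = 0, (1.f)_2 = 1.1_2 = 1.5, E = 12, b = 15
x = 1.5 \times 2^{12 - 15} = 1.5 \times 2^{-3} = 0.1875
```

Because E = 12 is less than b = 15, the actual exponent is negative even though the stored field holds a non-negative value.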
The Institute of Electrical and Electronics Engineers (IEEE) has issued standards for the floating point representation of numbers. The current standard used by most commercial processors is IEEE-754-2008. A number in this format is a binary word that contains a sign, a biased exponent, and a mantissa. A 16-bit IEEE-754 floating point number has the following format:

                seee eemm mmmm mmmm

where each letter represents a binary digit or bit: s is the sign bit, each e is an exponent bit, and each m is a mantissa bit. In this format the minimum exponent is −14 and the maximum exponent is 15. The exponent bias is 15, which means that 15 is subtracted from the stored exponent value to give the actual exponent. An exponent field of all 1s is used to represent infinity or “not a number” (NaN). An exponent field of all 0s is used to represent zero or a denormalized number. The IEEE-754 32-bit, 64-bit, and 128-bit floating point formats are similar.
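The 16-bit layout and its special exponent values can be made concrete with a small decoder. The following is a minimal sketch, not part of the invention: the function names `decode_half` and `reference_half` and the constants `EXP_BITS`, `MANT_BITS`, and `BIAS` are hypothetical names introduced here. It assumes the standard IEEE-754 half-precision rules (implied leading ‘1’ for normalized numbers, minimum exponent −14 for denormalized numbers) and cross-checks the result against the `'e'` half-precision format character of Python's `struct` module.

```python
import math
import struct

# Hypothetical constants for the 16-bit format "seee eemm mmmm mmmm".
EXP_BITS = 5
MANT_BITS = 10
BIAS = 15


def decode_half(word: int) -> float:
    """Decode a 16-bit IEEE-754 half-precision word into a Python float."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    sign = (word >> 15) & 0x1                               # s
    exponent = (word >> MANT_BITS) & ((1 << EXP_BITS) - 1)  # eeeee
    mantissa = word & ((1 << MANT_BITS) - 1)                # mmmmmmmmmm
    sign_factor = -1.0 if sign else 1.0

    if exponent == (1 << EXP_BITS) - 1:
        # All-1s exponent: infinity if the mantissa is zero, otherwise NaN.
        return sign_factor * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # All-0s exponent: zero or a denormalized number. There is no implied
        # leading 1, and the exponent is fixed at the minimum 1 - BIAS = -14.
        return sign_factor * (mantissa / (1 << MANT_BITS)) * 2.0 ** (1 - BIAS)
    # Normalized number: implied leading 1, actual exponent = stored - bias.
    return sign_factor * (1 + mantissa / (1 << MANT_BITS)) * 2.0 ** (exponent - BIAS)


def reference_half(word: int) -> float:
    """Cross-check using the standard library's half-precision 'e' format."""
    return struct.unpack("<e", struct.pack("<H", word))[0]
```

For example, `decode_half(0x3C00)` returns `1.0` (stored exponent 15, actual exponent 0), and `decode_half(0x7BFF)` returns `65504.0`, the largest finite value, whose actual exponent is the maximum of 15.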
Two important measures of a floating point unit implementation are its size and its speed. Size is measured as the number of gates the implementation requires. A typical commercial 32-bit multiply/accumulate floating point unit without division requires approximately 12,800 gates. Such an implementation achieves 1 MFLOP per MHz and runs at 55 MHz.
When utilizing field programmable gate arrays and other special purpose semiconductors, it is often desirable to reduce the number of gates and chip resources required for processing floating point numbers. It is further desirable to process these numbers as quickly as possible.