This invention relates generally to computer systems, and more particularly, to computer systems providing floating-point operations.
The “IEEE-754 Standard for Binary Floating-point Arithmetic” specifies a floating point data architecture that is commonly implemented in computer hardware, such as floating point processors having multipliers. The format consists of a sign, an unsigned biased exponent, and a significand. The sign is a single bit, represented by “S”. The unsigned biased exponent, represented by “e”, is 8 bits long for single format and 11 bits long for double format. The significand is 24 bits long for single format and 53 bits long for double format. The most significant bit of the significand is implied from the value of the exponent. The less significant bits of the significand, called the fraction, are represented by “F” in equations (1) and (2) that follow. If the unsigned biased exponent “e” is neither equal to zero nor has all of its bits set to one, then the value of the floating-point number is given by the following equation:
$$(-1)^{S} \times (1).F \times 2^{\,e - \mathrm{Bias}} \qquad (1)$$
Numbers within this range are called normalized numbers and they have an implied one at the beginning of the significand. Numbers outside this range are considered to be special numbers. There are four types of special numbers defined in the IEEE-754 Standard. Three of these special numbers are handled easily by the hardware since their value dictates the resultant value with little or no arithmetic computation. These three special numbers are zero, infinity and not-a-number (“NaN”). The fourth type of special number is a de-normalized number that is indicated by an unsigned biased exponent, e, equal to zero and a non-zero fraction. The value of the fourth special number is given by the following equation:
$$(-1)^{S} \times (0).F \times 2^{\,1 - \mathrm{Bias}} \qquad (2)$$
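The field layout and equations (1) and (2) can be checked with a short Python sketch. It assumes the host's `float` is an IEEE-754 double and uses the standard `struct` module to reinterpret the encoded bits; `FORMATS`, `decode`, and `value` are illustrative names, not part of any described hardware. Values are returned as exact `Fraction` objects so the equations are evaluated without any additional rounding.

```python
import struct
from fractions import Fraction

# Field widths from the text: an 8-bit exponent and 24-bit significand for
# single format, an 11-bit exponent and 53-bit significand for double format.
# Only the fraction F is stored; the leading significand bit is implied.
FORMATS = {
    "single": {"exp_bits": 8, "frac_bits": 23, "float": ">f", "uint": ">I"},
    "double": {"exp_bits": 11, "frac_bits": 52, "float": ">d", "uint": ">Q"},
}


def decode(x: float, fmt: str = "double"):
    """Split x into (S, e, F) and classify it by the rules above."""
    spec = FORMATS[fmt]
    exp_bits, frac_bits = spec["exp_bits"], spec["frac_bits"]
    # Reinterpret the encoded bytes as an unsigned integer of the same width.
    bits = struct.unpack(spec["uint"], struct.pack(spec["float"], x))[0]
    S = bits >> (exp_bits + frac_bits)
    e = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    F = bits & ((1 << frac_bits) - 1)
    if e == (1 << exp_bits) - 1:          # all exponent bits set to one
        kind = "infinity" if F == 0 else "nan"
    elif e == 0:
        kind = "zero" if F == 0 else "denormal"
    else:
        kind = "normal"
    return S, e, F, kind


def value(S: int, e: int, F: int, fmt: str = "double") -> Fraction:
    """Exact value: equation (1) if e != 0, else equation (2) (zero if F == 0)."""
    spec = FORMATS[fmt]
    exp_bits, frac_bits = spec["exp_bits"], spec["frac_bits"]
    if e == (1 << exp_bits) - 1:
        raise ValueError("infinity and NaN have no finite value")
    bias = (1 << (exp_bits - 1)) - 1      # 127 (single) or 1023 (double)
    frac = Fraction(F, 1 << frac_bits)    # the binary fraction .F
    if e == 0:                            # (0).F x 2^(1 - Bias)
        mag = frac * Fraction(2) ** (1 - bias)
    else:                                 # (1).F x 2^(e - Bias)
        mag = (1 + frac) * Fraction(2) ** (e - bias)
    return -mag if S else mag


print(decode(1.5))                   # (0, 1023, 2251799813685248, 'normal')
print(decode(1e-40, "single")[3])    # denormal
```

Note that the same biased exponent field decides both the class of the number and whether the implied bit is one or zero.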
In contrast with the normalized format, the de-normalized format has no implied one preceding the fraction. To determine whether an operand is de-normalized, its characteristic (the biased exponent e) must be examined. This matters because the computation performed by the hardware, multiplication in particular, is typically serially gated by first determining whether the input data is de-normalized, and that determination adds to the cycle time of the hardware. Handling de-normalized input data is a particular problem for floating point processors that receive no pre-decoded indication that an operand is de-normalized, especially those designed on the assumption that input operands are normalized.
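The serial dependency described above can be sketched in Python: the full significand, including its leading bit, is not known until the exponent field has been tested, and a de-normalized operand must additionally be shifted left (pre-normalized) before a multiplier that assumes normalized inputs can use it. This shows only one common way to handle such inputs, not the method of this invention; `significand`, `normalize_input`, and `multiply` are hypothetical helpers operating on (e, F) pairs with the sign omitted.

```python
from fractions import Fraction


def significand(e: int, F: int, frac_bits: int) -> int:
    """Integer significand including the implied leading bit.

    The implied bit is 1 only when e != 0, so the exponent field must be
    examined before the significand is even known.
    """
    hidden = 1 if e != 0 else 0
    return (hidden << frac_bits) | F


def normalize_input(e: int, F: int, frac_bits: int, bias: int):
    """Return (sig, exp) with the leading 1 of sig at bit position frac_bits.

    The operand's value is sig * 2**(exp - frac_bits). A normal input needs
    no shift; a denormal input needs a leading-zero count and a left shift,
    an extra serial step ahead of the multiplier.
    """
    sig = significand(e, F, frac_bits)
    if sig == 0:
        raise ValueError("zero has no normalized form")
    exp = (e if e != 0 else 1) - bias             # unbiased exponent, eqs. (1)/(2)
    shift = frac_bits - (sig.bit_length() - 1)    # leading-zero count
    return sig << shift, exp - shift


def multiply(a, b, frac_bits: int, bias: int) -> Fraction:
    """Exact product of two (e, F) operands once both are normalized."""
    sig_a, exp_a = normalize_input(*a, frac_bits, bias)
    sig_b, exp_b = normalize_input(*b, frac_bits, bias)
    # Significands multiply and exponents add; no rounding is performed here.
    return Fraction(sig_a * sig_b) * Fraction(2) ** (exp_a + exp_b - 2 * frac_bits)


# Smallest positive double denormal (e = 0, F = 1) times 2.0 (e = 1024, F = 0).
print(normalize_input(0, 1, 52, 1023))                                # (4503599627370496, -1074)
print(multiply((0, 1), (1024, 0), 52, 1023) == Fraction(1, 2**1073))  # True
```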
One of the key performance factors in designing high performance floating-point units (FPUs) is the number of cycles required to resolve a dependency between two successive operations. For example, a fused multiply-add operation may have an overall latency of seven cycles and a throughput of one operation per cycle per FPU. In this type of pipeline, an operation that depends on the result of the prior operation typically has to wait the full latency of that operation (in this case, seven cycles) before starting.
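The cost of a dependency can be illustrated with a toy cycle count. The Python sketch below assumes a single pipelined FPU that issues at most one operation per cycle, in order, with the seven-cycle latency from the example above; `schedule` is a hypothetical helper, and real pipelines have additional hazards it ignores.

```python
def schedule(deps, latency=7):
    """In-order issue of at most one op per cycle on a single pipelined FPU.

    deps[i] is the index of the earlier op whose result op i needs, or None.
    An op issued in cycle c delivers its result at cycle c + latency, and a
    dependent op cannot issue before that cycle.
    Returns the completion cycle of every op.
    """
    done = []
    next_issue = 0
    for d in deps:
        ready = done[d] if d is not None else 0
        issue = max(next_issue, ready)
        done.append(issue + latency)
        next_issue = issue + 1            # throughput: one op per cycle
    return done


n = 10
independent = schedule([None] * n)
dependent = schedule([None] + list(range(n - 1)))   # op i needs op i-1
print(independent[-1], dependent[-1])               # 16 70
```

Ten independent operations finish by cycle 16, while a chain of ten dependent operations needs 70 cycles, because every link in the chain pays the full latency.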
Currently, some FPUs that perform fused multiply-add operations support limited cases of data-dependent operations by delaying the dependent operations until after the rounded intermediate result is calculated. For example, U.S. Pat. No. 4,999,802 to Cocanougher et al., of common assignment herewith, describes a mechanism that allows an intermediate result to be transmitted to a new dependent instruction before rounding and to be corrected later in the multiplier. This mechanism allows an un-rounded intermediate result to be fed back to the multiplier for double precision data.
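The description above does not state how the correction in the multiplier is carried out. The Python sketch below only illustrates the arithmetic identity that makes a late correction possible: if rounding turns out to add an increment to the un-rounded (truncated) result, then (truncated + increment) × B equals truncated × B + increment × B, so a multiply started on the un-rounded value can be fixed by adding one correction term. It assumes integer significands and round-to-nearest-even; `split_round` and `early_dependent_multiply` are hypothetical names.

```python
def split_round(x: int, drop: int):
    """Round x to nearest-even after discarding its `drop` low bits.

    Returns (truncated, increment) with rounded == truncated + increment,
    so the un-rounded (truncated) value is available before the increment.
    """
    truncated = x >> drop
    if drop == 0:
        return truncated, 0
    rem = x & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    round_up = rem > half or (rem == half and (truncated & 1) == 1)
    return truncated, int(round_up)


def early_dependent_multiply(x: int, drop: int, b: int):
    """Start a dependent multiply on the un-rounded result, correct it later."""
    truncated, increment = split_round(x, drop)
    early = truncated * b              # can begin before rounding resolves
    # (truncated + increment) * b == truncated * b + increment * b,
    # so rounding can be applied afterwards by adding one correction term.
    corrected = early + increment * b
    return early, corrected


early, corrected = early_dependent_multiply(0b1011_1100, 4, 13)
print(early, corrected)    # 143 156
```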
Performance could be improved by providing early un-rounded feedback for multiple data types (i.e., single precision and double precision) and by allowing dependencies on the multiplier input operands as well as on the addend input operand. Additional performance improvements may be achieved by feeding back an un-rounded, un-normalized result before some or all of the normalization has been performed.
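The potential benefit of earlier feedback can be estimated with an idealized model. In the Python sketch below, `forward_at` is a hypothetical number of cycles after issue at which an un-rounded (and possibly un-normalized) result becomes available to the next dependent operation; any cycles a real design would spend on correction are ignored, and the five-cycle figure in the usage line is purely illustrative, not taken from this description.

```python
def chain_completion(n_ops, latency=7, forward_at=None):
    """Completion cycle of the last op in a chain of n dependent FMAs.

    forward_at is the (hypothetical) number of cycles after issue at which
    an early result, un-rounded and possibly un-normalized, can be forwarded
    to the next dependent op. None means waiting for the full latency.
    The last op still needs its full latency for its final rounded result.
    """
    if n_ops < 1:
        raise ValueError("need at least one op")
    step = latency if forward_at is None else forward_at
    if not 1 <= step <= latency:
        raise ValueError("forward_at must lie between 1 and latency")
    return (n_ops - 1) * step + latency


print(chain_completion(10))                 # 70: full-latency wait
print(chain_completion(10, forward_at=5))   # 52, with a hypothetical 5-cycle forward
```

Under this model every cycle by which the forwarding point moves earlier is saved once per dependent link, which is why feeding back before rounding, and further before normalization, compounds over long dependency chains.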