1. Field of the Invention
This invention relates to floating point arithmetic within microprocessors, and more particularly to an add/subtract pipeline within a floating point arithmetic unit.
2. Description of the Related Art
Numbers may be represented within computer systems in a variety of ways. In an integer format, for example, a 32-bit register may store numbers ranging from 0 to $2^{32}-1$. (The same size register may also store signed numbers by giving up one order of magnitude in range.) This format is limiting, however, since it is incapable of representing numbers that are not integers (the binary point in integer format may be thought of as being to the right of the least significant bit in the register).
To accommodate non-integer numbers, a fixed point representation may be used. In this form of representation, the binary point is considered to be somewhere other than to the right of the least significant bit. For example, a 32-bit register may be used to store values from 0 (inclusive) to 2 (exclusive) by processing register values as though the binary point is located to the right of the most significant register bit. Such a representation allows (in this example) 31 register bits to represent fractional values. In another embodiment, one bit may be used as a sign bit so that a register can store values between -2 and +2.
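As an illustrative sketch only, the unsigned format described above (one integer bit followed by 31 fractional bits) can be modeled in Python. The names `FRACTION_BITS` and `fixed_to_float` are hypothetical and are introduced solely for this example.

```python
FRACTION_BITS = 31  # binary point sits to the right of the most significant bit


def fixed_to_float(raw: int) -> float:
    """Interpret a 32-bit unsigned register value as a fixed point number."""
    if not 0 <= raw < 2**32:
        raise ValueError("raw must fit in an unsigned 32-bit register")
    # Dividing by 2**31 moves the binary point 31 places to the left.
    return raw / 2**FRACTION_BITS


print(fixed_to_float(0x80000000))  # only the integer bit is set
print(fixed_to_float(0xC0000000))  # integer bit plus the first fractional bit
print(fixed_to_float(0xFFFFFFFF))  # largest representable value
```

Under these assumptions, the largest register value maps to a number just below 2, consistent with the range of 0 (inclusive) to 2 (exclusive) stated above.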
Because the binary point is fixed within a register or storage location during fixed point arithmetic operations, numbers with differing orders of magnitude may not be represented with equal precision without scaling. For example, it is not possible to represent both 1101b (13 in decimal) and 0.1101b (0.8125 in decimal) using the same fixed point representation. While fixed point representation schemes are still quite useful, many applications require a larger dynamic range (the ratio of the largest number representation to the smallest non-zero number representation in a given format).
In order to solve this problem of dynamic range, floating point representation and arithmetic are widely used. Generally speaking, floating point numeric representations include three parts: a sign bit, an unsigned fractional number, and an exponent value. The most widespread floating point format in use today, IEEE standard 754 (single precision), is depicted in FIG. 1.
Turning now to FIG. 1, floating point format 2 is shown. Format 2 includes a sign bit 4 (denoted as S), an exponent portion 6 (E), and a mantissa portion 8 (F). Floating point values represented in this format have a value V, where V is given by:
$$V = (-1)^{S} \cdot 2^{E-\mathrm{bias}} \cdot (1.F) \qquad (1)$$
Sign bit S represents the sign of the entire number, while mantissa portion F is a 23-bit number with an implied leading 1 bit (values with a leading one bit are said to be "normalized"). In other embodiments, the leading one bit may be explicit. Exponent portion E is an 8-bit value which represents the true exponent of the number V offset by a predetermined bias. A bias is used so that both positive and negative true exponents of floating point numbers may be easily compared. The number 127 is used as the bias in IEEE standard 754. The 8-bit exponent field of format 2 thus spans true exponents from -127 to +128; IEEE standard 754 reserves the two extreme encodings (E = 0 and E = 255) for special values, so normalized numbers have true exponents from -126 to +127. Floating point format 2 advantageously allows 24 bits of representation within each of these orders of magnitude.
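The field layout of FIG. 1 and equation (1) can be illustrated with a short Python sketch. The helper names `decode_single` and `value_from_fields` are hypothetical, and the sketch covers normalized values only (it does not handle the reserved exponent encodings).

```python
import struct

BIAS = 127  # single precision exponent bias


def decode_single(x: float):
    """Return the (S, E, F) fields of x encoded as IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # 1-bit sign
    e = (bits >> 23) & 0xFF   # 8-bit biased exponent
    f = bits & 0x7FFFFF       # 23-bit fraction (implied leading 1 not stored)
    return s, e, f


def value_from_fields(s: int, e: int, f: int) -> float:
    """Evaluate equation (1) for a normalized value: (-1)^S * 2^(E-bias) * 1.F."""
    significand = 1 + f / 2**23  # restore the implied leading 1
    return (-1) ** s * 2.0 ** (e - BIAS) * significand


fields = decode_single(-6.5)
print(fields, value_from_fields(*fields))
```

For example, -6.5 equals -1.625 times 2 squared, so the sketch yields S = 1, E = 129 (true exponent 2 plus the bias of 127), and a fraction field encoding 0.625.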
Floating point addition is an extremely common operation in numerically-intensive applications. (Floating point subtraction is accomplished by inverting one of the inputs and performing addition.) Although floating point addition is related to fixed point addition, two differences cause complications. First, an exponent value of the result must be determined from the input operands. Second, rounding must be performed. The IEEE standard specifies that the result of an operation should be the same as if the result were computed exactly, and then rounded (to a predetermined number of digits) using the current rounding mode. IEEE standard 754 specifies four rounding modes: round to nearest, round to zero, round to $+\infty$, and round to $-\infty$. The default mode, round to nearest, chooses the even number in the event of a tie.
Turning now to FIG. 2, a prior art floating point addition pipeline 10 is depicted. Not all steps in pipeline 10 are performed for every possible addition. (That is, some steps are optional for various cases of inputs.) The stages of pipeline 10 are described below with reference to input values A and B. Input value A has a sign bit $A_S$, an exponent value $A_E$, and a mantissa value $A_F$. Input value B, similarly, has a sign bit $B_S$, exponent value $B_E$, and mantissa value $B_F$.
Pipeline 10 first includes a stage 12, in which an exponent difference $E_{diff}$ is calculated between $A_E$ and $B_E$. In one embodiment, if $E_{diff}$ is calculated to be negative, operands A and B are swapped such that A is now the larger operand. In the embodiment shown in FIG. 2, the operands are swapped such that $E_{diff}$ is always non-negative.
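A minimal Python sketch of stage 12 follows, assuming each operand is held as a hypothetical `(sign, exponent, mantissa)` tuple. The function name `order_operands` is an assumption made for illustration; a real datapath would also have to account for the swap when determining the sign of a subtraction result, which this sketch does not attempt.

```python
def order_operands(a, b):
    """Return (larger, smaller, e_diff) so that e_diff is never negative.

    a and b are (sign, exponent, mantissa) tuples; only the exponents
    are compared here, as in stage 12.
    """
    e_diff = a[1] - b[1]
    if e_diff < 0:
        # Swap so that the first operand has the larger exponent.
        a, b = b, a
        e_diff = -e_diff
    return a, b, e_diff


print(order_operands((0, 3, 0b1000), (0, 5, 0b1001)))
```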
In stage 14, operands A and B are aligned. This is accomplished by shifting operand B to the right by $E_{diff}$ bits. In this manner, the mantissa portions of both operands are scaled to the same order of magnitude. If $A_E = B_E$, no shifting is performed; consequently, no rounding is needed. If $E_{diff} > 0$, however, information must be maintained with respect to the bits which are shifted rightward (and are thus no longer representable within the predetermined number of bits). In order to perform IEEE rounding, information is maintained relative to 3 bits: the guard bit (G), the round bit (R), and the sticky bit (S). The guard bit is one bit less significant than the least significant bit (L) of the shifted value, while the round bit is one bit less significant than the guard bit. The sticky bit is the logical-OR of all bits less significant than R. For certain cases of addition, only the G and S bits are needed.
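The alignment step and the bookkeeping of the G, R, and S bits can be sketched in Python as follows. This is an illustration under stated assumptions, not the circuit of FIG. 2: the function name `align` and the integer representation of the mantissa are hypothetical.

```python
def align(mantissa: int, e_diff: int):
    """Right-shift a mantissa by e_diff bits, tracking guard, round, sticky.

    Returns (shifted_mantissa, guard, round_bit, sticky).
    """
    if e_diff < 0:
        raise ValueError("e_diff must be non-negative")
    # Append two low-order zero bits so G and R survive the shift.
    extended = mantissa << 2
    kept = extended >> e_diff
    # Everything shifted out below the R position feeds the sticky bit.
    lost = extended & ((1 << e_diff) - 1)
    guard = (kept >> 1) & 1   # one position below the new LSB
    round_bit = kept & 1      # one position below the guard bit
    sticky = 1 if lost else 0
    return kept >> 2, guard, round_bit, sticky


# Shifting 1011b right by 3 leaves 1b, with the bits 0, 1, 1 shifted out.
print(align(0b1011, 3))
```

In this usage example the three shifted-out bits land in the G, R, and S positions respectively, and a shift of zero leaves all three bits clear.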
In stage 16, the shifted version of operand B is inverted, if needed, to perform subtraction. In some embodiments, the signs of the input operands and the desired operation (either add or subtract) are examined in order to determine whether effective addition or effective subtraction is occurring. In one embodiment, effective addition is given by the equation:
$$EA = A_S \oplus B_S \oplus op \qquad (2)$$
where op is 0 for addition and 1 for subtraction. For example, the operation A minus B, where B is negative, is equivalent to A plus B (ignoring the sign bit of B). Therefore, effective addition is performed. (Taking A to be positive in this example, $A_S = 0$, $B_S = 1$, and $op = 1$, so the right-hand side of equation (2) evaluates to 0.) The inversion in stage 16 may be either of the one's complement or two's complement variety.
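Equation (2) can be evaluated directly in Python. The function name `effective_op_bit` is hypothetical. Note, as an observation read off the formula as printed rather than a statement from the original text, that the XOR evaluates to 0 when the magnitudes are effectively added and to 1 when they are effectively subtracted.

```python
def effective_op_bit(a_sign: int, b_sign: int, op: int) -> int:
    """Evaluate equation (2): A_S xor B_S xor op (op: 0 = add, 1 = subtract)."""
    return a_sign ^ b_sign ^ op


# A minus B with B negative: the magnitudes are added.
print(effective_op_bit(0, 1, 1))
# A minus B with both positive: the magnitudes are subtracted.
print(effective_op_bit(0, 0, 1))
```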
In stage 18, the addition of operand A and operand B is performed. As described above, operand B may be shifted and may be inverted as needed. Next, in stage 20, the result of stage 18 may be recomplemented, meaning that the value is returned to sign-magnitude form (as opposed to one's or two's complement form).
Subsequently, in stage 22, the result of stage 20 is normalized. This includes left-shifting the result of stage 20 until the most significant bit is a 1. The bits which are shifted in are calculated according to the values of G, R, and S. In stage 24, the normalized value is rounded according to the round to nearest mode. If S includes the R bit OR'ed in, round to nearest (even) is given by the equation:
$$\mathrm{RTN} = G \cdot (L + S) \qquad (3)$$
where $\cdot$ denotes logical AND and $+$ denotes logical OR.
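Equation (3) can be exercised with a short Python sketch. The function name `round_to_nearest_even` is hypothetical, and the sticky input is assumed to already include the R bit, as stated above.

```python
def round_to_nearest_even(value: int, guard: int, sticky: int) -> int:
    """Apply equation (3): increment value when G AND (L OR S) is 1."""
    lsb = value & 1                     # L, the least significant kept bit
    round_up = guard & (lsb | sticky)   # RTN = G(L + S)
    return value + round_up


print(round_to_nearest_even(0b101, 1, 0))  # tie with odd LSB
print(round_to_nearest_even(0b100, 1, 0))  # tie with even LSB
print(round_to_nearest_even(0b100, 1, 1))  # above the halfway point
```

When G is 1 and S is 0 the discarded bits lie exactly halfway, and the L term sends the result to the even neighbor; when G is 0 the value is left unchanged. An increment that carries out of the mantissa width corresponds to the overflow case handled by post-normalization.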
If the rounding performed in stage 24 produces an overflow, the result is post-normalized (right-shifted) in stage 26.
As can be seen from the description of pipeline 10, floating point addition is quite complicated. The operation is also quite time-consuming if performed as shown in FIG. 2: stage 14 (alignment) requires a shift, stage 18 requires a full add, stage 20 (recomplementation) requires a full add, stage 22 requires a shift, and stage 24 (rounding) requires a full add. Consequently, performing floating point addition using pipeline 10 would cause add/subtract operations to have a similar latency to floating point multiplication. Because of the frequency of floating point addition, higher performance is typically desired. Accordingly, most actual floating point add pipelines include optimizations to pipeline 10.
Turning now to FIG. 3, a prior art floating point pipeline 30 is depicted which is optimized with respect to pipeline 10. Broadly speaking, pipeline 30 includes two paths which operate concurrently, far path 31A and close path 31B. Far path 31A is configured to perform all effective additions. Far path 31A is additionally configured to perform effective subtractions for which $E_{diff} > 1$. Close path 31B, conversely, is configured to perform effective subtractions for which $E_{diff} \le 1$. As with FIG. 2, the operation of pipeline 30 is described with respect to input values A and B.
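The division of work between the two paths can be summarized in a small Python sketch. It models only which path supplies the final result; in pipeline 30 both paths operate concurrently in hardware. The function name `select_path` is hypothetical, and the effective subtraction test assumes the XOR of the two sign bits and the operation bit is 1 for effective subtraction.

```python
def select_path(a_sign: int, a_exp: int, b_sign: int, b_exp: int, op: int) -> str:
    """Return "close" or "far" for the given operand signs and exponents.

    op is 0 for addition and 1 for subtraction.
    """
    effective_subtract = a_sign ^ b_sign ^ op
    e_diff = abs(a_exp - b_exp)
    if effective_subtract and e_diff <= 1:
        return "close"  # effective subtraction with E_diff of 0 or 1
    return "far"        # all effective additions, and E_diff > 1


print(select_path(0, 130, 0, 130, 1))  # subtraction, equal exponents
print(select_path(0, 130, 0, 127, 1))  # subtraction, E_diff = 3
print(select_path(0, 130, 0, 130, 0))  # addition
```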
Pipeline 30 first includes stage 32, in which operands A and B are received. The operands are conveyed to both far path 31A and close path 31B. Results are then computed for both paths, with the final result selected in accordance with the actual exponent difference. The operation of far path 31A is described first.
In stage 34 of far path 31A, exponent difference $E_{diff}$ is computed for operands A and B. In one embodiment, the operands are swapped if $B_E > A_E$, so that A is the operand with the larger exponent. If $E_{diff}$ is computed to be 0 or 1, execution in far path 31A is cancelled, since this case is handled by close path 31B as will be described below. Next, in stage 36, the input values are aligned by right shifting operand B as needed. In stage 38, operand B is conditionally inverted in the case of effective subtraction (operand B is not inverted in the case of effective addition). Subsequently, in stage 40, the actual addition is performed. Because of the restriction placed on the far path ($E_{diff} > 1$), the result of stage 40 is always positive. Thus, no recomplementation step is needed. The result of stage 40 is instead rounded and post-normalized in stages 42 and 44, respectively. The result of far path 31A is then conveyed to stage 58.
In stage 46 of close path 31B, exponent difference $E_{diff}$ is calculated. If $E_{diff}$ is computed to be less than or equal to 1, execution continues in close path 31B with stage 48. In one embodiment, operands A and B are swapped (as in one embodiment of far path 31A) so that $A_E \ge B_E$. In stage 48, operand B is inverted to set up the subtraction which is performed in stage 50. In one embodiment, the smaller operand is also shifted by at most one bit. Since the possible shift amount is low, however, this operation may be accomplished with greatly reduced hardware.
The output of stage 50 is then recomplemented if needed in stage 52, and then normalized in stage 54. This result is rounded in stage 56, with the rounded result conveyed to stage 58. In stage 58, either the far path or close path result is selected according to the value of E.sub.diff.
It is noted that in close path 31B, stage 52 (recomplementation) and stage 56 (rounding) are mutually exclusive. A negative result may only be obtained in close path 31B in the case where $A_E = B_E$ and $A_F < B_F$. In such a case, however, no bits of precision are lost, and hence no rounding is performed. Conversely, when shifting occurs (giving rise to the possibility of rounding), the result of stage 50 is always positive, eliminating the need for recomplementation in stage 52.
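The mutual exclusivity argument above can be checked exhaustively for a small mantissa width with a Python sketch. The function names and the 6-bit width are hypothetical choices made so that every operand pair can be enumerated; the sketch assumes normalized mantissas (leading bit set) and that A has the larger or equal exponent.

```python
def close_path_flags(a_frac: int, b_frac: int, e_diff: int):
    """Return (negative, may_round) for A - B with e_diff of 0 or 1."""
    if e_diff not in (0, 1):
        raise ValueError("close path handles only e_diff of 0 or 1")
    # Keep one extra low-order bit so the bit shifted out of B stays visible.
    a_ext = a_frac << 1
    b_ext = (b_frac << 1) >> e_diff
    exact = a_ext - b_ext
    negative = exact < 0          # would require recomplementation
    may_round = (exact & 1) == 1  # a bit lies below the original LSB
    return negative, may_round


def check_exclusivity(width: int) -> bool:
    """Verify that no operand pair needs both recomplementation and rounding."""
    normalized = range(1 << (width - 1), 1 << width)  # leading bit set
    for a_frac in normalized:
        for b_frac in normalized:
            for e_diff in (0, 1):
                negative, may_round = close_path_flags(a_frac, b_frac, e_diff)
                if negative and may_round:
                    return False
    return True


print(check_exclusivity(6))
```

With equal exponents the difference is exact, so the extra bit is never set; with an exponent difference of 1, the larger operand exceeds the shifted smaller operand, so the difference is never negative.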
The configuration of pipeline 30 allows each path 31 to exclude unneeded hardware. For example, far path 31A does not require an additional adder for recomplementation as described above. Close path 31B eliminates the need for a full shift operation before stage 50, and also reduces the number of add operations required (due to the exclusivity of rounding and recomplementation described above).
Pipeline 30 offers improved performance over pipeline 10. Because of the frequency of floating point add/subtract operations, however, a floating point addition pipeline is desired which exhibits improved performance over pipeline 30. Improved performance is particularly desired with respect to close path 31B.