1. Field of the Invention
This invention relates to floating point arithmetic within microprocessors, and more particularly to an add/subtract pipeline within a floating point arithmetic unit.
2. Description of the Related Art
Numbers may be represented within computer systems in a variety of ways. In an integer format, for example, a 32-bit register may store numbers ranging from 0 to 2^32 − 1. (The same register may also store signed numbers by giving up one order of magnitude in range). This format is limiting, however, since it is incapable of representing numbers which are not integers (the binary point in integer format may be thought of as being to the right of the least significant bit in the register).
To accommodate non-integer numbers, a fixed point representation may be used. In this form of representation, the binary point is considered to be somewhere other than to the right of the least significant bit. For example, a 32-bit register may be used to store values from 0 (inclusive) to 2 (exclusive) by processing register values as though the binary point is located to the right of the most significant register bit. Such a representation allows (in this example) 31 register bits to represent fractional values. In another embodiment, one bit may be used as a sign bit so that a register can store values between −2 and +2.
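By way of illustration only, the fixed point interpretation described above may be sketched as follows. The function name and the default of 31 fractional bits are assumptions chosen to match the example in the text, not part of any described embodiment.

```python
def fixed_to_real(register_value, fractional_bits=31):
    """Interpret an unsigned register value as a fixed point number.

    With 31 fractional bits in a 32-bit register, the binary point sits
    just to the right of the most significant bit.
    """
    return register_value / (1 << fractional_bits)


# The most significant bit alone has weight 1; the next bit has weight 0.5.
one = fixed_to_real(0x80000000)
one_and_a_half = fixed_to_real(0xC0000000)
largest = fixed_to_real(0xFFFFFFFF)  # just below 2
```

Every representable value in this sketch lies in the range from 0 (inclusive) to 2 (exclusive), as stated above.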
Because the binary point is fixed within a register or storage location during fixed point arithmetic operations, numbers with differing orders of magnitude may not be represented with equal precision without scaling. For example, it is not possible to represent both 1101b (13 in decimal) and 0.1101b (0.8125 in decimal) using the same fixed point representation. While fixed point representation schemes are still quite useful, many applications require a large dynamic range (the ratio of the largest number representation to the smallest, non-zero, number representation in a given format).
In order to solve this problem of dynamic range, floating point representation and arithmetic is widely used. Generally speaking, floating point numeric representations include three parts: a sign bit, an unsigned fractional number, and an exponent value. The most widespread floating point format in use today, IEEE standard 754 (single precision), is depicted in FIG. 1.
Turning now to FIG. 1, floating point format 2 is shown. Format 2 includes a sign bit 4 (denoted as S), an exponent portion 6 (E), and a mantissa portion 8 (F). Floating point values represented in this format have a value V, where V is given by:
$$V = (-1)^{S} \cdot 2^{E - \mathrm{bias}} \cdot (1.F) \qquad (1)$$
Sign bit S represents the sign of the entire number, while mantissa portion F is a 23-bit number with an implied leading 1 bit (values with a leading one bit are said to be "normalized"). In other embodiments, the leading one bit may be explicit. Exponent portion E is an 8-bit value which represents the true exponent of the number V offset by a predetermined bias. A bias is used so that both positive and negative true exponents of floating point numbers may be easily compared. The number 127 is used as the bias in IEEE standard 754. Format 2 may thus accommodate numbers having exponents from −127 to +128. Floating point format 2 advantageously allows 24 bits of representation within each of these orders of magnitude.
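As a minimal sketch of equation (1), the fields S, E, and F of a single precision value may be extracted and recombined as shown below. The helper names are hypothetical, and only normalized values are handled; the reserved exponent encodings of IEEE standard 754 are outside the scope of this sketch.

```python
import struct


def decode_single(x):
    """Split a value, stored as IEEE 754 single precision, into (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # 1-bit sign S
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF      # 23-bit mantissa portion F
    return sign, exponent, fraction


def value_from_fields(sign, exponent, fraction, bias=127):
    """Evaluate V = (-1)^S * 2^(E - bias) * (1.F) for a normalized value."""
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + fraction / 2 ** 23)


s, e, f = decode_single(-6.5)        # -6.5 = -1.625 * 2^2
rebuilt = value_from_fields(s, e, f)
```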
Floating point addition is an extremely common operation in numerically-intensive applications. (Floating point subtraction is accomplished by inverting one of the inputs and performing addition). Although floating point addition is related to fixed point addition, two differences cause complications. First, an exponent value of the result must be determined from the input operands. Secondly, rounding must be performed. The IEEE standard specifies that the result of an operation should be the same as if the result was computed exactly, and then rounded (to a predetermined number of digits) using the current rounding mode. IEEE standard 754 specifies four rounding modes: round to nearest, round to zero, round to +∞, and round to −∞. The default mode, round to nearest, chooses the even number in the event of a tie.
Turning now to FIG. 2, a prior art floating point addition pipeline 10 is depicted. Not all steps in pipeline 10 are performed for all possible additions. (That is, some steps are optional for various cases of inputs). The stages of pipeline 10 are described below with reference to input values A and B. Input value A has a sign bit AS, an exponent value AE, and a mantissa value AF. Input value B, similarly, has a sign bit BS, exponent value BE, and mantissa value BF.
Pipeline 10 first includes a stage 12, in which an exponent difference Ediff is calculated between AE and BE. In one embodiment, if Ediff is calculated to be negative, operands A and B are swapped such that A is now the larger operand. In the embodiment shown in FIG. 2, the operands are swapped such that Ediff is always positive.
In stage 14, operands A and B are aligned. This is accomplished by shifting operand B Ediff bits to the right. In this manner, the mantissa portions of both operands are scaled to the same order of magnitude. If AE=BE, no shifting is performed; consequently, no rounding is needed. If Ediff > 0, however, information must be maintained with respect to the bits which are shifted rightward (and are thus no longer representable within the predetermined number of bits). In order to perform IEEE rounding, information is maintained relative to 3 bits: the guard bit (G), the round bit (R), and the sticky bit (S). The guard bit is one bit less significant than the least significant bit (L) of the shifted value, while the round bit is one bit less significant than the guard bit. The sticky bit is the logical-OR of all bits less significant than R. For certain cases of addition, only the G and S bits are needed.
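The alignment step may be sketched as follows. The function name and the tuple it returns are assumptions made for illustration; the sketch operates on unsigned integer mantissas and merely records which bits were shifted out.

```python
def align_with_grs(mantissa, ediff):
    """Right shift a mantissa by ediff bits, returning (shifted, G, R, S).

    G is the first bit shifted out, R is the second, and S is the
    logical-OR of every bit less significant than R.
    """
    extended = mantissa << 2               # make room for G and R
    shifted_ext = extended >> ediff
    shifted = shifted_ext >> 2             # the aligned mantissa
    guard = (shifted_ext >> 1) & 1
    round_bit = shifted_ext & 1
    sticky = int((extended & ((1 << ediff) - 1)) != 0)
    return shifted, guard, round_bit, sticky


# Shifting 110111b right by 4 discards the bits 0111b:
# G = 0, R = 1, and S = 1 (the OR of the remaining bits, 1 and 1).
result = align_with_grs(0b110111, 4)
```

When ediff is 0, nothing is shifted out and all three bits are 0, consistent with the statement above that no rounding is needed in that case.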
In stage 16, the shifted version of operand B is inverted, if needed, to perform subtraction. In some embodiments, the signs of the input operands and the desired operation (either add or subtract) are examined in order to determine whether effective addition or effective subtraction is occurring. In one embodiment, effective addition is given by the equation:
$$\mathrm{EA} = \mathrm{AS} \oplus \mathrm{BS} \oplus \mathrm{op} \qquad (2)$$
where op is 0 for addition and 1 for subtraction. For example, the operation A minus B, where B is negative, is equivalent to A plus B (ignoring the sign bit of B). Therefore, effective addition is performed. The inversion in stage 16 may be either of the one's complement or two's complement variety.
In stage 18, the addition of operand A and operand B is performed. As described above, operand B may be shifted and may be inverted as needed. Next, in stage 20, the result of stage 18 may be recomplemented, meaning that the value is returned to sign-magnitude form (as opposed to one's or two's complement form).
Subsequently, in stage 22, the result of stage 20 is normalized. This includes left-shifting the result of stage 20 until the most significant bit is a 1. The bits which are shifted in are calculated according to the values of G, R, and S. In stage 24, the normalized value is rounded according to the round to nearest mode. If S includes the R bit OR'ed in, round to nearest (even) is given by the equation:
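The exclusive-OR of equation (2) may be exercised as in the sketch below. The sketch assumes a sign bit of 1 denotes a negative operand, and the function names are hypothetical. Under that assumption the exclusive-OR evaluates to 0 when the operand magnitudes add and to 1 when they subtract; how the resulting signal is asserted in a given embodiment is not specified here.

```python
def xor_of_signs_and_op(a_sign, b_sign, op):
    """Evaluate AS xor BS xor op, with op = 0 for add and 1 for subtract."""
    return a_sign ^ b_sign ^ op


def magnitudes_add(a, b, op):
    """Reference check in ordinary arithmetic: do |A| and |B| add?"""
    result = a - b if op else a + b
    return abs(result) == abs(a) + abs(b)


# A minus B with A positive and B negative: the magnitudes add.
example = xor_of_signs_and_op(0, 1, 1)
```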
$$\mathrm{RTN} = G(L + S) \qquad (3)$$
If the rounding performed in stage 24 produces an overflow, the result is post-normalized (right-shifted) in stage 26.
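Equation (3) may be sketched as shown below, where the "+" of the equation is a logical OR and the product is a logical AND. The function name and the convention of passing the unrounded value together with a count of extra low-order bits are assumptions for illustration; at least one extra bit is assumed to be present.

```python
def round_to_nearest_even(value, extra_bits):
    """Drop extra_bits low-order bits, rounding to nearest (ties to even).

    L is the least significant kept bit, G is the first dropped bit, and
    S is the OR of all remaining dropped bits (the R bit included).
    """
    kept = value >> extra_bits
    lsb = kept & 1
    guard = (value >> (extra_bits - 1)) & 1
    sticky = int((value & ((1 << (extra_bits - 1)) - 1)) != 0)
    increment = guard & (lsb | sticky)     # RTN = G(L + S)
    return kept + increment


tie_odd = round_to_nearest_even(0b1011_100, 3)    # tie, odd LSB: rounds up
tie_even = round_to_nearest_even(0b1010_100, 3)   # tie, even LSB: unchanged
```

If the increment carries out of the most significant kept bit, the result requires the post-normalization described for stage 26.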
As can be seen from the description of pipeline 10, floating point addition is quite complicated. This operation is also quite time-consuming if performed as shown in FIG. 2: stage 14 (alignment) requires a shift, stage 18 requires a full add, stage 20 (recomplementation) requires a full add, stage 22 requires a shift, and stage 24 (rounding) requires a full add. Consequently, performing floating point addition using pipeline 10 would cause add/subtract operations to have a similar latency to floating point multiplication. Because of the frequency of floating point addition, higher performance is typically desired. Accordingly, most actual floating point add pipelines include optimizations to pipeline 10.
Turning now to FIG. 3, a prior art floating point pipeline 30 is depicted which is optimized with respect to pipeline 10. Broadly speaking, pipeline 30 includes two paths which operate concurrently, far path 31A and close path 31B. Far path 31A is configured to perform all effective additions. Far path 31A is additionally configured to perform effective subtractions for which Ediff > 1. Close path 31B, conversely, is configured to perform effective subtractions for which Ediff ≤ 1. As with FIG. 2, the operation of pipeline 30 is described with respect to input values A and B.
Pipeline 30 first includes stage 32, in which operands A and B are received. The operands are conveyed to both far path 31A and close path 31B. Results are then computed for both paths, with the final result selected in accordance with the actual exponent difference. The operation of far path 31A is described first.
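The division of work between the two paths may be summarized by the sketch below. The function name and string labels are hypothetical; in the pipeline itself both paths compute results concurrently and this decision is applied only when the final result is selected.

```python
def select_path(ediff, effective_subtract):
    """Return which path's result is used for a given operation.

    The close path handles effective subtraction with an absolute
    exponent difference of 0 or 1; the far path handles everything else.
    """
    if effective_subtract and abs(ediff) <= 1:
        return "close"
    return "far"


choices = [select_path(d, True) for d in (0, 1, 2)]
```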
In stage 34 of far path 31A, exponent difference Ediff is computed for operands A and B. In one embodiment, the operands are swapped if AE > BE. If Ediff is computed to be 0 or 1, execution in far path 31A is cancelled, since this case is handled by close path 31B as will be described below. Next, in stage 36, the input values are aligned by right shifting operand B as needed. In stage 38, operand B is conditionally inverted in the case of effective subtraction (operand B is not inverted in the case of effective addition). Subsequently, in stage 40, the actual addition is performed. Because of the restrictions placed on the far path (Ediff > 1), the result of stage 40 is always positive. Thus, no recomplementation step is needed. The result of stage 40 is instead rounded and post-normalized in stages 42 and 44, respectively. The result of far path 31A is then conveyed to stage 58.
In stage 46 of close path 31B, exponent difference Ediff is calculated. If Ediff is computed to be less than or equal to 1, execution continues in close path 31B with stage 48. In one embodiment, operands A and B are swapped (as in one embodiment of far path 31A) so that AE ≥ BE. In stage 48, operand B is inverted to set up the subtraction which is performed in stage 50. In one embodiment, the smaller operand is also shifted by at most one bit. Since the possible shift amount is low, however, this operation may be accomplished with greatly reduced hardware.
The output of stage 50 is then recomplemented if needed in stage 52, and then normalized in stage 54. This result is rounded in stage 56, with the rounded result conveyed to stage 58. In stage 58, either the far path or close path result is selected according to the value of Ediff.
It is noted that in close path 31B, stage 52 (recomplementation) and stage 56 (rounding) are mutually exclusive. A negative result may only be obtained in close path 31B in the case where AE=BE and AF < BF. In such a case, however, no bits of precision are lost, and hence no rounding is performed. Conversely, when shifting occurs (giving rise to the possibility of rounding), the result of stage 50 is always positive, eliminating the need for recomplementation in stage 52.
The configuration of pipeline 30 allows each path 31 to exclude unneeded hardware. For example, far path 31A does not require an additional adder for recomplementation as described above. Close path 31B eliminates the need for a full shift operation before stage 50, and also reduces the number of add operations required (due to exclusivity of rounding and recomplementation described above).
Pipeline 30 offers improved performance over pipeline 10. Because of the frequency of floating point add/subtract operations, however, a floating point addition pipeline is desired which exhibits improved performance over pipeline 30. Improved performance is particularly desired with respect to close path 31B.
The problems outlined above are in large part solved by an execution unit in accordance with the present invention. In one embodiment, an execution unit is provided which is usable to perform effective addition or subtraction upon a given pair of floating point input values. The execution unit includes an add/subtract pipeline having a far data path and a close data path each coupled to receive the given pair of floating point input values. The far data path is configured to perform effective addition as well as effective subtraction upon operands having an absolute exponent difference greater than one. The close data path, on the other hand, is configured to perform effective subtraction upon operands having an absolute exponent difference less than or equal to one. The add/subtract pipeline further includes a result multiplexer unit coupled to receive a result from both the far data path and the close data path. A final output of the result multiplexer unit is selected from the far path result and the close path result according to the actual calculated absolute exponent difference value.
In one embodiment, the far data path includes a pair of right shift units coupled to receive mantissa portions of each of the given pair of floating point input values. The right shift units each receive a shift amount from a corresponding exponent difference unit. The first right shift unit applies a shift amount equal to the second exponent value minus the first exponent value, while the second right shift unit applies a shift amount equal to the first exponent value minus the second exponent value. The outputs of the right shift units are then conveyed to a multiplexer-inverter unit, which also receives unshifted versions of the mantissa portions of each of the given pair of floating point input values. The multiplexer-inverter unit is configured to select one of the unshifted mantissa portions and one of the shifted mantissa portions to be conveyed as inputs to an adder unit. The adder inputs conveyed by the multiplexer-inverter unit are aligned in order to facilitate the addition operation. The multiplexer-inverter unit is further configured to invert the second adder input if the effective operation to be performed is subtraction.
The adder unit is configured to add the first and second adder inputs, thereby generating first and second adder outputs. The first adder output is equal to the sum of the two inputs, while the second adder output is equal to the first adder output plus one. One of the two adder outputs is selected according to a far path selection signal generated by a far path selection unit. The far path selection unit is configured to generate a plurality of preliminary far path selection signals. Each of these preliminary far path selection signals corresponds to a different possible normalization of the first adder output. For example, one of the preliminary far path selection signals corresponds to a prediction that the first adder output is properly normalized. Another preliminary far path selection signal corresponds to a prediction that the first adder output is not normalized, while still another select signal indicates that said first adder output has an overflow bit set. One of these preliminary far path selection signals is selected to be conveyed as the final far path selection signal based on which of these predictions actually occurs.
The far data path further includes a multiplexer-shift unit configured to receive the first and second adder outputs as well as the final far path selection signal. The appropriate adder output is selected, and a one-bit left or right shift may also be performed to properly normalize the result. In the case of a left shift, a guard bit previously shifted out of one of the mantissa values by a right shift unit may be shifted back into the final result. The selected value is conveyed as a mantissa portion of the far data path result value. The exponent portion of the far path result is calculated by an exponent adjustment unit. The exponent adjustment unit is configured to receive the original larger exponent value along with the amount of shifting required for proper normalization (which may be no shift, a one-bit left shift, or a one-bit right shift).
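The three normalization cases and the matching exponent adjustment may be sketched as follows. The function name, argument order, and the choice to pass the guard bit explicitly are assumptions for illustration; rounding and the selection between the two adder outputs are deliberately omitted from the sketch.

```python
def normalize_far_result(adder_output, exponent, width, guard=0):
    """Apply at most a one-bit shift to a far path adder output.

    Returns (mantissa, exponent). An overflow bit above the top mantissa
    position forces a right shift; a clear top bit forces a left shift,
    with the guard bit shifted back in at the bottom.
    """
    mask = (1 << width) - 1
    if adder_output >> width:                    # overflow bit set
        return adder_output >> 1, exponent + 1
    if not (adder_output >> (width - 1)) & 1:    # not normalized
        return ((adder_output << 1) | guard) & mask, exponent - 1
    return adder_output, exponent                # already normalized


overflowed = normalize_far_result(0b10110, 3, width=4)
unnormalized = normalize_far_result(0b0110, 3, width=4, guard=1)
```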
In contrast to a generic floating point addition/subtraction pipeline, the far data path is optimized to perform effective additions. The far data path is additionally optimized to perform effective subtractions on operands having an absolute exponent difference greater than one. This configuration allows the recomplementation step to be avoided, since all operations produce positive results. Furthermore, since adder outputs require at most a one-bit shift, only one full-size shifter is needed in the far data path. This results in improved floating point addition and subtraction performance for the far data path.
In one embodiment, the close data path is coupled to receive mantissa portions of the given pair of floating point input values, as well as two least significant bits of each of the exponent values. The mantissa values are conveyed to a shift-swap unit, which also receives an exponent difference prediction from an exponent prediction unit. The exponent difference prediction is indicative of whether the absolute exponent difference is 0 or 1. It is used to align and swap (if needed) the input mantissa values for conveyance to a close path adder unit. The mantissa values are swapped such that the exponent value associated with the first adder input is greater than or equal to the exponent value associated with the second adder input. The first adder input is not guaranteed to be greater than the second adder input if the exponent values are equal, however. The shift-swap unit is also configured to invert the second adder input since the adder unit within the close data path performs subtraction.
It is further noted that the exponent difference value generated by the exponent prediction unit may be incorrect. This is true since the exponent prediction is based only on a subset of the total number of bits. The result produced by the close data path is thus speculative. The actual exponent difference calculated in the far data path is used to determine whether the result produced by the close data path is valid.
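One way a prediction could be formed from only the two least significant exponent bits is sketched below. The modulo-4 comparison and the return convention are assumptions made for illustration, not a description of the exponent prediction unit itself. The sketch also shows why such a prediction is speculative.

```python
def predict_exponent_difference(exp_a, exp_b):
    """Predict exp_a - exp_b from the two low-order bits of each exponent.

    Returns 0, +1, or -1, or None when the low bits differ by 2 (in which
    case the operands cannot be within one of each other).
    """
    low_difference = ((exp_a & 3) - (exp_b & 3)) & 3
    return {0: 0, 1: 1, 3: -1}.get(low_difference)


correct = predict_exponent_difference(12, 11)   # true difference is 1
wrong = predict_exponent_difference(5, 1)       # true difference is 4
```

In the second call the low-order bits of the two exponents are identical, so the sketch predicts equal exponents even though the true difference is 4. A full exponent comparison, such as the one made in the far data path, is needed to reject the close path result in that case.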
The adder unit within the close data path produces a first and second output value. The first output value is equal to the first adder input plus the second adder input, which is effectively equivalent to the first mantissa portion minus the second mantissa portion. The second output value, on the other hand, is equal to the first output value plus one. Both values are conveyed to a multiplexer-inverter unit. A close path selection signal provided by a close path selection unit is usable to select either the first adder output or the second adder output as a preliminary close path result.
The selection unit includes a plurality of logic sub-blocks, each of which is configured to generate a preliminary close path selection signal indicative of either the first adder output value or the second adder output value. Each of the preliminary close path selection signals corresponds to a different prediction scenario. For example, a first logic sub-block generates a preliminary close path select signal for the case in which the exponent values are equal and the first mantissa value is greater than the second mantissa value. A second logic sub-block generates a select signal for the case in which the exponent values are equal and the first mantissa value is less than the second mantissa value. A third logic sub-block corresponds to the case in which the first exponent value is greater than the second exponent value and the first adder output is not normalized. The last sub-block corresponds to the case in which the first exponent value is greater than the second exponent value and the first adder output is normalized. Each of the preliminary selection signals is conveyed to a close path selection multiplexer, the output of which is used to select either the first or second adder output as the preliminary close path subtraction result.
The output for the close path selection multiplexer is determined by which of the various predicted cases actually occurs. Accordingly, the close path selection multiplexer receives as control signals the exponent prediction value (indicating whether the exponents are equal or not), the sign value of the first adder output (indicating whether a negative result is present), and the MSB of the first adder output (indicating whether the result is properly normalized or not). The sign value and the MSB value are generated concurrently within both the adder unit and the selection unit. This is accomplished using a carry chain driven by CMSB, the carry in signal to the most significant bit position of the adder unit. This concurrent generation allows faster selection of either the first or second adder outputs. The selection of one of these values effectuates rounding the close path result to the nearest number (an even number is chosen in the event of a tie). This configuration advantageously eliminates the need for a separate adder unit to perform rounding.
If the first adder output is negative, the multiplexer-inverter unit inverts the first adder output to produce the correct result. This occurs for the case in which the exponents are equal and the second mantissa value is greater than the first mantissa value. In any event, the selected close path preliminary subtraction result is then conveyed to a left shift unit for normalization.
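The use of the two adder outputs together with a final inversion may be sketched as follows for unsigned mantissas of a given width. The names are hypothetical, and the sketch detects a negative result from the carry out of the sum+1 computation, which is an assumption standing in for the sign value described above. Alignment and rounding are omitted.

```python
def close_path_subtract(a, b, width):
    """Compute |a - b| using a + ~b, its successor, and an optional invert.

    Returns (magnitude, negative), where negative is True when a < b.
    """
    mask = (1 << width) - 1
    raw = a + (~b & mask)                 # first adder output: a - b - 1
    total = raw & mask
    total_plus_one = (raw + 1) & mask     # second adder output: a - b
    negative = ((raw + 1) >> width) == 0  # no carry out means a < b
    if negative:
        # Inverting the first output yields b - a without another add.
        return ~total & mask, True
    return total_plus_one, False


positive_case = close_path_subtract(5, 3, 4)
negative_case = close_path_subtract(3, 5, 4)
```

Because the negative case is handled by inversion alone, no separate recomplementing adder appears in the sketch.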
The close path preliminary subtraction result conveyed to the left shift unit is shifted according to a predicted shift amount generated by a shift prediction unit. The shift prediction unit includes three leading 0/1 detection units. The first unit, a leading 1 detection unit, generates a first prediction string for the case in which the first exponent value is greater than the second exponent value. The second unit, which performs both leading 0 and 1 detection, generates a second prediction string for the case in which the first and second exponent values are equal. Leading 0 and 1 detection is performed because the result may be positive (leading 1) or negative (leading 0). Finally, the third unit, a leading 1 detection unit, generates a third prediction string for the case in which the second exponent value is greater than the first exponent value. The most significant asserted bit within each of the strings indicates the position of a leading 0 or 1 value.
The three prediction strings are generated concurrently and conveyed to a shift prediction multiplexer. The exponent prediction value generated by the exponent prediction unit within the close data path selects which of the prediction strings is conveyed by the shift prediction multiplexer to a priority encoder. The priority encoder then converts the selected prediction string to a shift amount which is conveyed to the left shift unit within the close data path. The predicted shift amount may in some instances be incorrect by one bit position. For such cases, the close path result is left shifted one place during final selection. The calculated results of both the far data path and close data path are conveyed to a final result multiplexer, which selects the correct result based upon the calculated actual exponent difference value.
Within the shift prediction unit, the second leading 0/1 detection unit may not be optimized further, since no assumptions may be made regarding its inputs. The first and third prediction units, however, may be optimized, since it is known that the second mantissa input to each unit is inverted and shifted one bit rightward with respect to the first mantissa. This means that the results predicted by the first and third detection units are both positive. Hence, only leading 1 detection is needed. Further optimizations may also be made since it is known that subtraction is being performed.
Prediction strings may be formed by assigning a value to each output bit based on the corresponding inputs for that bit position. In standard T-G-Z notation, a T output value represents input values 10 or 01, a G output value represents input values 11, and a Z output value represents input values 00. A leading 1 may thus be detected whenever the pattern T*GZ* stops matching in the generated prediction string.
The two leading 1 detection units within the shift prediction unit of the close data path may be optimized over prior art designs by recognizing that the MSB of both input operands is 1. (The MSB of the first operand is a 1 since it is normalized, and the MSB of the second operand is also a 1 since the second adder operand is right shifted one place then inverted). This corresponds to an output value of G in the MSB of the prediction string. With a G in the initial position of the prediction string, it may be recognized that the string stops matching whenever Z′ (the complement of Z) is found. This condition is realized whenever at least one of the inputs in a given bit position is set.
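The T-G-Z classification may be sketched as shown below. The function name and the choice of returning a character string (most significant position first) are assumptions for illustration.

```python
def tgz_string(a, b, width):
    """Classify each bit position of two operands as T, G, or Z.

    G: both inputs are 1. Z: both inputs are 0. T: exactly one input is 1.
    The returned string lists the most significant position first.
    """
    symbols = []
    for position in reversed(range(width)):
        x = (a >> position) & 1
        y = (b >> position) & 1
        if x and y:
            symbols.append("G")
        elif not (x or y):
            symbols.append("Z")
        else:
            symbols.append("T")
    return "".join(symbols)


example = tgz_string(0b1100, 0b1010, 4)
```

For the inputs 1100b and 1010b the bit pairs are 11, 10, 01, and 00 from most to least significant, giving one G, two T values, and one Z.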
The optimized leading 1 detection unit includes a pair of input registers and an output register for storing the generated prediction string. The first input register is coupled to receive the first (greater) mantissa value, while the second input register is coupled to receive an inverted version of the second (lesser) mantissa value. The leading 1 detection unit further includes a plurality of logic gates coupled to receive bits from each of the input registers. Each logic gate generates a bit for the final prediction string based on whether at least one of the inputs is set. The most significant asserted bit in the output prediction string indicates the position of the leading 1 bit.
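A minimal sketch of the optimized leading 1 prediction follows. It assumes both operands are held in a common frame of the given width, with the first operand having its MSB set and the second (already right shifted) operand having its MSB clear. The function name is hypothetical, as is the convention of reporting the predicted position as one above the most significant asserted string bit.

```python
def predict_leading_one(a, shifted_b, width):
    """Predict the leading 1 position of a - shifted_b using OR gates only.

    Returns (prediction_string, predicted_position). Under this sketch's
    convention the true leading 1 lies at the predicted position or one
    position below it.
    """
    mask = (1 << width) - 1
    inverted_b = ~shifted_b & mask
    below_msb = (1 << (width - 1)) - 1
    # One OR gate per bit position below the MSB (the MSB is always G).
    prediction_string = (a | inverted_b) & below_msb
    top_asserted = prediction_string.bit_length() - 1   # -1 if none set
    return prediction_string, top_asserted + 1


string, position = predict_leading_one(0b100000, 0b011011, 6)
actual = (0b100000 - 0b011011).bit_length() - 1
```

In this example the predicted position is 3 while the difference (5, or 101b) has its leading 1 at position 2, illustrating the one-bit error that is corrected by an additional left shift during final selection.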
The add/subtract pipeline may also be configured to perform floating point-to-integer and integer-to-floating point conversions. In one embodiment, the far data path may be used to perform floating point-to-integer conversions, while the close data path performs integer-to-floating point conversions. Both data paths are configured to be as wide as the width of the larger format.
In order to perform floating point-to-integer conversions within the far data path, a shift amount is generated from the maximum integer exponent value and the exponent value of the floating point number to be converted. The floating point mantissa to be converted is then right shifted by the calculated shift amount and conveyed to the multiplexer-inverter unit. The multiplexer-inverter unit conveys the converted mantissa value to the adder unit as the second adder input. The first adder input is set to zero.
As with standard far path operation, the adder unit produces two output values, sum and sum+1. These values are conveyed to the multiplexer-shift unit, where the first adder output (sum) is selected by the far path selection signal. The far path selection unit is configured to select the sum output of the adder unit in response to receiving an indication that a floating point-to-integer conversion is being performed.
The floating point number being converted may be greater than the maximum representable integer (or less than the minimum representable integer). Accordingly, comparisons are performed to determine whether overflow or underflow has occurred. If either condition is present, the integer result is clamped at the maximum or minimum value.
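The conversion just described may be sketched as follows for a single precision input and a 32-bit signed integer result. The function name, the 32-bit width, truncation of the fractional bits, and the treatment of the reserved exponent encodings are all assumptions of this sketch.

```python
import struct

INT_MAX = 2 ** 31 - 1
INT_MIN = -(2 ** 31)


def float_to_int32(x):
    """Convert a single precision value to a clamped 32-bit integer."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased == 0:                        # zero or denormal: magnitude < 1
        return 0
    exponent = biased - 127
    if biased == 255 or exponent > 31:     # assumed handling: clamp
        return INT_MIN if sign else INT_MAX
    mantissa = ((1 << 23) | fraction) << 8     # leading 1 placed at bit 31
    shift = 31 - exponent                  # maximum integer exponent - exponent
    magnitude = mantissa >> shift          # right shift discards the fraction
    value = -magnitude if sign else magnitude
    return max(INT_MIN, min(INT_MAX, value))


small = float_to_int32(-2.75)
clamped = float_to_int32(3e9)
```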
In order to perform integer-to-floating point conversions within the close data path, a zero value is utilized as the first operand, while the second operand is the integer value to be converted. The second operand is inverted (since close path performs subtraction) and conveyed along with the zero value to the adder unit. The adder unit, as in standard close path operations, produces two outputs, sum and sum+1.
If the input integer value is positive, the output of the adder unit is negative. Accordingly, the sum output is chosen by the selection unit as the preliminary close path result. This output is then inverted in the multiplexer-inverter unit to produce the correct result. If, on the other hand, the input integer value is negative, the output of the adder unit is positive. The sum+1 output is thus chosen as the preliminary close path result, and the sign of the resulting floating point number is denoted as being negative.
The preliminary close path result is then conveyed to the left shift unit for normalization, which is performed in accordance with a predicted shift amount conveyed from the shift prediction unit. For integer-to-floating point conversion, the prediction string of the second prediction unit (equal exponents) is used. The zero operand and an inverted version of the integer value are conveyed as inputs to the second prediction unit.
The shift amount generated by the shift prediction unit is usable to left align the preliminary close path result (with a possible one-bit correction needed). With alignment performed, the required number of bits for the floating point mantissa may thus be routed from the output of the left shift unit to form the mantissa portion of the close path result. The exponent portion of the close path result is generated by an exponent adjustment unit.
The exponent adjustment unit is configured to subtract the predicted shift amount from the maximum exponent possible in the integer format. The result (which may also be off by 1) becomes the exponent portion of the close path result. If the dynamic range of the floating point format is greater than the maximum representable integer value, overflows do not occur.
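The integer-to-floating point flow may be sketched as shown below for a 32-bit two's complement input. The function name and the returned tuple are assumptions, and the sketch uses an exact leading zero count in place of a predicted shift amount. Rounding of the aligned value to the mantissa width is omitted.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def int_to_float_fields(x):
    """Return (sign, exponent, left-aligned magnitude) for a 32-bit integer.

    The zero operand minus the integer is formed as 0 + ~x, giving the
    outputs sum and sum+1; one of them yields the magnitude of x.
    """
    x32 = x & MASK
    total = (0 + (~x32 & MASK)) & MASK     # sum
    total_plus_one = (total + 1) & MASK    # sum+1
    sign = x32 >> (WIDTH - 1)
    if sign:
        magnitude = total_plus_one         # negative input: sum+1 is |x|
    else:
        magnitude = ~total & MASK          # positive input: invert sum
    if magnitude == 0:
        return 0, 0, 0                     # zero treated as a special case
    shift = WIDTH - magnitude.bit_length() # left shift needed to normalize
    aligned = (magnitude << shift) & MASK
    exponent = (WIDTH - 1) - shift         # maximum integer exponent - shift
    return sign, exponent, aligned


fields = int_to_float_fields(-5)
```

For an input of −5 the sketch yields a negative sign, an exponent of 2, and an aligned magnitude whose top three bits are 101b, corresponding to −1.01b × 2^2.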
The execution unit may also be configured to include a plurality of add/subtract pipelines each having a far and close data path. In this manner, vectored instructions may be performed which execute the same operations on multiple sets of operands. This is particularly useful for applications such as graphics in which similar operations are performed repeatedly on large sets of data.
In addition to performing vectored add and subtract operations, the execution unit may also be configured to perform vectored floating point-to-integer and integer-to-floating point instructions as described above. The execution unit may still further be configured to perform additional vectored arithmetic operations such as reverse subtract and accumulate functions by appropriate multiplexing of input values to the far and close data paths. Other vectored operations such as extreme value functions and comparison operations may be implemented through appropriate multiplexing of output values.