1. Field of the Invention
The present invention relates to an apparatus and method for determining, in a pipelined processing unit, when a floating point operation yields either an overflow or underflow (out of range) condition. More particularly, an operation to determine the out of range condition is implemented in parallel with computation of the floating point value.
2. Description of Related Art
Many pipelined processing systems use a "hold writeback" operation as a means of ensuring that a continuous flow of instructions is maintained through the system. One of the major difficulties in these types of pipelined execution units is the timing path associated with holding the last stage of the pipeline. Once the last stage is held (typically the writeback stage), it is necessary to send that hold signal back up to all of the previous pipeline stages. This makes the timing of the "hold writeback" stage extremely critical. Therefore, in order to improve the timing of the "hold writeback" equation, it is necessary to determine which dominant term, or terms, are causing the timing problems. The major dominant term(s) affecting the "hold writeback" equation were found to be in the detection of overflow and underflow conditions. These out of range conditions indicate that the exponent result (exp_result) of the value yielded by the floating point operation is either too large (overflow) or too small (underflow). Generally, there are two separate precision cases that must be considered. The single precision case has an exponent range of -126 to +127, while the double precision case has a range of -1022 to +1023. Thus, there are two types of overflow and underflow situations which must be considered, i.e. single precision and double precision underflow, as well as single precision and double precision overflow.
The present invention allows the time required to detect underflow and overflow conditions to be minimized. A small additional amount of hardware is added, but the complexity of the detection scheme is reduced when compared to prior art underflow and overflow detection systems.
Referring to FIGS. 1 and 2, a conventional hardware implementation of an underflow and overflow detection scheme is illustrated. FIG. 1 includes range checking logic for each possible out of range result, e.g. underflow single, underflow double, overflow single, overflow double. Additionally, an input is provided for a special degate in the case of any special numbers, such as infinity, not a number (NAN), zeros, or the like.
Those skilled in the art will understand that the standard format for a floating point number is in the form of 1.[nnnnnn . . . m] * 2^exp, where n equals either a binary 1 or 0 and m is equal to the number of fraction bits utilized by the system. The exponent (exp) is a binary number. In the preferred embodiment of the present invention, m is equal to 23 bits for single precision and 52 bits for double precision. The exponent is 8 bits wide for single precision and 11 bits wide for double precision. It can be seen that 8 bits will give 256 values (2^8). As noted above, the single precision exponent value range is from -126 to +127, which corresponds to 254 values; the other two values (all zeros and all ones) are used for special cases. For example, if the exponent of the floating point number is all zeros and the fractional portion is also all zeros, then the number is zero. When the exponent is all zeros and the fraction is not all zeros, then the number is a denormalized number in accordance with the IEEE Standard for Floating Point Arithmetic. When the exponent is all ones and the fraction is all zeros, the number is infinity. And, when the exponent is all ones and the fraction is not all zeros, the result is not a number (NAN). Similarly, an 11 bit double precision exponent gives 2048 values (2^11), of which 2046 correspond to the double precision range of -1022 to +1023.
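By way of illustration only, the special case encodings described above can be sketched in software. The following Python sketch is not part of the described hardware; the function name classify_single is hypothetical, and the standard struct module is assumed as a convenient way to obtain the 32 bit single precision encoding of a value.

```python
import struct


def classify_single(value: float) -> str:
    """Classify a value by the exponent and fraction fields of its
    single precision encoding (hypothetical helper, for illustration)."""
    # Reinterpret the 4-byte single precision encoding as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent = (bits >> 23) & 0xFF   # 8 bit encoded exponent field
    fraction = bits & 0x7FFFFF       # 23 bit fraction field

    if exponent == 0x00:             # exponent field is all zeros
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:             # exponent field is all ones
        return "infinity" if fraction == 0 else "NAN"
    return "normalized"              # one of the 254 ordinary exponent values
```

The two reserved exponent encodings are tested first, so that any remaining encoding falls within the 254 ordinary exponent values.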
Assume that a floating point double precision multiply operation is to occur. For example, (1.0 * 2^1023) multiplied by (1.0 * 2^1023) will equal 1.0 * 2^2046, since the exponents are added. Thus, for a double precision floating point operation, this multiplication will render an out of range value, i.e. 2046 is not within -1022 to +1023. In this case the processing system will need to issue a "hold writeback" signal to keep additional instructions from entering the pipeline. Furthermore, other processing operations, which are outside the scope of the present invention, must be implemented in order to correct the out of range value. These operations are specified in the IEEE/ANSI Standard 754-1985, "IEEE Standard for Binary Floating Point Arithmetic".
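The multiply example above may be checked with a short Python sketch. The constant and function names are hypothetical and are chosen only for illustration; only the exponent arithmetic is modeled, not the significand.

```python
# Double precision exponent range stated in this section.
DOUBLE_EXP_MIN = -1022
DOUBLE_EXP_MAX = 1023


def multiply_exponent(exp_a: int, exp_b: int) -> int:
    """Exponents are added when two floating point numbers are multiplied."""
    return exp_a + exp_b


def is_out_of_range(exp_result: int, exp_min: int, exp_max: int) -> bool:
    """True when the exponent result underflows or overflows the range."""
    return exp_result < exp_min or exp_result > exp_max


exp_result = multiply_exponent(1023, 1023)   # exponent of 2^1023 * 2^1023
overflowed = is_out_of_range(exp_result, DOUBLE_EXP_MIN, DOUBLE_EXP_MAX)
```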
Those skilled in the art will also understand that normalization of a floating point number must also be accounted for. That is, assume the following operation: (1.00101 * 2^3) - (1.00100 * 2^3). The resulting value will be 0.00001 * 2^3. In order to normalize the number, the binary point must be moved 5 places to the right in order to get the floating point number in the proper 1.nnnn . . . m * 2^exp form. This will change the exponent by adding a negative 5 (-5) to 3, giving a negative 2 (-2). In this example the exponent value +3 is the intermediate exponent (exp_int) and the value -5 is the exponent adjust number (exp_adjust).
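The normalization step above can be sketched as follows. This is a minimal software illustration, assuming the significand is held as an integer with five fraction bits as in the example; the name normalize and its parameters are hypothetical.

```python
FRACTION_BITS = 5  # the example significands 1.00101 and 1.00100 have 5 fraction bits


def normalize(significand: int, exp_int: int, fraction_bits: int = FRACTION_BITS):
    """Return (normalized significand, exp_adjust, exp_result).

    The significand is an integer whose low `fraction_bits` bits lie to the
    right of the binary point, so 1.00101 is represented as 0b100101.
    """
    if significand == 0:
        raise ValueError("zero has no normalized 1.nnnn form")
    # Number of places the binary point must move right (negative means left).
    shift = fraction_bits - (significand.bit_length() - 1)
    exp_adjust = -shift
    if shift >= 0:
        normalized = significand << shift
    else:
        normalized = significand >> -shift
    return normalized, exp_adjust, exp_int + exp_adjust


# (1.00101 * 2^3) - (1.00100 * 2^3) leaves 0.00001 * 2^3 before normalization.
difference = 0b100101 - 0b100100
normalized, exp_adjust, exp_result = normalize(difference, exp_int=3)
```

Here the intermediate exponent of +3 and the exponent adjust of -5 combine to give the exponent result of -2, matching the example.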
Referring more specifically to FIG. 1, a conventional underflow and overflow range checking mechanism is shown by reference numeral 1. Register 3 is used to store the intermediate exponent value and register 5 stores the exponent adjust value. These values are then added by a carry lookahead adder 7. The resulting exponent is the sum of the exponent adjust and intermediate exponent, output from adder 7. A two's complement addition scheme is used by adder 7, as noted by the +1. This method is well known in the art and allows an adder to perform binary subtraction. For example, assume it is desired to subtract the binary number 0101 (decimal 5) from binary 0011 (decimal 3). Using the two's complement method, the binary 1 values are interchanged with the binary 0 values for the number being subtracted (in this case giving 1010), and a +1 is then added to the result, yielding 1011. This new two's complement number (1011) is then added to the number being subtracted from, i.e. 0011. In this case, adding binary number 0011 (decimal 3) and the two's complement of 0101 (decimal 5), which is 1011, gives the result of 1110, or decimal -2. Thus, it can be seen how the two's complement method allows binary subtraction using only an adder circuit. This method is utilized by the present invention to allow a range checking value to be added to the intermediate exponent and the exponent adjust in order to obtain a checking point that transitions about zero. The present invention will be described in more detail below with regard to FIG. 3.
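The two's complement subtraction worked through above can be reproduced with the following Python sketch. A 4 bit word width is assumed to match the example, and all function names are hypothetical; the sketch models only the arithmetic and not the carry lookahead structure of adder 7.

```python
WIDTH = 4                    # word width assumed for the 4 bit example
MASK = (1 << WIDTH) - 1      # 0b1111


def twos_complement(value: int) -> int:
    """Interchange the 1 and 0 bits, then add +1, keeping WIDTH bits."""
    return ((value ^ MASK) + 1) & MASK


def add(a: int, b: int) -> int:
    """A plain WIDTH bit adder; any carry out of the top bit is discarded."""
    return (a + b) & MASK


def subtract(a: int, b: int) -> int:
    """Compute a - b using only the adder and the two's complement of b."""
    return add(a, twos_complement(b))


def to_signed(bits: int) -> int:
    """Interpret a WIDTH bit pattern as a signed two's complement value."""
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


result_bits = subtract(0b0011, 0b0101)   # decimal 3 minus decimal 5
result = to_signed(result_bits)
```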
The resulting value (exp_result) from adder 7 is then checked for an out of range condition. For example, in a single precision floating point operation, the exponent result would be input to comparators 9 and 13. These comparators check to see if the exp_result is within the single precision range, i.e. -126 to +127. Similarly, in a double precision floating point operation, the exp_result value is input to comparators 11 and 15 to see if the value is within the range -1022 to +1023.
FIG. 2 shows an example of the logical steps required by each comparator 9, 11, 13, 15 to determine if the exponent result is out of range. In particular, the overflow double comparator 15 will implement the steps shown in FIG. 2 to determine if the exp_result is greater than +1023. It can be seen that using this logic to determine the out of range condition will use a significant amount of time. A gate delay of 3-4 levels (at approximately 0.3 nanoseconds per gate) is required to determine the out of range conditions using the prior art circuit of FIG. 1.
Referring again to FIG. 1, the comparators 9, 11, 13, 15 are grouped together based on whether they check for underflow or overflow. Comparators 9 and 11 determine if an underflow condition exists, while comparators 13 and 15 look for overflow. The outputs of comparators 9 and 11 are input to an OR gate 17. When either of the signals output from these comparators is valid (i.e. an underflow exists), then a valid signal is output from OR gate 17 to an exponent underflow (exp_undfl) AND gate 21. Similarly, the outputs of overflow comparators 13 and 15 are sent to OR gate 19. Again, when either of these signals is valid (an overflow exists), then a valid signal is output from OR gate 19 to exponent overflow (exp_ovfl) AND gate 23. It can be understood that a floating point operation will either be single precision or double precision, but not both. Therefore, a signal will be provided from only one of the underflow comparators 9 and 11, and from only one of the overflow comparators 13 and 15.
In addition to the output from OR gates 17 and 19, a special case degate signal (e.g. infinity, NAN, zero) is inverted and input to AND gates 21 and 23. Under normal circumstances, this will cause a valid signal to be input to AND gates 21 and 23. In this manner, when a valid signal is output from either OR gate 17 or 19, it will be ANDed with the inverted special case degate signal at gate 21 or 23, giving a valid signal output to a three-port OR gate 25. It can be seen how a single exponent result can only be considered an underflow or an overflow, but not both. That is, the exp_result will either be out of range in the negative direction (less than -126 for single precision or less than -1022 for double precision) or out of range in the positive direction (greater than +127 for single precision or greater than +1023 for double precision). Thus, the outputs of the AND gates 21 and 23 must be ORed together to ensure that either out of range condition will be detected. The other input to OR gate 25 is for any other conditions which may occur that would require the processor to hold the floating point writeback signal. Therefore, in the prior art system of FIG. 1, if either of the signals from AND gates 21, 23 is valid, then a valid signal will be output from OR gate 25 indicating that an out of range condition has occurred in the floating point exp_result and the writeback stage will be held.
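The signal flow of the conventional circuit of FIG. 1 can be summarized by a behavioral sketch in Python. This is a software model only, under the assumption that the comparators use the exponent ranges stated earlier in this section (-126 to +127 for single precision and -1022 to +1023 for double precision); the function name and parameter names are hypothetical, and gate timing is not modeled.

```python
def hold_writeback(exp_int: int, exp_adjust: int, double_precision: bool,
                   special_case: bool = False, other_hold: bool = False) -> bool:
    """Behavioral model of the conventional range check of FIG. 1."""
    # Adder 7: exponent result is the intermediate exponent plus the adjust.
    exp_result = exp_int + exp_adjust

    # Comparators 9 and 11 (underflow) and 13 and 15 (overflow). Only the
    # pair matching the precision of the operation can produce a signal.
    undfl_single = (not double_precision) and exp_result < -126    # comparator 9
    undfl_double = double_precision and exp_result < -1022         # comparator 11
    ovfl_single = (not double_precision) and exp_result > 127      # comparator 13
    ovfl_double = double_precision and exp_result > 1023           # comparator 15

    # OR gates 17 and 19, then AND gates 21 and 23 with the inverted
    # special case degate signal (infinity, NAN, zero).
    exp_undfl = (undfl_single or undfl_double) and not special_case
    exp_ovfl = (ovfl_single or ovfl_double) and not special_case

    # Three-port OR gate 25: either out of range condition, or any other
    # condition that requires the writeback stage to be held.
    return exp_undfl or exp_ovfl or other_hold
```

Note that in this model, as in the circuit described, the range check can begin only after exp_result has been computed, which is the source of the timing penalty discussed in this section.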
It can be seen that a substantial amount of processing overhead is required to implement the logic of FIG. 2 since at least 3-4 gate delays (at 0.3 ns per gate) are required at comparators 9, 11, 13, 15, and the out of range condition is determined subsequent to computation of the exponent result. Thus, a system that would determine whether an out of range condition has occurred without incurring the time penalty required by the comparators of the prior art systems would be desirable.