The present invention is generally directed to numerical operations that are performed in computers, and more particularly to a software-based implementation of a handler for operations that are performed upon floating-point denormalized numbers.
One of the principal attributes of computers is their ability to perform mathematical functions and other types of numerical operations in a rapid manner. The computer's ability to quickly handle large numbers of mathematical calculations makes it possible to filter, or otherwise process, a wide variety of different types of data. Some applications require the computers to process numbers having very large magnitudes, whereas other applications require the ability to work with infinitesimally small data values that are several orders of magnitude less than one. Hence, depending upon the particular application in which they are used, computers may be required to deal with numbers that span the entire spectrum from zero to infinity, for both positive and negative values.
One issue that arises in connection with this need to handle numbers over such a large range is the manner in which the numbers are represented in the memory of the computer. Typically, when a numerical operation is being performed, the data values that pertain to the operation are stored in registers. Each register has a fixed size, e.g., 32 bits or 64 bits. Consequently, all data values that might be encountered by the computer must be capable of being reliably stored in a register of the designated size, regardless of their magnitudes. To address this issue, a standard has been promulgated for a binary numerical format. This standard, known as the ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic, describes a 32-bit format and a 64-bit format for the representation of numerical values.
According to the IEEE standard, numerical values are represented in a signed exponential format. Referring to the 32-bit format, for example, the first bit represents the sign of the number. The next eight bits indicate the exponential value for the number, and the final twenty-three bits comprise the value of the significand, or mantissa, of the number. The number of bits that are allocated to the exponent defines the range of numbers that can be represented. For instance, eight bits enable numbers in the range of 2^−126 to 2^127 to be represented. The number of bits that are allocated to the significand determines the precision with which the number can be represented. To increase this precision by one additional bit, the IEEE standard specifies that the number should be normalized, so that the most significant bit of the significand always has a value of one. In this case, since the value of the most significant bit is always known, it can remain implicit, and need not be stored in the register. Hence, by normalizing the numbers, they can be represented with an accuracy of 24 bits in the significand.
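The field layout described above can be illustrated with a short sketch. It is offered only as an example and assumes Python's standard `struct` module; the function name `float32_fields` is hypothetical and does not appear in the standard.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased_exponent, fraction) for x stored in the 32-bit format.

    The exponent is returned as stored (biased by 127); the fraction holds
    the 23 explicit significand bits, without the implicit leading one.
    """
    # Round x to single precision and reinterpret its four bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # first bit
    exponent = (bits >> 23) & 0xFF    # next eight bits
    fraction = bits & 0x7FFFFF        # final twenty-three bits
    return sign, exponent, fraction
```

For example, `float32_fields(-2.5)` separates the sign bit, a stored exponent of 128 (that is, 2^1 after removing the bias of 127), and a fraction whose only set bit contributes 0.25 to the significand 1.25.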
However, when normalized numbers are employed, values that lie in a range close to zero cannot be represented. This is due to the fact that there is a lower bound on the value of the exponent, which is imposed by the number of bits that are allocated to the representation of the exponent. For instance, if 8 bits are assigned to the exponent, the smallest value for the exponent is typically −126. As a result, where the most significant bit of the significand has a value of one, the range of numbers that cannot be represented with normalized values is about ±10^−38 to 10^−45 in decimal notation.
Certain types of applications require numerical operations to be performed which could result in values that fall within this range. For example, ray tracing and digital signal processing for audio signals can utilize extremely small-valued numbers. The results of certain operations on these numbers, such as subtraction or division, may be so small that they cannot be reliably represented in a normalized format. This presents a situation known as underflow. In such a case, the results may be stored as the next lowest value that can be reliably represented, usually zero. Typically, many processors employ this type of operation, which is known as “flush to zero”, as a default mode of operation when underflow occurs. However, the inaccuracy inherent in this type of operation can bring about extremely adverse consequences. For instance, if a number is to be used as a divisor or an exponent value in a subsequent calculation, using zero instead of the exact value for the number leads to undesirable results.
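The consequence of flushing to zero can be seen in a small simulation. The sketch below is illustrative only: Python has no flush-to-zero mode, so the helper `sub_flush_to_zero` is a hypothetical stand-in that imitates one, and `to_f32` rounds a value to single precision by a round trip through the `struct` module.

```python
import struct

# Smallest positive normalized single-precision value.
MIN_NORMAL_F32 = 2.0 ** -126


def to_f32(x: float) -> float:
    """Round x to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def sub_gradual(a: float, b: float) -> float:
    """Single-precision subtraction that keeps denormalized results."""
    return to_f32(to_f32(a) - to_f32(b))


def sub_flush_to_zero(a: float, b: float) -> float:
    """Same subtraction, but a result below the normalized range becomes zero."""
    r = sub_gradual(a, b)
    return 0.0 if abs(r) < MIN_NORMAL_F32 else r


a = 1.5 * 2.0 ** -126   # normalized
b = 1.0 * 2.0 ** -126   # normalized, and different from a
difference = sub_flush_to_zero(a, b)
```

Here `a` and `b` are distinct normalized numbers, yet the flushed difference is zero, so a subsequent division by `difference` would fail even though the two inputs are not equal.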
To accommodate this need to represent small values near zero, the IEEE standard supports operations that are carried out with “denormalized” numbers, i.e., numbers whose significands do not have an implicit one in the most significant bit position. According to the standard, these denormalized numbers are represented with an exponent value that is all zeroes, and a significand which is non-zero. By permitting numbers to be represented in this manner, the standard enables numerical operations to be carried out with values that fall in a range that is much closer to zero, albeit with less precision than normalized numbers. With this capability, small-valued numbers can be used in operations which yield even smaller results, such as subtraction and division, without being automatically flushed to zero. This ability to represent successively smaller numbers is known as “gradual underflow”.
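The encoding rule just stated (exponent field all zeroes, significand non-zero) can be expressed as a simple test on the 32-bit pattern. The following is a minimal sketch; the function names are hypothetical, and the value formula assumes the single-precision format, in which a denormalized number equals its 23-bit fraction multiplied by 2^−149.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit pattern of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def is_denormalized(bits: int) -> bool:
    """True when the exponent field is all zeroes and the fraction is non-zero."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


def denormalized_value(bits: int) -> float:
    """Value of a denormalized pattern: no implicit one, fixed exponent of -126."""
    sign = -1.0 if bits >> 31 else 1.0
    fraction = bits & 0x7FFFFF
    return sign * fraction * 2.0 ** -149
```

The pattern `0x00000001` is the smallest positive denormalized number, 2^−149, whereas `0x00800000` is the smallest normalized number and is therefore rejected by `is_denormalized`.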
In the past, numerical operations which employed denormalized numbers were typically performed by a hardware device incorporated into the arithmetic logic unit of a processing system. Some processing systems, however, do not include a denormal processor. For example, some types of vector processing engines, which have the ability to operate upon multiple arrays of data values with a single instruction, do not contain denormal processors. In these processing engines, therefore, the default mode of operation is to flush to zero when extremely small-valued results are returned from an operation. As noted above, however, this type of result can lead to undesirable consequences. Accordingly, it is desirable to provide a software implementation of gradual underflow for processing systems which do not include hardware that is designed to handle operations on denormalized numbers.
In accordance with the present invention, operations that involve denormalized numbers are handled by restructuring the input values for an operation as normalized numbers, and performing calculations on the normalized numbers. As a first step in the process of performing an operation, a determination is made whether input values for the operation contain one or more denormalized numbers. For certain types of operations, a determination is made whether the input values are such that the output value from the operation will be a denormalized number. For each operation in which either the input values or output values comprise a denormalized number, the input values are scaled to produce values that are not denormalized. In one embodiment of the invention, this scaling is carried out by counting the number of leading zeroes in the significand of an input value. Once the appropriate factoring has been carried out, the requested operation is performed, using normalized numbers.
Hence, the same instructions can be used for both normalized and denormalized numbers, avoiding the need to create a specialized set of instructions for denormalized numbers.
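One way the leading-zero count might be used to scale a denormalized input is sketched below. This is an illustration under stated assumptions, not the implementation of the invention: it assumes the single-precision format, and the names `normalize_denormal` and `value_of` are hypothetical.

```python
import math


def normalize_denormal(fraction: int) -> tuple[int, int]:
    """Scale the non-zero 23-bit fraction of a denormalized number.

    Returns (significand, exponent), where significand is a 24-bit integer
    whose top bit (bit 23) is one, and exponent is the unbiased exponent,
    so that the value equals significand * 2**(exponent - 23).
    """
    if not 0 < fraction < (1 << 23):
        raise ValueError("expected a non-zero 23-bit fraction")
    # Count the leading zeroes within the 23-bit fraction field.
    leading_zeros = 23 - fraction.bit_length()
    # One extra shift moves the leading one into the implicit-bit position.
    shift = leading_zeros + 1
    significand = fraction << shift
    # A denormalized number has a fixed exponent of -126; each left shift
    # of the significand is compensated by lowering the exponent by one.
    exponent = -126 - shift
    return significand, exponent


def value_of(significand: int, exponent: int) -> float:
    """Reconstruct the numerical value from the scaled representation."""
    return math.ldexp(significand, exponent - 23)
```

Because the exponent returned here lies below −126, it cannot be stored in the 8-bit exponent field; in this sketch it is simply carried alongside the significand as an ordinary integer while the calculation proceeds.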
As a further feature of the invention, denormalized numbers are handled in a vector processing engine together with other types of values. If a vector contains a mixture of denormalized numbers, normalized numbers and perhaps other special values, the denormalized numbers are first identified, and scaled. Once the scaling is complete, all of the values in the vector are processed, using conventional operations for normalized numbers. Thereafter, the results obtained from the scaled denormalized numbers are adjusted in accordance with the original scaling, to produce the final result.
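The identify, scale, process and adjust sequence for a mixed vector can be sketched as follows. The example rests on several assumptions that are not taken from the description above: `engine_multiply` is a hypothetical stand-in for a vector unit that flushes denormalized inputs and results to zero, only the first operand is checked for denormalized values, the scale factor is a fixed power of two (2^24) rather than a per-element leading-zero count, and Python floats stand in for single-precision lanes, so the final adjustment is not rounded to single precision.

```python
import math

MIN_NORMAL_F32 = math.ldexp(1.0, -126)
SCALE_BITS = 24  # assumption: 2**24 lifts every single-precision denormal into the normalized range


def is_denormal(x: float) -> bool:
    """True for non-zero magnitudes below the smallest normalized value."""
    return 0.0 < abs(x) < MIN_NORMAL_F32


def engine_multiply(xs: list[float], ys: list[float]) -> list[float]:
    """Hypothetical vector unit: lane-wise multiply in flush-to-zero mode."""
    out = []
    for x, y in zip(xs, ys):
        x = 0.0 if is_denormal(x) else x
        y = 0.0 if is_denormal(y) else y
        r = x * y
        out.append(0.0 if is_denormal(r) else r)
    return out


def multiply_with_denormals(xs: list[float], ys: list[float]) -> list[float]:
    """Multiply lane-wise, handling denormalized values in xs by scaling."""
    # 1. Identify the denormalized lanes.
    mask = [is_denormal(x) for x in xs]
    # 2. Scale only those lanes; multiplying by a power of two is exact.
    scaled = [math.ldexp(x, SCALE_BITS) if m else x for x, m in zip(xs, mask)]
    # 3. Process every lane with the ordinary normalized-number operation.
    raw = engine_multiply(scaled, ys)
    # 4. Adjust the results of the scaled lanes by the original scaling.
    return [math.ldexp(r, -SCALE_BITS) if m else r for r, m in zip(raw, mask)]


xs = [1.5, 2.0 ** -130, 0.0, -(2.0 ** -140)]
ys = [2.0, 16.0, 3.0, 2.0 ** 20]
flushed = engine_multiply(xs, ys)
handled = multiply_with_denormals(xs, ys)
```

In this example the second and fourth lanes of `flushed` are zero, because the engine discards the denormalized inputs, while the same lanes of `handled` hold the products 2^−126 and −2^−120. The normalized and zero-valued lanes pass through unchanged.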