The IEEE (Institute of Electrical and Electronics Engineers) standard for floating point arithmetic (IEEE 754) specifies how floating point numbers of single precision (32-bit), double precision (64-bit), single-extended precision (≥43-bit, not commonly used) and double-extended precision (≥79-bit, usually implemented with 80 bits) are to be represented (including negative zero, denormals, infinities and NaNs, where NaN stands for “not a number”), as well as how arithmetic should be carried out on them. Only 32-bit values are required by the standard; the others are optional. The standard also specifies four rounding modes and five exceptions (including when the exceptions occur, and what happens when they do occur).
The exponents are biased by $2^{e-1}-1$, where e is the number of bits used for the exponent field. For example, a single precision number has an 8-bit exponent and so its exponent is stored with $2^{7}-1=127$ added to it, also called “biased with 127.” Normal single precision exponents range between −126 and 127. An exponent of 128 (all ones) is reserved for plus or minus infinity and for NaNs. An exponent of −127 (all zeroes) is reserved for plus or minus zero (or for denormals, but in the case of denormals the bias used is $2^{e-1}-2$, i.e. 126 not 127, since the most significant bit of the mantissa is presumed to be zero, not one). Some examples of single precision floating point representations are illustrated in Table 1.
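The bias arithmetic above can be checked with a short calculation. The following is a minimal sketch; the function name `exponent_parameters` and the dictionary keys are illustrative choices, not part of the standard.

```python
def exponent_parameters(exponent_bits):
    """Derive the bias and the normal exponent range from the field width."""
    bias = (1 << (exponent_bits - 1)) - 1        # 2**(e-1) - 1
    all_ones = (1 << exponent_bits) - 1          # stored value reserved for infinities and NaNs
    return {
        "bias": bias,
        "min_normal_exponent": 1 - bias,         # stored value 1 (stored 0 is zero/denormal)
        "max_normal_exponent": (all_ones - 1) - bias,
        "reserved_all_ones": all_ones,
    }

# Exponent field widths: 8 bits for single precision, 11 bits for double precision.
for name, width in (("single", 8), ("double", 11)):
    print(name, exponent_parameters(width))
```

For the 8-bit single precision field this yields a bias of 127 and a normal exponent range of −126 to 127, matching the values quoted in the text.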
TABLE 1. Example single precision floating point representations.

| type | sign | exp | exp + bias | exponent | fraction |
|---|---|---|---|---|---|
| zeroes | 0 or 1 | −127 | 0 | 0000 0000 | 000 0000 0000 0000 0000 0000 |
| one | 0 | 0 | 127 | 0111 1111 | 000 0000 0000 0000 0000 0000 |
| minus one | 1 | 0 | 127 | 0111 1111 | 000 0000 0000 0000 0000 0000 |
| min denormal | 0 or 1 | −126 | 0 | 0000 0000 | 000 0000 0000 0000 0000 0001 |
| max denormal | 0 or 1 | −126 | 0 | 0000 0000 | 111 1111 1111 1111 1111 1111 |
| min normal | 0 or 1 | −126 | 1 | 0000 0001 | 000 0000 0000 0000 0000 0000 |
| max normal | 0 or 1 | 127 | 254 | 1111 1110 | 111 1111 1111 1111 1111 1111 |
| infinities | 0 or 1 | 128 | 255 | 1111 1111 | 000 0000 0000 0000 0000 0000 |
| NaN | 0 or 1 | 128 | 255 | 1111 1111 | non zero |
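The bit patterns in Table 1 can be reproduced with Python's standard `struct` module. This is a sketch for illustration only; the helper names `single_fields` and `bits_of` and the `ROWS` list are hypothetical, and the hexadecimal constants are the standard single precision encodings of the named values.

```python
import struct

def single_fields(bits):
    """Split a 32-bit pattern into (sign, exponent, fraction) bit strings."""
    text = format(bits, "032b")
    return text[0], text[1:9], text[9:]

def bits_of(x):
    """Return the 32-bit single precision encoding of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

ROWS = [
    ("one", bits_of(1.0)),
    ("minus one", bits_of(-1.0)),
    ("min denormal", 0x00000001),
    ("max denormal", 0x007FFFFF),
    ("min normal", 0x00800000),
    ("max normal", 0x7F7FFFFF),
    ("+infinity", bits_of(float("inf"))),
]

for label, bits in ROWS:
    sign, exponent, fraction = single_fields(bits)
    print(f"{label:13} {sign} {exponent} {fraction}")
```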
The normal floating point number has value $v = s \times 2^{e} \times m$, where s, e and m are defined as:
s = +1 (positive numbers and +0) when the sign bit is 0
s = −1 (negative numbers and −0) when the sign bit is 1
e = exponent − bias (i.e. the exponent is stored with a bias added to it)
m = 1.fraction in binary (that is, the mantissa or significand is the implicit leading bit value 1 followed by the radix point followed by the binary bits of the fraction).
Thus, $1 \le m < 2$.
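The value formula can be exercised directly on a 32-bit pattern. The sketch below is one possible decoder, assuming the single precision layout described above (1 sign bit, 8 exponent bits, 23 fraction bits); the name `decode_single` is hypothetical. It also covers the zero/denormal and infinity/NaN encodings so that every bit pattern has a defined result.

```python
import struct

def decode_single(bits):
    """Compute v = s * 2**e * m from a 32-bit single precision pattern."""
    s = -1.0 if bits >> 31 else 1.0
    stored = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if stored == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        return s * float("inf") if fraction == 0 else float("nan")
    if stored == 0:
        # Zero or denormal: leading bit is 0 and the exponent is fixed at -126.
        return s * (fraction / 2**23) * 2.0**-126
    # Normal number: implicit leading 1, e = stored exponent - bias.
    m = 1 + fraction / 2**23
    return s * m * 2.0 ** (stored - 127)

# Compare against the interpretation performed by the struct module.
for pattern in (0x3F800000, 0xC0490FDB, 0x00000001, 0x7F7FFFFF):
    reference = struct.unpack(">f", struct.pack(">I", pattern))[0]
    print(hex(pattern), decode_single(pattern), decode_single(pattern) == reference)
```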
A denormal number (also called a denormalized number, or a subnormal number) is a number smaller (in absolute value) than the smallest normal number but still non-zero. The production of a denormal is sometimes called gradual underflow because it allows a calculation to lose precision slowly when the result is small. Denormal numbers were implemented in the Intel 8087 floating point coprocessor while the IEEE 754 standard was being written. This implementation demonstrated that denormals could be supported in a practical implementation.
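Gradual underflow can be observed by repeatedly halving the smallest normal single precision number. The sketch below emulates single precision by rounding through the `struct` module; the helper names `to_single` and `halvings_until_zero` are hypothetical, and the rounding is assumed to be the default round-to-nearest mode.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

MIN_NORMAL = 2.0 ** -126     # smallest positive normal single
MIN_DENORMAL = 2.0 ** -149   # smallest positive denormal single

def halvings_until_zero(x):
    """Count how many halvings it takes for x to underflow to zero."""
    steps = 0
    while x != 0.0:
        x = to_single(x / 2)
        steps += 1
    return steps

# Without denormals the first halving would already give zero;
# with them, the value shrinks through the denormal range first.
print(halvings_until_zero(MIN_NORMAL))

# Precision is lost along the way: the low bit of this value does not
# survive a halving into the denormal range.
x = (1 + 2.0 ** -23) * MIN_NORMAL
print(to_single(x / 2) * 2 == x)
```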
In a normal floating point representation, it is assumed that the leading binary digit in the mantissa is equal to 1. Since it is known to be equal to 1, the leading binary digit of the mantissa may, in some floating point representations, be omitted and the exponent value adjusted accordingly. Denormal values are those values which cannot be represented in normalized form (i.e., having the smallest possible exponent with a mantissa that is non-zero). Some implementations of floating point units (FPUs) do not directly support denormal numbers in hardware, but rather trap to some kind of software or microcode support. While this may be transparent to the user, it can result in calculations which produce or consume denormal numbers being much slower than similar calculations on normal numbers.
Modern processors may also have instructions that perform single-instruction multiple-data (SIMD) operations on floating point numbers. When these SIMD operations produce or consume denormals, an exception may be triggered to handle the operation in software, or in hardware with the assistance of microcode. One way to support denormals in hardware is through a wider internal representation, which has enough precision to simply treat denormals as small normals. For example, if an exception involving single precision denormals is triggered, microcode may convert the single precision operands to normal double precision operands and re-execute the operations on the wider representations, then denormalize the results and convert them back to single precision. Likewise, double precision denormals may be handled as double-extended precision normals. One drawback is that calculations which produce or consume denormal numbers still become significantly slower than similar calculations on normal numbers.
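The widening approach can be illustrated in software. The following is only a sketch of the idea, not of any actual microcode: it relies on the fact that Python floats are double precision, so unpacking a single precision bit pattern widens it automatically. The helper names and the two operand bit patterns are hypothetical choices for the example.

```python
import struct

def single_from_bits(bits):
    """Interpret a 32-bit pattern as a single; Python holds the value as a double."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def double_stored_exponent(x):
    """Return the 11-bit stored exponent field of a double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

def single_bits(x):
    """Round a double back to single precision and return its 32-bit pattern."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

a = single_from_bits(0x00000003)   # 3 * 2**-149, a single precision denormal
b = single_from_bits(0x00400000)   # 2**-127, a single precision denormal

# In the wider format both operands are normal (stored exponent is non-zero).
print(double_stored_exponent(a), double_stored_exponent(b))

# Operate on the wide values, then convert the result back to single precision.
wide_sum = a + b
print(hex(single_bits(wide_sum)))
```

The stored exponent of the result is again all zeroes in single precision, so the conversion back yields a denormal, as the text describes.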
To date, more efficient techniques for handling floating point exceptions in a processor that executes SIMD instructions have not been fully explored.