Digital electronic devices, such as digital computers and calculators, perform arithmetic calculations on values in integer, or “fixed point,” format, in fractional, or “floating point,” format, or both. Institute of Electrical and Electronics Engineers (IEEE) Standard 754 (hereinafter “IEEE Std. 754” or “the Standard”), published in 1985 and adopted by the American National Standards Institute (ANSI), defines several standard formats for expressing values in floating point format and a number of aspects regarding behavior of computation in connection therewith. In accordance with IEEE Std. 754, a representation in floating point format comprises a plurality of binary digits, or “bits,” having the structure
s e_msb . . . e_lsb f_msb . . . f_lsb
where bit “s” is a sign bit indicating whether the entire value is positive or negative, bits “e_msb . . . e_lsb” comprise an exponent field that represents the exponent “e” in unsigned binary biased format, and bits “f_msb . . . f_lsb” comprise a fraction field that represents the fractional portion “f” in unsigned binary format (“msb” represents “most significant bit” and “lsb” represents “least significant bit”). The Standard defines two general formats. A “single” format comprises thirty-two bits while a “double” format comprises sixty-four bits. In the single format, there is one sign bit “s,” eight bits “e_7 . . . e_0” comprising the exponent field and twenty-three bits “f_22 . . . f_0” comprising the fraction field. In the double format, there is one sign bit “s,” eleven bits “e_10 . . . e_0” comprising the exponent field and fifty-two bits “f_51 . . . f_0” comprising the fraction field.
As indicated above, the exponent field of the floating point representation, bits “e_msb . . . e_lsb,” represents the exponent “E” in biased format. The biased format provides a mechanism by which the sign of the exponent is implicitly indicated. In particular, the bits “e_msb . . . e_lsb” represent a binary encoded value “e” such that “e = E + bias,” where the bias is 127 in the single format and 1023 in the double format. This allows the exponent E to extend from −126 to +127 in the eight-bit “single” format, and from −1022 to +1023 in the eleven-bit “double” format, and provides for relatively easy manipulation of the exponents in multiplication and division operations, in which the exponents are added and subtracted, respectively.
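The field layout and the biased exponent described above can be illustrated with a minimal Python sketch. The helper name `single_fields` and the constant `BIAS_SINGLE` are hypothetical names chosen for this illustration; the sketch assumes the standard library `struct` module is used to obtain the thirty-two bits of the single format.

```python
import struct

BIAS_SINGLE = 127  # bias of the eight-bit exponent field (assumed name)

def single_fields(x):
    """Round x to the single format and return its fields (s, e, f)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes
    # as an unsigned 32-bit integer to get at the raw bit pattern.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # one sign bit
    e = (bits >> 23) & 0xFF   # eight-bit biased exponent field
    f = bits & 0x7FFFFF       # twenty-three-bit fraction field
    return s, e, f

# -6.5 = -(1.625) * 2**2, so the unbiased exponent E is 2.
s, e, f = single_fields(-6.5)
E = e - BIAS_SINGLE
```

Here `e` is the encoded value held in the exponent field and `E = e - BIAS_SINGLE` is the exponent it stands for.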
IEEE Std. 754 provides for several different formats within both the single and double formats, which are generally based on the bit patterns of the bits “e_msb . . . e_lsb” comprising the exponent field and the bits “f_msb . . . f_lsb” comprising the fraction field. If a number is represented such that all of the bits “e_msb . . . e_lsb” of the exponent field are binary ones (i.e., if the bits represent a binary-encoded value of “255” in the single format or “2047” in the double format) and all of the bits “f_msb . . . f_lsb” of the fraction field are binary zeros, then the value of the number is positive or negative infinity, depending on the value of the sign bit “s.” In particular, the value “v” is
v = (−1)^s ∞
where “∞” represents the value “infinity.” On the other hand, if all of the bits “e_msb . . . e_lsb” of the exponent field are binary ones and if the bits “f_msb . . . f_lsb” of the fraction field are not all zeros, then the value that is represented is deemed “not a number,” which is abbreviated in the Standard by “NaN.”
If a number has an exponent field in which the bits “e_msb . . . e_lsb” are neither all binary ones nor all binary zeros (i.e., if the bits represent a binary-encoded value between 1 and 254 in the single format or between 1 and 2046 in the double format), the number is said to be in a “normalized” format. For a number in the normalized format, the value represented by the number is
v = (−1)^s 2^(e−bias) (1.|f_msb . . . f_lsb)
where “|” represents a concatenation operation. Effectively, in the normalized format, there is an implicit most significant digit having the value “one,” so that the twenty-three digits in the fraction field of the single format, or the fifty-two digits in the fraction field of the double format, will effectively represent a value having twenty-four digits or fifty-three digits of precision, respectively, where the value is less than two, but not less than one.
On the other hand, if a number has an exponent field in which the bits “e_msb . . . e_lsb” are all binary zeros, representing the binary-encoded value of “zero,” and a fraction field in which the bits “f_msb . . . f_lsb” are not all zero, the number is said to be in a “de-normalized” format. For a number in the de-normalized format, the value represented by the number is
v = (−1)^s 2^(e−bias+1) (0.|f_msb . . . f_lsb).
It will be appreciated that the range of values of numbers that can be expressed in the de-normalized format is disjoint from the range of values of numbers that can be expressed in the normalized format, for both the single and double formats. Finally, if a number has an exponent field in which the bits “e_msb . . . e_lsb” are all binary zeros, representing the binary-encoded value of “zero,” and a fraction field in which the bits “f_msb . . . f_lsb” are all zero, the number has the value “zero.” It will be appreciated that the value zero may be positive zero or negative zero, depending on the value of the sign bit.
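The cases described above (infinity, NaN, normalized, de-normalized and zero) can be summarized in a short Python sketch for the single format. The function name `decode_single` is hypothetical and is used only for illustration; exact rational arithmetic from the standard library `fractions` module is assumed so that the two value formulas are evaluated without rounding before the final conversion to a Python float.

```python
import math
from fractions import Fraction

BIAS = 127  # single-format bias

def decode_single(bits):
    """Classify a 32-bit pattern; return (kind, value as a Python float)."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1 if s else 1
    frac = Fraction(f, 2**23)  # the fraction field read as 0.f
    if e == 0xFF:              # exponent field all ones
        if f == 0:
            return "infinity", sign * math.inf
        return "NaN", math.nan
    if e == 0:                 # exponent field all zeros
        if f == 0:
            return "zero", sign * 0.0  # keeps the sign of zero
        # de-normalized: (-1)^s * 2^(e - bias + 1) * 0.f, with e = 0
        return "de-normalized", float(sign * Fraction(2) ** (e - BIAS + 1) * frac)
    # normalized: (-1)^s * 2^(e - bias) * 1.f
    return "normalized", float(sign * Fraction(2) ** (e - BIAS) * (1 + frac))

largest_denormal = decode_single(0x007FFFFF)[1]
smallest_normal = decode_single(0x00800000)[1]
```

In this sketch the largest de-normalized magnitude is strictly smaller than the smallest normalized magnitude, which is the disjointness noted above.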
The discipline of interval arithmetic represents a range of values as a pair of numbers. For example, the interval [a,b] may represent the set of numbers x such that x is not less than a and b is not less than x:
[a,b] = {x | a ≤ x ≤ b}
Any numerical function f of one numerical argument is then extended to accept an interval as an argument by considering the set
F = {f(x) | a ≤ x ≤ b}
and then defining
f([a,b]) = [p,q] where p = inf F and q = sup F
where “inf F” (also called the greatest lower bound of F) is the largest number that is not greater than any number in the set F, and “sup F” (also called the least upper bound of F) is the smallest number that is not less than any number in the set F. Thus, the result is the smallest possible interval that contains every possible result of applying f to some number in the argument interval.
Similarly, any numerical function g of two numerical arguments is extended to accept intervals as arguments by considering the set
G = {g(x,y) | a ≤ x ≤ b and c ≤ y ≤ d}
and then defining
g([a,b], [c,d]) = [p,q] where p = inf G and q = sup G.
The result is the smallest possible interval that contains every possible result of applying g to two numbers such that the first number lies in the first argument interval and the second number lies in the second argument interval. It may be difficult in some cases to ascertain this result set precisely because of mathematical difficulty or limits on computational resources. Therefore, it may be acceptable to compute an approximation, [p′,q′], to the true interval result such that p′ ≤ p and q ≤ q′, so that the approximate result interval completely contains the true result interval.
For certain very well behaved functions f and g, it is relatively easy to specify the true interval result in terms of the endpoints of the argument interval(s) without the need to refer to applications of f or g to all possible numerical values in the specified intervals.
For example, if f is “−,” the negation operation,
−[a,b] = [−b,−a]
and if g is the binary addition operation “+,”
[a,b] + [c,d] = [a+c, b+d].
Similarly, if g is the binary subtraction operation “−,”
[a,b] − [c,d] = [a−d, b−c].
If g is the binary multiplication operation “*,”
[a,b] * [c,d] = [min(a*c, a*d, b*c, b*d), max(a*c, a*d, b*c, b*d)]
where “min” is a function that returns a result equal to the smallest of its arguments and “max” is a function that returns a result equal to the largest of its arguments.
And if g is the binary division operation “/,”
[a,b] / [c,d] = [min(a/c, a/d, b/c, b/d), max(a/c, a/d, b/c, b/d)],
provided that either c > 0 or d < 0 (so that the divisor interval does not contain the value 0).
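The endpoint rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an interval is modeled as a tuple `(lo, hi)`, the function names `neg`, `add`, `sub`, `mul` and `div` are hypothetical, and endpoints are taken to be integers or exact rationals (`fractions.Fraction`) so that no rounding occurs. Subtraction is written as addition of the negated interval, which works out to [a−d, b−c].

```python
from fractions import Fraction

def neg(x):
    a, b = x
    return (-b, -a)

def add(x, y):
    (a, b), (c, d) = x, y
    return (a + c, b + d)

def sub(x, y):
    # x - y = x + (-y) = [a, b] + [-d, -c] = [a - d, b - c]
    return add(x, neg(y))

def mul(x, y):
    (a, b), (c, d) = x, y
    products = (a * c, a * d, b * c, b * d)
    return (min(products), max(products))

def div(x, y):
    (a, b), (c, d) = x, y
    if not (c > 0 or d < 0):
        raise ZeroDivisionError("divisor interval contains 0")
    # Convert to Fraction so that integer endpoints divide exactly.
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    quotients = (a / c, a / d, b / c, b / d)
    return (min(quotients), max(quotients))

difference = sub((1, 2), (3, 5))   # [1 - 5, 2 - 3]
product = mul((-1, 2), (3, 5))     # a sign change makes min/max necessary
quotient = div((1, 2), (4, 8))
```

The multiplication example shows why all four endpoint products are examined: with a negative lower endpoint, the smallest product comes from a*d rather than a*c.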
The theory of interval arithmetic is sometimes used as the basis of a computational discipline within digital computers. In particular, the endpoints “a” and “b” of an interval are sometimes represented as floating-point numbers, for example floating-point numbers represented according to IEEE Std. 754 for Binary Floating-Point Arithmetic.
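When the endpoints are floating-point numbers, an endpoint sum such as a+c is generally rounded, so the computed interval may fail to contain the true one. One possible way to obtain an approximation [p′,q′] that still encloses the true result is sketched below in Python. It is an assumption of this illustration, not a method taken from the text above: each endpoint computed in the default round-to-nearest mode is moved outward by one representable value using `math.nextafter` (available in Python 3.9 and later). The name `add_outward` is hypothetical.

```python
import math
from fractions import Fraction

def add_outward(x, y):
    """Add two float intervals, widening each endpoint by one step."""
    (a, b), (c, d) = x, y
    # A round-to-nearest sum is within half a unit in the last place of
    # the exact sum, so one step outward is enough to enclose it.
    lo = math.nextafter(a + c, -math.inf)
    hi = math.nextafter(b + d, math.inf)
    return (lo, hi)

# Degenerate intervals holding the doubles nearest to 0.1 and 0.2.
lo, hi = add_outward((0.1, 0.1), (0.2, 0.2))

# Exact sum of the two stored doubles, computed with rationals.
exact = Fraction(0.1) + Fraction(0.2)
encloses = Fraction(lo) <= exact <= Fraction(hi)
```

This sketch widens the interval even when the endpoint sum happens to be exact, so it trades tightness for simplicity; hardware directed-rounding modes are the usual way to obtain tighter bounds.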