Binary digital computers have long been used to store and process numbers, such as integers and floating-point numbers. Most if not all modern computer languages have provisions for manipulating numbers of one sort or another, and computers are essential to managing large quantities of numbers efficiently and meaningfully. While analog computers operate on continuous quantities, such as the real number domain, digital computers operate on discrete, countable numbers, such as the integers and rational numbers, and must approximate all others. Although some special-purpose analog computers may still be used in real-time products, such as those used in the aerospace industry, most business and personal computers in use today are digital, and represent numbers in a binary encoding composed of “bits”. Most if not all modern computers partition their bits into groupings or packets of eight bits called “bytes”.
Numbers have many classifications. The “Natural” numbers are 1, 2, 3, 4, 5, etc. The “Whole” numbers are 0, 1, 2, 3, 4, 5, etc. The whole numbers include all the natural numbers. The “Integer” numbers consist of the whole numbers and their negative values as well; i.e., 0, 1, −1, 2, −2, 3, −3, etc. The integer numbers include all the whole numbers. The “Rational” numbers are those numbers that occur when an integer number is divided by a nonzero whole number, such as 0, 1, −2, ⅖, −¾, 22/7, etc. The rational numbers include all of the integer numbers. The “Irrational” numbers are those numbers that cannot be expressed as an integer number divided by a nonzero whole number. Together, the rational numbers and the irrational numbers make up the set of “Real” numbers.
Floating-point numbering is a notation, convention, or system for depicting numbers. A floating-point number is written as two strings of digits separated by a decimal point, such as 123.45, 555.0, and 0.00627. The digits to the left of the decimal point are called the “integral part” of the floating-point number, and the digits to the right of the decimal point are called the “fractional part”. All floating-point numbers are rational numbers, but only some rational numbers can be depicted as floating-point numbers. For example, the rational number 5/4 is equivalent to the floating-point number 1.25. The rational number 4/3 has no finite floating-point notation; the ellipsis in 1.333333 . . . is often used to denote that the 3s repeat forever. Digital computers, however, have finite memory, and consequently can only store a finite number of floating-point digits. This finite limit constrains the representation of such numbers to an approximation; it applies to many rational numbers, and certainly to all of the irrational numbers.
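The distinction can be demonstrated in a few lines of Python (an illustrative sketch; the variable names are ours): 5/4 converts to a floating-point value exactly, while 4/3 can only be approximated.

```python
from fractions import Fraction

# 5/4 has the finite decimal expansion 1.25, so a floating-point
# value can hold it exactly.
exact = Fraction(5, 4)
print(float(exact))                        # 1.25, no error

# 4/3 repeats forever (1.333333...); converting it to a float and
# back does not recover the original rational number.
approx = Fraction(4, 3)
print(float(approx))                       # an approximation only
print(Fraction(float(approx)) == approx)   # False
```

The round trip through `float` loses information precisely because a finite number of binary digits cannot hold a repeating expansion.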
Digital computers are limited to the computation of discrete structures, such as the integers. By storing two integers together, an integer numerator and a whole-number denominator, the rational numbers can be completely modeled. However, this strategy is seldom practiced. Instead, floating-point numbers are used because more significant digits can be packed into the bytes of a floating-point number than into two integers. Also, two floating-point numbers can be compared without requiring a division operation, so a set of floating-point numbers can be sorted and grouped efficiently, a task for which digital computers are ideal.
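Both strategies can be sketched in Python, with `Fraction` standing in for the numerator/denominator pair (an illustration, not a claim about any particular implementation):

```python
from fractions import Fraction

# Exact model: store the numerator and denominator together.
a = Fraction(22, 7)
b = Fraction(355, 113)

# Rational comparison works by cross-multiplication
# (22*113 versus 355*7), so no division is ever performed.
print(a > b)        # True: 2486 > 2485

# Floating-point model: one encoded value per number,
# compared directly at the cost of approximation.
fa, fb = 22 / 7, 355 / 113
print(fa > fb)      # True
```

The cross-multiplication step is what the floating-point model avoids: two floats compare directly, with no auxiliary arithmetic.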
Ultimately, numbers modeled in digital computers must be mapped to a field of bits, a process called “encoding”. Many methods for encoding integers and floating-point numbers have been proposed and implemented. The most popular method for encoding integers is the 2's complement method, which is used in the Intel Pentium® processor. In this method, the left-most bit is “1” for negative integers and “0” for all others. The industry-standard method of encoding floating-point numbers is the IEEE 754 single- and double-precision format.
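A minimal sketch of the 2's complement sign convention in Python (the helper name is ours, not part of any standard):

```python
def twos_complement(n: int, bits: int = 8) -> str:
    """Render a signed integer as a bit string of the given width,
    using the 2's complement encoding."""
    return format(n & ((1 << bits) - 1), f"0{bits}b")

# The left-most bit is "1" for negative integers and "0" for all others.
print(twos_complement(2))     # 00000010
print(twos_complement(-2))    # 11111110
print(twos_complement(0))     # 00000000
```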
The IEEE 754 encoding method has been designed to optimize certain design constraints. In the IEEE 754 encoding method, a fixed number of bytes is allocated. For example, the single-precision method encodes all floating-point numbers into four bytes; the double-precision method encodes all numbers into eight bytes. The left-most bit denotes the sign of the number: the sign bit is “1” for negative numbers and “0” for all others, and when all bytes in the encoding are zero, the floating-point number represents zero. Every bit combination has a proper interpretation, which efficiently encodes as many numbers as possible into a fixed number of bytes. Some bit patterns encode special values, such as positive infinity, negative infinity, and NaN (Not a Number). The encodings have been optimized for arithmetic logic units (ALUs) and other mathematical processors. The encoding methods partition the bits into three groups: the sign (S) bit, the mantissa (or precision) (M) bits, and the exponent (E) bits. For example, the four-byte single-precision encoding method partitions its 32 bits as SEEEEEEE EMMMMMMM MMMMMMMM MMMMMMMM, i.e., one sign bit followed by 8 exponent bits followed by 23 mantissa bits. Some bit patterns in these encoding methods can represent the same number. For instance, 00000000 00000000 00000000 00000000 represents zero; but change the left-most sign bit to “1”, and the result is negative zero, which is the same value as positive zero. In addition, when floating-point numbers get very close to zero, a different encoding method, called denormalization, is used to minimize the loss of significant digits while keeping within the constraint of a fixed number of bytes.
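The three-way partition can be inspected directly with Python's struct module (a sketch; the function and field names are ours):

```python
import struct

def unpack_single(x: float):
    """Split the 4-byte IEEE 754 single-precision encoding of x
    into its sign, exponent, and mantissa bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = bits >> 31            # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    return sign, exponent, mantissa

print(unpack_single(1.25))   # (0, 127, 2097152)
print(unpack_single(-0.0))   # (1, 0, 0): negative zero
print(-0.0 == 0.0)           # True: distinct bit patterns, same value
```

The last line illustrates the point about duplicate encodings: negative zero and positive zero differ only in the sign bit, yet compare as equal.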
Since these standard encoding methods define a fixed number of bytes, the range and precision of floating-point numbers are limited. For instance, the single-precision method can never represent more than about seven decimal digits; i.e., the 23 binary mantissa bits multiplied by Log10(2) convert to roughly 6.9 base-10 digits. The “single”-precision floating-point method allocates 8 bits for the exponent which, with a bias of 127, allows binary exponents from −126 to +127 for normal numbers. In decimal, the range becomes approximately −38 to +38; that is, 127*Log10(2) is approximately 38. Thus, “single”-precision floating-point numbers cannot store numbers greater than about 3.4e38. The existing strategies for encoding floating-point numbers are optimized for speed and space considerations, and pack the greatest number of possible numeric combinations into a fixed number of bytes.
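These figures are easy to verify with a quick check in Python:

```python
import math

# 23 mantissa bits carry about 23 * log10(2) ≈ 6.92 decimal digits,
# hence the familiar "seven significant digits" of single precision.
digits = 23 * math.log10(2)
print(round(digits, 2))            # 6.92

# The largest finite single-precision value is (2 - 2**-23) * 2**127,
# roughly 3.4e38 — consistent with a decimal range of about ±38.
largest = (2 - 2**-23) * 2.0**127
print(largest)
```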
When comparing one floating-point encoded number to another, both encoded numbers must first be decoded before one can be determined to be greater than, equal to, or less than the other. All the bytes of the floating-point numbers must be read into memory and functionally transformed (i.e., decoded) before the comparison can occur.
In modern programming languages, as well as in modern processors, one finds many different methods of encoding numbers. For integers, there are 8-bit “bytes”, 16-bit “shorts”, 32-bit “ints”, and 64-bit “longs”. All such numbers can be “signed” to include negative numbers, or “unsigned” to permit only non-negative numbers. In addition, processors differ in the order in which the bytes of an encoding are stored, called the “little-endian” and “big-endian” strategies. Floating-point numbers likewise come in a variety of formats, such as 32-bit “singles”, 64-bit “doubles”, and 80-bit “extended-precision” numbers. All such number formats have a printable string notation as well. When comparing a number from one encoding scheme to a number from another encoding scheme, one number must typically be converted to the encoding method of the other before the comparison can occur.
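The endianness difference is visible when the same integer is packed both ways in Python:

```python
import struct

# 256 is 0x00000100 as a 32-bit unsigned integer.
big    = struct.pack(">I", 256)   # big-endian: most significant byte first
little = struct.pack("<I", 256)   # little-endian: byte order reversed

print(big.hex())      # 00000100
print(little.hex())   # 00010000
```

The two byte arrays encode the same number, yet as raw bytes they would sort differently, which previews the comparison problem discussed below.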
The most natural method for comparing two generic bit structures is the left-to-right bitwise comparison technique, in which the bit “0” is always less than the bit “1”. To compare two bit arrays A and B, the left-most bit of A, denoted A(0), is compared with the left-most bit of B, denoted B(0). If A(0)<B(0), then A is less than B. If A(0)>B(0), then A is greater than B. If, however, A(0)=B(0), then the next left-most bits, denoted A(1) and B(1) respectively, are checked. The strategy repeats, successively comparing A(i) to B(i) until the pair-wise bit values are found to differ. If, during the loop over the index i, either A or B runs out of bits before a difference is found, then the array that ran out of bits first is “less than” the other by default. In other words, shorter arrays are “less than” longer arrays when all pair-wise bits are found to be equal.
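The steps above can be sketched as follows (the function name and the bit-string representation are ours, chosen for clarity):

```python
def compare_bits(a: str, b: str) -> int:
    """Left-to-right bitwise comparison of two bit arrays, given as
    strings of '0' and '1'.
    Returns -1 if a < b, 0 if a == b, and 1 if a > b."""
    for bit_a, bit_b in zip(a, b):
        if bit_a != bit_b:
            # The bit "0" is always less than the bit "1".
            return -1 if bit_a < bit_b else 1
    # All pair-wise bits equal: the array that ran out of bits
    # first is "less than" the other by default.
    if len(a) < len(b):
        return -1
    return 1 if len(a) > len(b) else 0

print(compare_bits("0101", "0110"))    # -1
print(compare_bits("0101", "01010"))   # -1: shorter is less
print(compare_bits("11", "11"))        # 0
```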
The aforedescribed algorithm for comparing bit arrays is typically hard-coded into modern microprocessors for speed. Unfortunately, the algorithm cannot be used to compare the corresponding byte arrays of most encoded numbers, especially the floating-point numbers. The reason is that the left-most sign (S) bit in popular floating-point encoding methods is “1” for negative numbers and “0” for all others. Negative numbers always come before positive numbers when numerically ordering them, but a logical contradiction occurs because the bit value “1” comes after the bit value “0” by numerical convention. It is not sufficient to merely flip the sign bit. For example, the integer 2 is typically coded into a byte as 00000010 using the 2's complement method; the number −2 then becomes 11111110. The bitwise order of these two encodings clearly places 11111110 after 00000010, even though the number −2 is less than the number 2. The problem remains even when constrained to the positive numbers. Consider 256, with the binary value 00000001 00000000. When compared bitwise to the number 2, encoded as 00000010, the first byte of 256 is less than the first byte of 2.
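Python's bytes type compares lexicographically, exactly like the bitwise algorithm above, so both contradictions are easy to reproduce:

```python
# Two's-complement encodings of 2 and -2 in a single byte.
two     = (2).to_bytes(1, "big", signed=True)    # 00000010
neg_two = (-2).to_bytes(1, "big", signed=True)   # 11111110

# Bytewise order places -2 after 2, contradicting numeric order.
print(neg_two > two)     # True, although -2 < 2

# Even positive integers misorder when their widths differ.
n256 = (256).to_bytes(2, "big")                  # 00000001 00000000
print(n256 < two)        # True, although 256 > 2
```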
Instead, modern microprocessors typically include an instruction that compares two 8-byte “double”-precision IEEE 754 floating-point numbers. The instruction must fetch all 8 bytes of each number before the comparison can be performed. If a 4-byte floating-point comparison instruction does not exist, then a 4-byte “single”-precision IEEE 754 floating-point number must be converted to the 8-byte encoding before the comparison can occur. For common integers, the microprocessor defines other instructions specially tailored to 1-, 2-, 4-, and 8-byte signed and unsigned integer comparisons. No single, one-size-fits-all algorithm exists that can compare all of these number-encoding formats.