Engineering and scientific calculations require a high degree of precision and a wide dynamic range. The high precision minimizes the overall error associated with a computation. The wide dynamic range, on the other hand, ensures that calculations involving very large or very small numbers produce a valid result.
While the twin demands of precision and dynamic range can be met with very long integers, this approach reduces computational speed and increases costs. Therefore, in order to reduce costs and increase speed, many conventional computers use floating-point notation. Floating-point notation is a numeric format which can represent large and small numbers with fewer bits than are required by very long integers. The computational circuitry that performs floating-point operations is often called a floating-point processor.
The basic principles of floating-point arithmetic are familiar to anyone who has used a scientific calculator. Basically, floating-point numbers are stored in two parts, a mantissa and an exponent. The mantissa specifies the digits in the number, and the exponent specifies the magnitude of the number (the position of the decimal point). For example, the numbers 234,500,000 and 0.0000678 are expressed respectively as $2.345 \times 10^{8}$ and $6.78 \times 10^{-5}$ in floating-point notation.
Most standard methods of representing floating-point numbers specify that numbers should be represented in normalized form whenever possible. A normalized floating-point number is one in which the left-most digit (or bit, in a binary representation) of the mantissa is non-zero. A typical floating-point processor normalizes a number by left-shifting the most significant non-zero bit into the first bit of the mantissa. For each left-shift, the floating-point processor decrements the exponent by one, so that the value represented by the number is unchanged.
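The normalization step can be illustrated with a minimal sketch. It assumes an unsigned binary mantissa of a fixed width and the convention that the value of a number equals mantissa × 2^exponent; under that convention a left shift doubles the mantissa, so the sketch lowers the exponent by one per shift to keep the value unchanged. The function name `normalize` and the default width of 8 bits are hypothetical and chosen only for illustration.

```python
def normalize(mantissa: int, exponent: int, width: int = 8) -> tuple:
    """Left-shift a non-zero mantissa until bit (width - 1) is set.

    The represented value is mantissa * 2**exponent, so each left shift
    of the mantissa is balanced by subtracting one from the exponent.
    """
    if mantissa == 0:
        return 0, exponent  # zero has no normalized form; return unchanged
    if mantissa >> width:
        raise ValueError("mantissa does not fit in the given width")
    while not (mantissa >> (width - 1)) & 1:
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


m, e = normalize(0b00010110, 0)
print(bin(m), e)  # 0b10110000 -3
```

In this example the mantissa is shifted left three places and the exponent drops from 0 to -3, so the represented value (22) is the same before and after.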
Before the floating-point processor can add two operands, the floating-point processor "aligns" the two operands. The alignment process sets the exponents of the operands to the same value. During the alignment process, the floating-point processor compares the exponents of the two operands, and increases the value of the smaller exponent to equal the value of the larger exponent. Each time the floating-point processor increments the smaller exponent, the floating-point processor also right-shifts the mantissa of the smaller operand by one position.
One problem that may arise during this alignment process is that significant data can be right-shifted out of the smaller operand, causing a loss of precision as is illustrated in the following case: ##EQU1##
Before adding the two numbers, the floating-point processor right-shifts the mantissa of the second number three places: ##EQU2##
In this case the three least significant digits of the smaller operand ("789") are lost, and the overall precision of that operand is degraded.
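The loss of digits during alignment can be sketched with a small decimal model. The operand values used here are hypothetical and are not taken from the equations referenced above; they were chosen only so that the digits "789" are shifted out after a three-place right shift. The sketch assumes a seven-digit decimal mantissa stored as an integer (so 1234567 stands for 1.234567) and a power-of-ten exponent, and it does not handle a carry out of the most significant digit.

```python
DIGITS = 7  # assumed mantissa width in decimal digits (illustrative only)


def align(mantissa: int, exponent: int, target_exponent: int) -> tuple:
    """Raise the exponent to target_exponent by right-shifting the mantissa.

    Returns (aligned_mantissa, target_exponent, lost), where lost holds the
    decimal digits that were shifted out of the mantissa.
    """
    shift = target_exponent - exponent
    if shift < 0:
        raise ValueError("target exponent must not be smaller than exponent")
    kept, lost = divmod(mantissa, 10 ** shift)
    return kept, target_exponent, lost


def add_aligned(a: tuple, b: tuple) -> tuple:
    """Add two (mantissa, exponent) pairs after aligning to the larger exponent."""
    (ma, ea), (mb, eb) = a, b
    target = max(ea, eb)
    ka, _, lost_a = align(ma, ea, target)
    kb, _, lost_b = align(mb, eb, target)
    # At most one operand is shifted, so at most one of the lost values is non-zero.
    return ka + kb, target, lost_a + lost_b


# 1.234567e5 + 9.876789e2 (hypothetical operands)
print(add_aligned((1234567, 5), (9876789, 2)))  # (1244443, 5, 789)
```

The exact sum is 124444.3789, whereas the aligned addition yields 1.244443 × 10^5; the difference corresponds to the discarded digits.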
Thus, floating-point addition requires the alignment of one of the two input operands before the actual add can take place. As shown in the example above, significant (non-zero) digits may be "shifted out" and lost during the alignment process because of the finite width of the computational elements. The process of rounding can reduce this problem to insignificant proportions by capturing the bits which are "lost" or shifted out during the arithmetic operations.
There are a number of different rounding schemes in existence. Each specifies a different set of rules for deciding how to derive the final (rounded) result. Although various floating-point standards are available, a floating-point standard which is widely accepted is the IEEE (Institute of Electrical and Electronics Engineers, Inc.) Binary Floating Point Standard 754, which is herein incorporated by reference.
The IEEE 754 standard specifies different rounding modes in which each mode defines a different set of rules for deciding how to derive the final (rounded) result. These rounding modes include: round-to-nearest, round-to-minus-infinity, round-to-plus-infinity, and round-to-zero. In addition, many major manufacturers have defined particular rounding schemes, such as Digital Equipment Corporation's DEC-round-to-nearest and the IBM round-to-zero mode.
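The direction rules of the four IEEE 754 rounding modes can be illustrated with the rounding constants of Python's standard `decimal` module. This is only an analogy in decimal arithmetic, not a model of a binary floating-point processor; the helper name `round_all` is hypothetical, and the sketch assumes that round-to-nearest resolves ties to the even neighbor, as in the default IEEE 754 mode.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Mapping from the mode names used in the text to decimal-module constants.
MODES = {
    "round-to-nearest": ROUND_HALF_EVEN,       # ties go to the even neighbor
    "round-to-minus-infinity": ROUND_FLOOR,
    "round-to-plus-infinity": ROUND_CEILING,
    "round-to-zero": ROUND_DOWN,
}


def round_all(text: str) -> dict:
    """Round the decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {name: int(value.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}


for sample in ("2.5", "-2.5"):
    print(sample, round_all(sample))
```

For 2.5 the four modes give 2, 2, 3 and 2 respectively; for -2.5 they give -2, -3, -2 and -2, which shows how the sign of the operand interacts with the directed modes.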
In order to implement the various rounding modes correctly, conventional floating-point processors retain some of the bits shifted out of the mantissa. It is commonly known that in order to make a correct decision in all cases, it is necessary and sufficient to retain three additional bits to the right of the mantissa. These bits are known as the "guard," "round" and "sticky" bits.
The guard bit and round bit act as straightforward extensions of the operand mantissa, to the right of the mantissa's least-significant bit. The floating-point processor right-shifts data from the mantissa into the guard bit and the round bit. Subsequently, the floating-point processor may left-shift the data from the guard and round bits back into the mantissa during the normalization process.
Conventional floating-point processors selectively set the sticky bit once the floating-point processor shifts data past the round bit. Floating-point processors do not left-shift data out of the sticky bit, even if the sticky bit is set. Rather, the sticky bit acts as a memory, indicating, when set, that the floating-point processor has right-shifted significant data out of the mantissa beyond the round bit. In other words, the sticky bit is defined as the OR of all of the bits which are less significant than the round bit.
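The behavior of the three extra bits during a right shift can be sketched as follows. The function name `shift_right_grs` is hypothetical, and the sketch models the mantissa as an unsigned Python integer rather than as fixed-width hardware; the sticky bit is computed, as defined above, as the OR of every bit that falls below the round bit.

```python
def shift_right_grs(mantissa: int, shift: int) -> tuple:
    """Right-shift mantissa by `shift` bits.

    Returns (shifted_mantissa, guard, round_bit, sticky).
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    # Extend the mantissa by two low-order positions for guard and round.
    extended = mantissa << 2
    shifted = extended >> shift
    # Everything shifted out of the extended value lies below the round bit.
    dropped = extended & ((1 << shift) - 1)
    guard = (shifted >> 1) & 1
    round_bit = shifted & 1
    sticky = 1 if dropped else 0
    return shifted >> 2, guard, round_bit, sticky


# The four low bits 0111 are shifted out: guard = 0, round = 1, and the
# remaining bits 11 are OR-ed into the sticky bit.
print(shift_right_grs(0b10110111, 4))  # (11, 0, 1, 1)
```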
By considering the values of the guard, round and sticky bits at the time rounding is to be performed, conventional floating-point processors determine which of the closest representable numbers should be output as a result of the operation. One disadvantage of the prior art systems is that the sticky bit is calculated during the addition process. As a result, the time required to calculate the sticky bit can delay completion of the addition process. If, however, the time to calculate the sticky bit can be reduced, the time to complete an addition can also be reduced, thereby increasing the overall speed and performance of the floating-point processor.
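One way to express the rounding decision is sketched below. It assumes a sign-magnitude representation in which the guard bit is the first bit below the least significant mantissa bit at the time of rounding, and it assumes ties-to-even for round-to-nearest. The function name `round_increment` and the mode labels are hypothetical; manufacturer-specific schemes are not modeled.

```python
def round_increment(mode: str, sign: int, lsb: int,
                    guard: int, round_bit: int, sticky: int) -> int:
    """Return 1 if the mantissa magnitude should be incremented, else 0.

    sign is 0 for positive and 1 for negative; lsb is the least significant
    bit of the mantissa that is being kept.
    """
    inexact = guard | round_bit | sticky  # any discarded bit is non-zero
    if mode == "nearest":
        # Above the halfway point: guard set and something non-zero below it.
        # Exactly halfway: guard set alone, so round up only if lsb is odd.
        return guard & (round_bit | sticky | lsb)
    if mode == "plus_infinity":
        return int(bool(inexact) and sign == 0)
    if mode == "minus_infinity":
        return int(bool(inexact) and sign == 1)
    if mode == "zero":
        return 0  # truncation never increases the magnitude
    raise ValueError(f"unknown rounding mode: {mode}")


print(round_increment("nearest", 0, 0, 1, 0, 0))  # 0: tie, lsb already even
print(round_increment("nearest", 0, 0, 1, 0, 1))  # 1: above the halfway point
```

The second call shows why the sticky bit matters: without it, a value slightly above the halfway point would be indistinguishable from an exact tie.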
Recently, the clock speeds of conventional floating-point processors have increased to the point where any delay in the addition process can degrade overall system performance. In order to optimize the time required to multiply two numbers, some floating-point processors separately scan a number during the multiplication process in order to predict the need to generate a sticky bit. Such systems, however, fail to scan a mantissa prior to an addition operation. Thus, instead of relying on a prescanned sticky significance value, prior art systems scan a mantissa during the addition process and therefore require the additional time associated with such scanning.
In addition, advances in parallel processing have resulted in improved floating-point performance. Parallel processing, however, requires the distribution of the workload among parallel processes. If the workload is unequally distributed, the slowest process delays the entire system.
Conventional floating-point processors do not determine a sticky significance value in parallel with other operations. For example, conventional floating-point processors normalize and align numbers prior to performing an addition. If the floating-point processor can determine the sticky significance value in parallel with the normalization or alignment processes, the resultant floating-point additions can execute faster and make greater use of parallel architectures.
Therefore, it would be advantageous to develop a scheme in which the sticky bit can be calculated by scanning a mantissa prior to the addition process. Such a scheme would reduce the time needed to add two numbers. Furthermore, it would be advantageous to calculate the sticky bit in parallel with other floating-point operations so as to increase system throughput and optimize work distribution.
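The idea of prescanning can be illustrated with a minimal sketch. The text above does not state how a sticky significance value would be encoded, so the sketch assumes, purely for illustration, that the prescan records the number of trailing zero bits in the mantissa. With that count available before the addition begins, the sticky bit for any alignment shift reduces to a single comparison instead of a scan of the shifted-out bits. All function names are hypothetical.

```python
def prescan_trailing_zeros(mantissa: int, width: int) -> int:
    """Count consecutive zero bits at the low end of the mantissa.

    Returns width when the mantissa is zero (no significant bits at all).
    """
    if mantissa == 0:
        return width
    # mantissa & -mantissa isolates the lowest set bit.
    return (mantissa & -mantissa).bit_length() - 1


def sticky_from_prescan(trailing_zeros: int, shift: int, width: int) -> int:
    """Sticky bit for a right shift by `shift`, using only the prescan count.

    With guard and round bits retained, the bits that fall below the round
    bit are the low (shift - 2) bits of the original mantissa.
    """
    below_round = min(max(shift - 2, 0), width)
    return 1 if below_round > trailing_zeros else 0


def sticky_by_scanning(mantissa: int, shift: int) -> int:
    """Reference version: OR together the bits below the round bit."""
    below_round = max(shift - 2, 0)
    return 1 if mantissa & ((1 << below_round) - 1) else 0


tz = prescan_trailing_zeros(0b10110100, 8)
print(tz, sticky_from_prescan(tz, 4, 8), sticky_from_prescan(tz, 5, 8))
```

Because the prescan depends only on the mantissa and not on the shift amount, it could in principle be carried out while the exponents are still being compared, which is the kind of overlap the preceding paragraphs describe as desirable.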