The present invention is directed to floating-point processors and in particular to the addition/subtraction pipelines that they employ.
Computers often use floating-point representations to represent some numbers. And many microprocessors include separate circuitry for performing operations on floating-point numbers. Such circuits take many forms, but the one that FIG. 1 depicts is typical. The floating-point processor 10 depicted there includes a control circuit 11 to which are directed microprocessor operation codes that request floating-point operations. That circuit in turn directs the operation codes that request multiplications, divisions, and additions/subtractions to respective modules 12, 13, and 14 that specialize in those operations. Those modules in turn draw their operands from registers 15, in which the modules also store their results.
The invention to be described below concerns the addition/subtraction module 14. As is suggested by the drawing's linkage between the division and addition/subtraction modules, the addition/subtraction module may actually be involved in other operations, too, such as assisting in division, converting from fixed-point to floating-point format, or converting between floating-point formats. But for the sake of simplicity the discussion that follows concentrates only on addition and subtraction of single-format floating-point operands and omits any discussion of provisions peculiar to other operations.
Floating-point representations typically have the format <sign, exponent, mantissa>. One of the operations that an addition/subtraction module must perform is mantissa alignment. Suppose, for example, that operand A is 0.111100 × 2^12 and operand B is 0.101000 × 2^9. To add those operands' mantissas properly, operand B must be re-expressed as 0.000101 × 2^12; i.e., its mantissa must be shifted by a number of bit positions equal to the original exponent difference. In that example, the smaller operand's mantissa is shifted by only three bit positions, but an alignment step can involve as much as a fifty-four-position shift in a typical sixty-four-bit machine. Shifts of that size tend to be relatively time-consuming.
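By way of illustration only, the alignment step just described can be sketched in Python. The sketch assumes that each six-bit mantissa is held as an integer whose bits are the fraction bits (so that 0.111100 is 0b111100) and that bits shifted out on the right are simply dropped; the function name align is hypothetical and does not correspond to any circuit in the drawings.

```python
def align(mant_a, exp_a, mant_b, exp_b):
    """Right-shift the mantissa of the smaller-exponent operand so that
    both operands share the larger exponent.

    Mantissas are integers holding the fraction bits (an assumption of
    this sketch); bits shifted out on the right are discarded.
    Returns (aligned_a, aligned_b, common_exponent).
    """
    diff = exp_a - exp_b
    if diff >= 0:
        return mant_a, mant_b >> diff, exp_a
    return mant_a >> -diff, mant_b, exp_b


# The example from the text: A = 0.111100 x 2^12, B = 0.101000 x 2^9.
a, b, e = align(0b111100, 12, 0b101000, 9)
print(f"0.{a:06b} 0.{b:06b} exponent {e}")  # 0.111100 0.000101 exponent 12
```

The shift amount equals the exponent difference, which is why a wide exponent range can call for a long and therefore slow shift.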
For this and other reasons, the addition/subtraction module's process flow may be "pipelined," as FIG. 2 illustrates: the actual addition of one set of operands' (aligned) mantissas may be performed in a third stage 16 concurrently with the alignment in a second stage 17 of the next set to be added--and both may be performed concurrently with an operation in a first stage 18 that determines the amount of shifting required in the set after that. So the addition/subtraction module is often called an addition pipeline, or "add pipe." FIG. 2 depicts the add pipe as comprising three stages, but some add pipes have more or fewer stages.
Now, a floating-point processor's output should be normalized, i.e., so expressed that its mantissa's value is always, say, at least one-half but less than one: the first one bit must occur immediately to the right of the binary point. (This is the VAX floating-point format. In IEEE floating-point formats, the mantissa's value should always be at least one but less than two: the first one bit must occur immediately to the left of the binary point. But the discussion that follows will be based on the assumption that the add pipe employs the VAX format.) So in addition to the just-described alignment shift, floating-point add pipes also need to perform another, normalization shift. In the previous example, for instance, the raw result of the third stage's mantissa addition is 1.000001, which does not meet the normalization criterion, so the result must be re-expressed as 0.100000 (or 0.100001 after rounding): the mantissa needs to be shifted.
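The one-position normalization that follows an effective addition can be sketched as below. This is a minimal illustration that assumes six-bit integer mantissas and truncation of the shifted-out bit; the name normalize_after_add and the constant WIDTH are hypothetical. The exponent increment shown is not stated in the text but follows from the arithmetic.

```python
WIDTH = 6  # mantissa bits in the running example (assumption of this sketch)


def normalize_after_add(mant_sum, exp):
    """Renormalize the sum of two aligned WIDTH-bit mantissas.

    If the sum carried past the binary point, shift it right by one
    position and increment the exponent; the shifted-out bit is truncated.
    """
    if mant_sum >> WIDTH:  # carry out of the fraction field
        return mant_sum >> 1, exp + 1
    return mant_sum, exp


raw = 0b111100 + 0b000101  # 1.000001 in the notation of the text
mant, exp = normalize_after_add(raw, 12)
print(f"0.{mant:06b} exponent {exp}")  # 0.100000 exponent 13
```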
Differences between the amounts of shifting respectively required for normalization and alignment make it necessary to treat effective subtractions (i.e., subtractions of operands whose signs are the same and additions of operands whose signs differ) of similar-magnitude operands differently from other effective subtractions and from effective additions (i.e., additions of operands whose signs are the same and subtractions of operands whose signs differ).
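The definition of an effective subtraction given above reduces to a comparison of the requested operation with the operands' signs. The following sketch assumes that signs are encoded as 0 for positive and 1 for negative and that the operation is passed as the string "add" or "sub"; both encodings and the function name are hypothetical.

```python
def is_effective_subtraction(op, sign_a, sign_b):
    """Return True when the magnitudes must be subtracted.

    op is "add" or "sub"; signs are 0 (positive) or 1 (negative).
    A subtraction of like-signed operands, or an addition of
    unlike-signed operands, is an effective subtraction.
    """
    same_sign = sign_a == sign_b
    return same_sign if op == "sub" else not same_sign


print(is_effective_subtraction("sub", 0, 0))  # True
print(is_effective_subtraction("add", 1, 1))  # False
```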
The amount of this normalization shifting is at most one bit position both for effective additions and, if the operands' exponent difference is at least two, for effective subtractions. A one-position shift takes much less time than the potentially fifty-four-position shift that alignment can require, so the single-position shift can be performed in the same (in the example, third) stage as the mantissas' addition or subtraction--unless the operation is an effective subtraction of operands whose exponent difference is zero or one.
For such low-exponent-difference effective subtractions, though, the normalization shifts can take as many bit positions (but in the opposite direction) as alignment shifts can in other operations. If 0.111100 × 2^12 is subtracted from 0.111101 × 2^12, for example, the result is 0.000001 × 2^12, which must be normalized to 0.100000 × 2^7; i.e., the mantissa must be shifted by five bit positions. So it is not attractive in such operations to do all of the normalization in the same stage as the subtraction. On the other hand, alignment for such operations never requires more than a one-bit-position shift, so a whole stage need not be set aside for it, as it must for other operations. So in low-exponent-difference effective subtractions some add pipes use for normalization the multiple-position-shift circuitry otherwise employed for alignment.
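The multiple-position normalization in this example can be checked with a short sketch. It assumes six-bit integer mantissas and a nonzero difference; the name normalize_left is hypothetical, and the sketch models only the arithmetic, not the shifter hardware.

```python
WIDTH = 6  # mantissa bits in the running example (assumption of this sketch)


def normalize_left(mant, exp):
    """Shift a nonzero WIDTH-bit mantissa left until its top bit is one,
    decrementing the exponent once per position shifted.

    Returns (normalized_mantissa, adjusted_exponent, shift_amount).
    """
    if mant == 0:
        raise ValueError("a zero difference has no normalized form here")
    shift = WIDTH - mant.bit_length()  # number of leading zeroes
    return mant << shift, exp - shift, shift


diff = 0b111101 - 0b111100  # 0.000001 in the notation of the text
mant, exp, shift = normalize_left(diff, 12)
print(f"0.{mant:06b} exponent {exp}, shifted {shift}")
# 0.100000 exponent 7, shifted 5
```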
One conventional way to do this involves the arrangement of FIG. 2's first stage 18 that FIG. 3 depicts. The first stage 18 includes circuitry 19 for performing a speculative subtraction of the operands' mantissas in parallel with the exponent subtraction 20, mentioned above, that determines what the alignment-shift amount will be if the operation turns out not to be a low-exponent-difference effective subtraction. The circuit-19 subtraction is "speculative" because it is performed at a point in the operation at which the proper alignment is not yet known; circuitry 20 has not yet completed its determination. The subtraction is based on the assumption that the exponent difference is zero or one and that the requested operation is an effective subtraction.
If this assumption turns out to be erroneous, then the speculative-subtraction output is discarded, and the second stage merely performs an alignment shift on the smaller operand's mantissa to prepare it for mantissa subtraction in the third stage. But if the assumption proves correct, then the mantissa subtraction will already be complete when the operation proceeds to the second stage. Rather than alignment, therefore, that stage can perform potentially multiple-position normalization shifting.
Although the output of the speculative-subtraction circuitry 19 is used only in cases in which the exponent difference is zero or one, this still leaves three possible alignments among which the speculative-subtraction circuitry must choose: (1) the mantissas should be subtracted without shifting, (2) they should be subtracted after operand A is shifted to the right by one position, or (3) they should be subtracted after operand B is shifted to the right by one position. But this choice can be made well before circuit 20's exponent subtraction is complete; by comparing only each exponent's two least-significant bits with those of the other, an alignment predictor 22 can determine what the proper alignment will be if the exponent difference does turn out to be zero or one. The speculative subtraction can therefore proceed in parallel with circuit 20's exponent subtraction.
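One plausible way to form the alignment predictor's decision, offered here as an assumption rather than as a description of the circuit in FIG. 3, is to subtract the two low-order exponent bits modulo four. If the true exponent difference is 0, +1, or -1, the modulo-four difference is 0, 1, or 3, respectively; a value of 2 can arise only when the true difference is outside that range, in which case the speculative result is discarded anyway. The function name and the returned labels are hypothetical.

```python
def predict_alignment(exp_a, exp_b):
    """Predict the one-position alignment from the exponents' two
    least-significant bits only.

    The prediction is meaningful only if the full exponent difference
    later proves to be 0 or +/-1.
    """
    d = ((exp_a & 0b11) - (exp_b & 0b11)) & 0b11  # difference modulo 4
    if d == 0:
        return "none"       # equal exponents: subtract without shifting
    if d == 1:
        return "shift_b"    # A's exponent is larger by one
    if d == 3:
        return "shift_a"    # B's exponent is larger by one
    return "dont_care"      # cannot be a low-exponent-difference case


print(predict_alignment(12, 12))  # none
print(predict_alignment(16, 15))  # shift_b
print(predict_alignment(15, 16))  # shift_a
```

Because only two bits of each exponent are examined, the decision does not have to wait for the carry chain of the full exponent subtraction.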
An output of zero or one from the exponent-difference circuit 20 during an effective-subtraction operation means that the multiple-position shifter should act as a normalization shifter rather than an alignment shifter; instead of being determined by circuit 20's exponent-difference output, the shift amount should be determined by the number of leading zeroes in the result of subtracting the smaller aligned mantissa from the larger. (It may be helpful to emphasize at this point that sign and mantissa fields in a typical machine's floating-point format do not together form a two's-complement representation of a number. That is, the four-bit fixed-point binary representations of +1 and -1 are 0001 and 1111, respectively, where the first bit can be thought of as a sign bit, but in a floating-point representation the three mantissa bits are 100 for both numbers: they differ only in their sign bits. This means that the mantissa in the add pipe's output should be the result of subtracting the smaller mantissa from the larger one, regardless of the overall subtraction's direction. In some of the add pipe's intermediate results, though, two's-complement representations do occur, as will shortly be seen.)
One could determine the requisite shift amount by counting the number of leading zeroes in the speculative-subtraction output. (Actually, it would sometimes be necessary to count the number of leading ones instead. When the exponent difference is zero, it will not initially be known which mantissa is larger--neither mantissa will have been shifted by one place to put a zero in its most-significant bit position--so the speculative-subtraction result can be the two's-complement representation of a negative number. This means that it is the number of leading ones that is the indicator of the requisite normalization shift.) However, waiting for the speculative subtraction to be completed before beginning to determine how many leading zeroes or ones the result has would add too much delay for a high-performance processor. To avoid this delay, the add pipe includes a shift-point detector 23, which in parallel with the speculative-subtraction operation performs a known type of operand-mantissa inspection that predicts the required normalization shift so that it is available as soon as the speculative subtraction is complete.
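The slower, after-the-fact counting approach that the shift-point detector is meant to avoid can be sketched as follows. The sketch assumes a seven-bit two's-complement difference (one sign position plus six mantissa bits); the width and the name leading_run are hypothetical. It also shows that, for a negative result, the count of leading ones either equals the count of leading zeroes in the magnitude or exceeds it by one.

```python
def leading_run(value, width):
    """Count how many copies of the most-significant bit lead the
    width-bit two's-complement pattern of value (leading zeroes for a
    non-negative value, leading ones for a negative value)."""
    bits = format(value & ((1 << width) - 1), f"0{width}b")
    return len(bits) - len(bits.lstrip(bits[0]))


WIDTH = 7  # sign position plus six mantissa bits (assumption of this sketch)

print(leading_run(0b111101 - 0b111100, WIDTH))  # 6 leading zeroes in 0000001
print(leading_run(0b111100 - 0b111101, WIDTH))  # 7 leading ones in 1111111
print(leading_run(-3, WIDTH))                   # 5 leading ones in 1111101
```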
Detector 23's operation involves bit-by-bit comparisons that depend on proper mantissa alignment. Since the proper alignment is not initially known, detector 23 performs those comparisons simultaneously for the three alignments that correspond to exponent differences of zero and one. The alignment predictor 22's output determines which of shift-point detector 23's three results a multiplexor 24 forwards as its indication of the required normalization shift.
As will be explained in more detail below, that indication takes the form of a vector in which each bit indicates whether the corresponding mantissa-difference bit position has the potential to be the shift point, i.e., to be the location of the mantissa difference's most-significant logical one (or, as will be explained below, the bit position next to it). The criteria that detector 23 uses can also be met by locations less significant than the true shift point: its output vector can include more than a single one bit. So a bit stripper 25 removes all of that vector's one bits except the most significant one.
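The bit stripper's function, as distinct from its circuit implementation, can be modeled in a few lines. The sketch assumes that the candidate vector is held as a non-negative integer; the name strip_to_msb is hypothetical, and hardware would typically realize the same function with combinational priority logic rather than arithmetic.

```python
def strip_to_msb(vector):
    """Clear every one bit in vector except the most significant one."""
    if vector == 0:
        return 0
    return 1 << (vector.bit_length() - 1)


candidates = 0b0010110  # several positions meet the detector's criteria
print(f"{strip_to_msb(candidates):07b}")  # 0010000
```

The surviving bit is a one-hot, i.e., already decoded, indication of the shift point.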
In low-exponent-difference effective subtraction, the second stage's multiple-position shifter uses the resultant vector as a decoded indication of the number of bit positions by which it should shift the speculative difference's mantissa, and the third stage's addition circuit does not need to perform the mantissa subtraction, since the speculative-subtraction circuit 19 has already done it. But the add pipe's output mantissa must be a positive value, whereas the speculative subtraction's result may be negative; if the input operands' exponents are equal, the larger mantissa has not been identified before the speculative subtraction begins, so it is not known which mantissa to subtract from which in order to obtain a positive number. The third stage's addition circuit can therefore be used to subtract the now-normalized speculative-subtraction result from zero if that is necessary to convert it to a positive value.
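The third stage's conditional subtraction from zero can be illustrated as below. The sketch assumes a seven-bit two's-complement pattern whose top bit is the sign; the width, the function name fix_sign, and the returned flag are hypothetical.

```python
WIDTH = 7  # sign position plus six mantissa bits (assumption of this sketch)


def fix_sign(pattern, width=WIDTH):
    """Return (magnitude, was_negative) for a width-bit two's-complement
    pattern, subtracting it from zero only when its sign bit is set."""
    mask = (1 << width) - 1
    if pattern >> (width - 1):               # sign bit set: negative value
        return (0 - pattern) & mask, True
    return pattern, False


print(fix_sign(0b1111111))  # (1, True): the pattern represented -1
print(fix_sign(0b0000001))  # (1, False)
```

A true flag indicates that the operands were subtracted in the opposite order from the one that yields a positive mantissa.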
The just-described division of labor among stages contributes significantly to the speed that a high-performance floating-point processor can achieve.