The present invention is directed to floating-point processors and in particular to the addition/subtraction pipelines that they employ.
Computers often represent some numbers in floating-point format, and many microprocessors include separate circuitry for performing operations on floating-point numbers. Such circuits take many forms, but the one that FIG. 1 depicts is typical. The floating-point processor 10 depicted there includes a control circuit 11, which receives the microprocessor operation codes that request floating-point operations. That circuit in turn directs the operation codes that request multiplications, divisions, and additions/subtractions to respective modules 12, 13, and 14, each of which specializes in the corresponding operation. Those modules draw their operands from registers 15, in which they also store their results.
The invention to be described below concerns the addition/subtraction module 14. As is suggested by the drawing's linkage between the division and addition/subtraction modules, the addition/subtraction module may actually be involved in other operations, too, such as assisting in division, converting from fixed-point to floating-point format, or converting between floating-point formats. But for the sake of simplicity the discussion that follows concentrates only on addition and subtraction of single-format floating-point operands and omits any discussion of provisions peculiar to other operations.
Floating-point representations typically have the format <sign, exponent, mantissa>. One of the operations that an addition/subtraction module must perform is mantissa alignment. Suppose, for example, that operand A is 0.111100×2^12 and operand B is 0.101000×2^9. To add those operands' mantissas properly, operand B must be re-expressed as 0.000101×2^12; i.e., its mantissa must be shifted by a number of bit positions equal to the original exponent difference. In that example, the smaller operand's mantissa is shifted by only three bit positions, but an alignment step can involve as much as a fifty-four-position shift in a typical sixty-four-bit machine. Shifts of that size tend to be relatively time-consuming.
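The alignment step in the example above can be sketched in a few lines. This is a minimal illustration, not the circuit the invention describes. It assumes a hypothetical encoding in which each mantissa is an unsigned integer holding the bits after the binary point, so 0.111100 is `0b111100`. The function name `align` is likewise hypothetical.

```python
from typing import Tuple


def align(mant_a: int, exp_a: int, mant_b: int, exp_b: int) -> Tuple[int, int, int]:
    """Align two mantissas to the larger of the two exponents.

    Hypothetical encoding: each mantissa is an unsigned integer holding the
    bits after the binary point (0.111100 -> 0b111100).
    Returns (larger-exponent mantissa, shifted mantissa, common exponent).
    """
    # Make operand A the one with the larger exponent.
    if exp_a < exp_b:
        mant_a, exp_a, mant_b, exp_b = mant_b, exp_b, mant_a, exp_a
    # Shift the smaller operand right by the exponent difference.
    # Bits shifted out are simply discarded in this sketch; real hardware
    # would normally retain some of them as guard bits for rounding.
    return mant_a, mant_b >> (exp_a - exp_b), exp_a
```

With the operands from the example, `align(0b111100, 12, 0b101000, 9)` returns `(0b111100, 0b101, 12)`, i.e. operand B re-expressed as 0.000101×2^12. In this example no information is lost, because the three bits shifted out of 0.101000 are all zero.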
For this and other reasons, the addition/subtraction module's process flow may be “pipelined,” as FIG. 2 illustrates: the actual addition of one set of operands' (aligned) mantissas may be performed in a third stage 16 concurrently with the alignment in a second stage 17 of the next set to be added—and both may be performed concurrently with an operation in a first stage 18 that determines the amount of shifting required in the set after that. So the addition/subtraction module is often called an addition pipeline, or “add pipe.” FIG. 2 depicts the add pipe as comprising three stages, but some add pipes have more or fewer stages.
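The overlap among the three stages can be illustrated with a small cycle-by-cycle simulation. This is a sketch under stated assumptions, not a description of the invention's hardware. Each stage takes exactly one clock cycle, only additions of same-sign operands are modeled, and normalization and rounding are omitted. The operand-tuple encoding and the name `run_add_pipe` are hypothetical. Stage 1 determines the shift amount, stage 2 performs the alignment, and stage 3 adds.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# Hypothetical encoding of one operand set: (mant_a, exp_a, mant_b, exp_b),
# each mantissa an unsigned integer holding the bits after the binary point.
Operands = Tuple[int, int, int, int]


def run_add_pipe(jobs: List[Operands]) -> Tuple[List[Tuple[int, int]], int]:
    """Simulate a three-stage add pipe, one stage per clock cycle.

    Returns the raw (unnormalized, unrounded) sums with their exponents,
    together with the number of cycles taken.
    """
    pending: Deque[Operands] = deque(jobs)
    latch_1_2: Optional[Tuple[int, int, int, int]] = None  # stage 1 -> stage 2
    latch_2_3: Optional[Tuple[int, int, int]] = None       # stage 2 -> stage 3
    results: List[Tuple[int, int]] = []
    cycles = 0
    while pending or latch_1_2 is not None or latch_2_3 is not None:
        cycles += 1
        # Stages are evaluated back to front so that each one reads what its
        # predecessor latched in the previous cycle.
        # Stage 3: add the mantissas that were aligned in the previous cycle.
        if latch_2_3 is not None:
            big, small, exp = latch_2_3
            results.append((big + small, exp))
        # Stage 2: perform the alignment shift determined in the previous cycle.
        if latch_1_2 is not None:
            big, small, exp, shift = latch_1_2
            latch_2_3 = (big, small >> shift, exp)
        else:
            latch_2_3 = None
        # Stage 1: order the operands by exponent and compute the shift amount.
        if pending:
            ma, ea, mb, eb = pending.popleft()
            if ea < eb:
                ma, ea, mb, eb = mb, eb, ma, ea
            latch_1_2 = (ma, mb, ea, ea - eb)
        else:
            latch_1_2 = None
    return results, cycles
```

Feeding it three copies of the earlier operand pair, `run_add_pipe([(0b111100, 12, 0b101000, 9)] * 3)` yields three raw sums of `(0b1000001, 12)` in five cycles rather than nine. Once the pipe is full, one result emerges per cycle, which is the point of pipelining.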
Now, a floating-point processor's output should be normalized, i.e., expressed so that its mantissa's value is always, say, at least one-half but less than one: the first one bit must occur immediately to the right of the binary point. (This is the VAX floating-point format. In IEEE floating-point formats, the mantissa's value should always be at least one but less than two: the first one bit must occur immediately to the left of the binary point. But the discussion that follows will be based on the assumption that the add pipe employs the VAX format.) So in addition to the just-described alignment shift, floating-point add pipes also need to perform another, normalization shift. In the previous example, for instance, the raw result of the third stage's mantissa addition is 1.000001, which does not meet the normalization criterion, so the result must be re-expressed as 0.100000×2^13 (or 0.100001×2^13 after rounding): the mantissa needs to be shifted right one position and the exponent incremented.
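A VAX-style normalization shift can be sketched as follows. This is a minimal illustration, assuming the same hypothetical integer encoding as above (the raw sum `raw` represents `raw / 2**frac_bits`) and a hypothetical function name `normalize`. It uses Python's `int.bit_length()` to locate the leading one bit, loosely standing in for a hardware leading-one detector. A positive shift handles a carry out of the adder, and a negative one handles leading zeros, such as those left after subtracting nearly equal operands.

```python
from typing import Tuple


def normalize(raw: int, exp: int, frac_bits: int) -> Tuple[int, int]:
    """Normalize raw / 2**frac_bits * 2**exp so the mantissa lies in [1/2, 1).

    Returns (mantissa, exponent), with the mantissa again holding frac_bits
    bits after the binary point. Zero is returned as (0, 0), a convention
    assumed only for this sketch.
    """
    if raw == 0:
        return 0, 0
    # Distance between the leading one bit and the position just right of
    # the binary point: > 0 means a carry out, < 0 means leading zeros.
    shift = raw.bit_length() - frac_bits
    if shift > 0:
        raw >>= shift  # bits shifted out are discarded here (no rounding)
    else:
        raw <<= -shift
    return raw, exp + shift
```

For the example's raw sum, `normalize(0b1000001, 12, 6)` returns `(0b100000, 13)`, i.e. 0.100000×2^13. Note that the shift amount is known only after `raw` has been computed, which is the critical-path problem the next paragraph describes.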
Unfortunately, it is only by performing that addition that it can be determined whether such normalization shifting is necessary in a given instance, and this makes the normalization shift a critical-path element: the time required to perform it tends to add directly to the total add-pipe latency.
Also contributing to add-pipe latency is the need for rounding. For accuracy, an add pipe's actual adders typically operate with a resolution greater than is permitted in the add pipe's output mantissa, so the add pipe must thereafter round the adders' higher-resolution raw output to the permitted lower output resolution. Although there is more than one way to perform rounding, it typically involves adding a quantity equal to, say, half the desired quantization interval, i.e., a one bit in the position immediately to the right of the position that will be least significant after truncation, and then truncating the high-resolution value to the desired low resolution. But where that position lies cannot be determined until the addition has occurred, since it depends on whether the sum requires a normalization shift. So rounding circuitry is like circuitry for post-addition normalization: it contributes disproportionately to add-pipe latency, and any improvement in its speed can improve add-pipe performance.
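The add-half-then-truncate rounding just described can be sketched as below. This is a minimal illustration and not the invention's rounding circuit. It assumes the input is already normalized, and that assumption is precisely why rounding has to wait for the addition and normalization. The encoding follows the earlier sketches: `mant` represents `mant / 2**in_bits`. The name `round_half_up` is hypothetical. Exact ties always round up here, so this is not IEEE round-to-nearest-even.

```python
from typing import Tuple


def round_half_up(mant: int, exp: int, in_bits: int, out_bits: int) -> Tuple[int, int]:
    """Round a normalized mant / 2**in_bits to out_bits fraction bits.

    Adds a one bit just right of the future least significant position,
    then truncates. Returns (mantissa, exponent).
    """
    drop = in_bits - out_bits
    if drop <= 0:
        return mant << -drop, exp  # no extra resolution to remove
    mant += 1 << (drop - 1)  # add half of the output quantization interval
    mant >>= drop            # truncate to the output resolution
    # Rounding up can carry out of the mantissa (0.111...1 -> 1.000...0),
    # which calls for one more normalization shift.
    if mant >= 1 << out_bits:
        mant >>= 1
        exp += 1
    return mant, exp
```

Applied to the normalized seven-fraction-bit value 0.1000001×2^13 from the earlier example, `round_half_up(0b1000001, 13, 7, 6)` returns `(0b100001, 13)`, matching the rounded result 0.100001×2^13. The carry-out branch shows a further way rounding can feed back into normalization.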