The present invention is generally directed to a system for performing pipelined multiply and accumulate operations. More particularly, the present invention is directed to a circuit for multiplying a pair of floating point numbers in sign-magnitude form and accumulating a sum of such products in two's complement form. The circuitry of the present invention is implementable on a single VLSI integrated circuit chip operating at a pipeline clock period of 100 nanoseconds.
In mathematics, physics and computer graphics, it is often highly desirable to be able to compute inner products of vector quantities. Such inner products are almost always formed as the arithmetic sum of the arithmetic products of corresponding vector components. In order to carry out these inner product operations, pipelined architectures are used. Such architectures employ multipliers and adders in which floating point operations are carried out in two basic stages, namely multiplication followed by addition. Multiplication is often done by means of an array multiplier. However, and most importantly for consideration of the present invention, it is noted that the subsequent addition and/or subtraction operations require a more complicated sequence of events than is necessary.
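By way of illustration only, the multiply-then-accumulate structure of an inner product can be sketched in software as follows. The function name `inner_product` and the use of Python floating point values are assumptions made for this sketch; they do not describe the circuitry itself.

```python
def inner_product(a, b):
    """Sum of products of corresponding components of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    acc = 0.0  # running sum (the accumulator)
    for x, y in zip(a, b):
        product = x * y   # stage 1: multiplication
        acc += product    # stage 2: addition into the running sum
    return acc
```

Each loop iteration corresponds to one multiplication followed by one addition, which is the two-stage sequence that the pipelined architectures described above carry out in hardware.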
In floating point addition and/or subtraction operations, the following seven basic steps are required to execute the addition/subtraction operation. (1) It is first necessary to compare the two exponent fields of the floating point numbers to determine an exponent difference. (2) Next, the fractional part (mantissa) of the floating point number having the smaller exponent is "denormalized". This means that the specified mantissa is shifted so as to produce floating point numbers having the same exponent. (3) Next, the denormalized mantissa is added to or subtracted from the mantissa with the larger exponent, or vice versa, depending upon the signs of the operands. (4) Next, the sign of the result is determined. (5) Following sign determination, the result is converted into a signed-magnitude representation. (6) After conversion, the number of leading zeros of the result is determined. Here it is important to bear in mind that leading zeros can result from addition and/or subtraction operations in which the high order bits cancel one another. (7) Lastly, the result must be normalized and, correspondingly, the resulting exponent must be adjusted in concert with the normalization process carried out for the fractional part of the result.
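The seven steps above can be sketched in software on a simplified number format. This is a minimal sketch under stated assumptions, not a description of any particular processor: the 8-bit mantissa width `MANT_BITS`, the `(sign, exponent, mantissa)` tuple layout, the function names `fp_add` and `to_float`, and the use of truncation rather than rounding are all hypothetical choices made for illustration.

```python
MANT_BITS = 8  # hypothetical mantissa width, chosen only for this sketch


def to_float(n):
    """Value of a (sign, exponent, mantissa) triple: (-1)**sign * mantissa * 2**exponent."""
    sign, exp, mant = n
    return (-1) ** sign * mant * 2.0 ** exp


def fp_add(a, b):
    """Add two sign-magnitude numbers; a nonzero mantissa has its top bit set."""
    sa, ea, ma = a
    sb, eb, mb = b
    # (1) Compare the exponent fields to find the exponent difference.
    diff = ea - eb
    # (2) Denormalize: shift the mantissa with the smaller exponent
    #     (low-order bits are truncated in this sketch).
    if diff >= 0:
        exp = ea
        mb >>= diff
    else:
        exp = eb
        ma >>= -diff
    # (3) Add or subtract the aligned mantissas according to the operand signs.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    # (4) Determine the sign of the result.
    sign = 1 if total < 0 else 0
    # (5) Convert the result to signed-magnitude form.
    mag = -total if sign else total
    if mag == 0:
        return (0, 0, 0)
    # (6) Count leading zeros; a negative count means a carry out of the top bit.
    lz = MANT_BITS - mag.bit_length()
    # (7) Normalize the mantissa and adjust the exponent to match.
    if lz >= 0:
        mag <<= lz
    else:
        mag >>= -lz
    exp -= lz
    return (sign, exp, mag)
```

For example, `fp_add((0, 0, 0b11000000), (1, 0, 0b10100000))` computes 192 - 160. The high order bits cancel, leaving two leading zeros, so the result is normalized to `(0, -2, 128)`, which `to_float` evaluates to 32.0.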
Floating point processors available on the market have, however, failed to achieve as high a speed of operation as the present invention has made possible. In particular, floating point processing chips and devices have in the past employed individually short pipeline stages arranged in a long sequence of pipeline events. Such systems, however, suffer from an unduly long pipeline latency. Long pipeline latency prevents users from integrating such systems effectively and efficiently. On the other hand, short pipeline latency systems often suffer from low speed operation.