This invention relates generally to data processing systems and more specifically to the architecture and method of using an efficiently configured floating point arithmetic unit.
A variety of architectures are known and in use for performing floating point calculations in digital computers. Given the complexity of floating point calculations and the sizes of typical operands, where for scientific applications mantissas nominally are composed of greater than 50 bits, there exists an acute need for an architecture which retains the scientific precision yet reduces the circuit complexity. This is particularly true in the competitive market of workstations where computer designers strive to provide the workstation user with the greatest degree of computational capability while minimizing the cost and size of the hardware.
The classical floating point arithmetic operation to which the present invention is directed involves, in succession, the multiplication of two operands followed by the addition of the resultant with a third operand, mathematically described as A*B+C. The significance of this mathematical computation can be gleaned from the discussion in U.S. Pat. No. 4,969,118. Though highly flexible in usage, the mathematical operation of A*B+C exacts a significant toll in circuit complexity when implemented in a conventional architecture. The problem arises from the fact that the final stage adder must be capable of handling a full 3N bits to ensure that the outcome retains the precision attributed to the individual N bit operands. Thus for our representative 50 bit operand, the full adder must be capable of handling greater than 150 bits. An example of such an architecture appears in U.S. Pat. No. 4,999,802.
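The 3N bit requirement can be illustrated with a minimal sketch. It assumes, for illustration only, that the mantissas are unsigned N bit integers and that the third operand C may be aligned as far as 2N bit positions above the least significant bit of the product; the function name `worst_case_sum_width` is hypothetical and does not come from any cited reference.

```python
def worst_case_sum_width(n: int) -> int:
    """Return the bit width of A*B + C for the largest n-bit mantissas.

    Assumption: C is aligned entirely above the 2n-bit product, which is
    the alignment that stretches the sum across the widest span.
    """
    max_operand = (1 << n) - 1
    product = max_operand * max_operand      # A*B occupies up to 2n bits
    aligned_c = max_operand << (2 * n)       # C shifted left by 2n positions
    return (product + aligned_c).bit_length()
```

Under these assumptions, `worst_case_sum_width(53)` returns 159, that is, 3N bits for a 53 bit mantissa, which is consistent with the adder width of greater than 150 bits noted above.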
A full adder with resources to provide an output with a bit count in the range of 150 is very large and will therefore likely consume a significant portion of any integrated circuit chip upon which it resides. The architecture in U.S. Pat. No. 4,999,802 is made more efficient by selectively pipelining operations to improve the overall performance of the floating point arithmetic unit in situations where the results of the first operation are determined to be operands in a successive operation.
One floating point architecture refinement which reduces the size of the full adder uses an incrementer for the upper third most significant bits. Examples of such configurations appear in U.S. Pat. No. 4,969,118 and IBM Technical Disclosure Bulletin, Volume 30, No. 3, August 1987, pages 982-987. Thus, an incrementer of size N reduces the size of the full adder in a proportional amount, leaving the adder to be 2N for N size operands in the calculation of A*B+C. Incrementers are known to be significantly smaller than adders, in that they merely take a carry-in signal and propagate the result to more significant bits. The carry-out of the incrementer becomes an end around carry back to the carry-in of the adder.
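The division of labor between a 2N bit adder and an N bit incrementer can be sketched as follows. This is an illustrative model only, assuming unsigned operands and a product confined to the lower 2N bits; it does not model the end around carry path, and the name `add_with_incrementer` is hypothetical.

```python
def add_with_incrementer(aligned_c: int, product: int, n: int) -> int:
    """Add a 2n-bit product to a wider aligned operand.

    The lower 2n bits go through a full adder; the upper bits of the
    aligned operand only ever receive the adder's carry-out, so an
    incrementer is sufficient for them.
    """
    low_mask = (1 << (2 * n)) - 1
    if product > low_mask:
        raise ValueError("product must fit in 2n bits")

    low_sum = (aligned_c & low_mask) + product   # 2n-bit full adder
    carry_out = low_sum >> (2 * n)               # single carry bit, 0 or 1
    high = (aligned_c >> (2 * n)) + carry_out    # incrementer on upper bits
    return (high << (2 * n)) | (low_sum & low_mask)
```

Because the product contributes nothing above bit position 2N, the upper portion of the sum differs from the upper portion of the aligned operand by at most one, which is why the smaller incrementer can stand in for a full adder there.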
The electronic devices needed to perform the multiplication stage of the arithmetic operation A*B+C are similarly complex for operands having large values of N. Contemporary high speed floating point multiplication architectures use Wallace trees, composed of carry save adders (CSAs) configured in arrays, to provide an output having 2N bits for two operands individually composed of N bits. Examples of such appear in U.S. Pat. Nos. 4,969,118 and 4,999,802, as well as the aforementioned IBM Technical Disclosure Bulletin, and in the text book entitled Computer Architecture A Quantitative Approach by D. A. Patterson et al., Copyright 1990, Pages A-42 through A-49. There is a suggestion in the text book by Patterson et al. that the multiplier array could be reduced in size through the practice of multiple passes.
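The role of carry save adders in such a multiplier can be sketched in software. The following is a simplified model, not the circuit of any cited reference: it assumes unsigned N bit operands, generates one partial product per multiplier bit, and reduces them three at a time until two rows remain. The names `carry_save_add` and `csa_tree_multiply` are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: return (sum, carry) with sum + carry == x + y + z."""
    sum_bits = x ^ y ^ z
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1
    return sum_bits, carry_bits


def csa_tree_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit operands by reducing partial products."""
    # One partial product row per bit of the multiplier b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]

    # Each level compresses every group of three rows into two rows.
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # rows left ungrouped
        rows = reduced

    # A single carry-propagate addition produces the 2n-bit result.
    return sum(rows)
```

In this model no carry propagates across bit positions until the final addition, which is the property that makes the carry save arrangement attractive for high speed multiplication.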
In the context of such teaching, there remains a need for a floating point architecture, and a related method of use, in which the full adder is further reduced in size, the multiplier array is reduced in size, and pipelining is implemented to overlap multiplication and addition operations.