1. Field of the Invention
The present invention generally relates to mainframe digital computers and, more particularly, to an improved high-speed multiplier that is optimized for fixed-point calculations yet produces floating-point results faster than some designs that were optimized for floating-point calculations.
2. Description of the Prior Art
As is well known in the art, the product of a multiplication can be formed by shifting and adding the multiplicand as a function of the individual bits of the multiplier. This requires an adder with as many ports as there are bits in the multiplier. Using an iterative algorithm with a relatively small multiplier digit saves on adder hardware; however, additional ports are needed for the running sum and carry feedback. Thus, a ten-bit multiplier digit would need a twelve-port adder: ten ports for the shifted multiplicands and one port each for the running sum and carry.
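The shift-and-add principle and the port count of an iterative design can be sketched as a behavioral model. This is a minimal sketch assuming unsigned operands; the function names and the `digit_bits` parameter are hypothetical, and the running total is kept as a single integer rather than the separate sum and carry vectors that account for the two extra adder ports in hardware.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Add one shifted copy of the multiplicand for every 1 bit of an
    unsigned `width`-bit multiplier (one adder port per multiplier bit)."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product


def iterative_multiply(multiplicand: int, multiplier: int, width: int,
                       digit_bits: int = 10) -> int:
    """Retire the multiplier `digit_bits` bits per iteration.

    Each iteration sums at most `digit_bits` shifted multiplicands plus the
    running total. Hardware would hold the running total as separate sum and
    carry vectors, hence digit_bits + 2 adder ports.
    """
    mask = (1 << digit_bits) - 1
    running = 0
    for start in range(0, width, digit_bits):
        digit = (multiplier >> start) & mask
        addends = [multiplicand << (start + j)
                   for j in range(digit_bits) if (digit >> j) & 1]
        running = sum(addends, running)
    return running


def adder_ports(digit_bits: int) -> int:
    """Ports for one multiplier digit plus running-sum and carry feedback."""
    return digit_bits + 2
```

With the ten-bit digit of the example, `adder_ports(10)` gives the twelve ports cited above.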
To reduce the number of addends, various schemes have been developed that recode the multiplier, decoding adjacent bits to select a set of true and/or complemented multiples of the multiplicand. These schemes cut the number of addends in half. In the example above, a 10-bit multiplier digit would need a seven-port adder: five ports for multiples of the multiplicand and one port each for the running sum and carry. The outputs of the adder tree are sum and carry bits, which must be added together by a Carry-Propagate Adder (CPA) to produce the final product. In an iterative multiply device, one digit of the result is generated during each iteration, and successive digits are concatenated to produce the whole product.
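One widely used recoding scheme of this kind is radix-4 (modified) Booth recoding, sketched below as an illustration; the text does not name a specific scheme, so treating it as Booth recoding is an assumption. Each group of two multiplier bits, plus the adjacent lower bit, selects one multiple from {-2, -1, 0, +1, +2} times the multiplicand, so a 10-bit digit plus one overlap bit yields five addends. The names `booth_recode` and `recoded_multiply` are hypothetical.

```python
# (b[2i+1], b[2i], b[2i-1]) -> signed multiple of the multiplicand to select
BOOTH_RADIX4 = {
    (0, 0, 0): 0,  (0, 0, 1): 1,  (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_recode(digit: int, overlap_bit: int, digit_bits: int = 10) -> list[int]:
    """Recode a digit plus the overlap bit b[-1] (the top bit of the previous,
    lower-order digit) into digit_bits // 2 multiples from {-2, -1, 0, 1, 2}."""
    if digit_bits % 2:
        raise ValueError("radix-4 recoding needs an even digit width")
    bits = [overlap_bit] + [(digit >> j) & 1 for j in range(digit_bits)]  # bits[j + 1] is b[j]
    return [BOOTH_RADIX4[(bits[2 * i + 2], bits[2 * i + 1], bits[2 * i])]
            for i in range(digit_bits // 2)]


def recoded_multiply(multiplicand: int, multiplier: int, width: int,
                     digit_bits: int = 10) -> int:
    """Unsigned multiply from recoded multiples: each addend is a true (+) or
    complemented (-) copy of 1x or 2x the multiplicand, suitably shifted."""
    mask = (1 << digit_bits) - 1
    n_digits = width // digit_bits + 1   # one extra digit zero-extends the multiplier
    product, overlap = 0, 0
    for k in range(n_digits):
        digit = (multiplier >> (k * digit_bits)) & mask
        for i, m in enumerate(booth_recode(digit, overlap, digit_bits)):
            product += (m * multiplicand) << (k * digit_bits + 2 * i)
        overlap = digit >> (digit_bits - 1)
    return product
```

For instance, `booth_recode(0x3FF, 0)` returns `[-1, 0, 0, 0, 0]`: an all-ones digit collapses to one complemented multiple, and the missing +1024 is supplied by the next digit through its overlap bit. Five multiples plus the running sum and carry give the seven ports mentioned above.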
U.S. Pat. No. 4,769,780 to D. C. Chang discloses a high-speed multiplier wherein the multiplier and multiplicand are stored in the A and B registers, respectively, and the result of the multiplication is stored in registers A and C, with the low-order portion of the result in register A and the high-order portion in register C. Eleven multiplier bits in register A are selectively gated to recoding circuitry, which recodes the multiplier into five control groups. These control groups control shift gates connected to register B to gate selected groups of multiples of the multiplicand to a first Carry-Save Adder (CSA), the sum and carry outputs of which are applied to a second CSA that accumulates partial products from iteration to iteration. A Spill Adder (SPA), connected to the second CSA, generates a low-order portion of the final result of the multiplication. This low-order portion is temporarily stored in an SPA register and transferred to the A register, where the product digits occupy locations vacated by shifting and retiring the multiplier digits. The high-order portion of the result is generated by a full adder connected to receive the sum and carry outputs of the second CSA and is stored in the C register.
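The dataflow just described (recode, carry-save accumulate, spill the finished low-order digit into the vacated A-register bits, then form the high-order part with a full adder) can be modeled behaviorally. This is a simplified sketch, not Chang's circuit. It assumes unsigned operands, 10-bit digits with one overlap bit (the eleven bits feeding five recoded groups), a single chain of 3:2 compressors in place of the two CSA stages, and a spill-adder carry-out fed back as an extra addend on the next iteration. The names `reg_a`, `reg_c`, and `iterative_csa_multiply` are hypothetical.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder: a + b + c == sum + carry, with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def csa_reduce(addends: list[int]) -> tuple[int, int]:
    """Compress the addends to one sum/carry pair with repeated 3:2 steps."""
    terms = list(addends) + [0, 0]
    while len(terms) > 2:
        s, c = csa(terms[0], terms[1], terms[2])
        terms = terms[3:] + [s, c]
    return terms[0], terms[1]


def iterative_csa_multiply(multiplicand: int, multiplier: int, width: int,
                           digit_bits: int = 10) -> int:
    mask = (1 << digit_bits) - 1
    n_digits = width // digit_bits + 1      # extra digit zero-extends the multiplier
    reg_a = multiplier                      # multiplier digits out, product digits in
    run_sum = run_carry = spill_carry = 0
    overlap = 0
    for _ in range(n_digits):
        digit = reg_a & mask
        b = [overlap] + [(digit >> j) & 1 for j in range(digit_bits)]
        # Radix-4 recoding: digit_bits + 1 bits -> digit_bits // 2 multiples in {-2..2}
        multiples = [-2 * b[2 * i + 2] + b[2 * i + 1] + b[2 * i]
                     for i in range(digit_bits // 2)]
        overlap = b[digit_bits]
        addends = [(m * multiplicand) << (2 * i) for i, m in enumerate(multiples)]
        # Accumulate with the running sum/carry and the previous spill carry
        run_sum, run_carry = csa_reduce(addends + [run_sum, run_carry, spill_carry])
        # Spill adder: the low digit_bits of sum + carry are now final product bits
        low = (run_sum & mask) + (run_carry & mask)
        product_digit, spill_carry = low & mask, low >> digit_bits
        run_sum >>= digit_bits
        run_carry >>= digit_bits
        # Retire the multiplier digit and park the product digit in the vacated bits
        reg_a = (reg_a >> digit_bits) | (product_digit << (digit_bits * (n_digits - 1)))
    reg_c = run_sum + run_carry + spill_carry   # full adder forms the high-order part
    return (reg_c << (digit_bits * n_digits)) | reg_a
```

Python integers behave as unbounded two's-complement values under `&`, `^`, and `>>`, which is what lets the negative (complemented) multiples pass through the carry-save stages without explicit sign extension in this model.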
This particular multiplier operates at twice the system clock frequency, with a 10-bit-wide data path processed on each double-frequency cycle. The double-frequency clock of the Chang multiplier is a highly complex feature. Because each half-cycle path is limited in the number of logic levels it can contain, such a design needs many staging registers, which increases the number of cycles necessary to perform any operation.
Other multiply devices, including those described in U.S. Pat. No. 4,584,679 to S. George et al., are not iterative. They retire the entire multiplier value at once by generating (usually) four partial products in parallel, where each partial product is the product of the multiplicand and a subset of the multiplier bits. These partial products must then be assimilated into the final product. While devices of this type are fast, they usually involve some data staging and require large amounts of hardware.
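A non-iterative organization of this kind can be illustrated by splitting the multiplier into slices, forming one partial product per slice, and assimilating them. This sketch assumes four equal, contiguous slices of a 64-bit multiplier; how George et al. actually partition the multiplier and build the assimilation hardware is not described here, and Python's `*` merely stands in for whatever array forms each partial product. The function name is hypothetical.

```python
def parallel_multiply(multiplicand: int, multiplier: int, width: int = 64,
                      n_parts: int = 4) -> int:
    """Non-iterative multiply: form n_parts partial products, then assimilate them."""
    slice_bits = -(-width // n_parts)          # ceiling division
    mask = (1 << slice_bits) - 1
    # Each partial product depends only on its own multiplier slice, so hardware
    # can generate all of them at once instead of over successive iterations.
    partials = [multiplicand * ((multiplier >> (p * slice_bits)) & mask)
                for p in range(n_parts)]
    # Assimilation: align each partial product to its slice position and add.
    return sum(pp << (p * slice_bits) for p, pp in enumerate(partials))
```

The hardware cost noted above shows up here as `n_parts` full-width partial-product generators working side by side, plus an adder wide enough to assimilate their overlapping results.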
A common feature of many multiply device designs in mainframe computers is that they are optimized around scientific, or floating-point, data. As a result, operations on fixed-point data, which are more common in commercial applications, can suffer. For example, the Chang device requires an extra cycle to align fixed-point data before multiplication and another cycle to align the fullword result for putaway.