The invention relates to the field of multiplier hardware for digital computer systems. In particular it relates to multipliers that use multiple passes of single precision hardware to perform double precision multiplication.
Array multipliers, such as Wallace-tree multipliers with and without Booth recoding, require a gate count that grows proportionally to the square of the number of bits of each operand. Hence a 53-bit multiplier requires close to three times the area as does a 32-bit multiplier array.
The number of logic levels in the worst-case data path through a 53-bit multiplier array is greater than the number of logic levels in the worst-case data path of a 32-bit multiplier array. This means that, all other factors being equal, a 53-bit multiplier is slower than a 32-bit multiplier array.
Most modern processors have 32-bit integer multiply instructions. Several common specifications for computer floating point operations, including the IEEE 754 and Digital VAX specifications, require a 24-bit (including one hidden bit) mantissa for single precision floating point, and a 53-bit mantissa for double precision floating point. These machines therefore need to be able to quickly multiply pairs of 32 bit operands, pairs of 24-bit operands, and pairs of 53-bit operands in performing their integer and floating point arithmetic. These machines also must be able to quickly add 32-bit integer operands, as well as 53 bit and 24 bit denormalized mantissas.
It is known that a 32-bit hardware multiplier can be used to perform a 53-bit unsigned multiply by performing a sequence of four multiply operations, each generating a partial product, and summing the partial products. A way that this may be done is as follows:
T5 is nominally 106 bits wide. When performing a floating point operation, it is necessary that the product bits be aligned into the correct bit positions for the mantissa of the floating point result. If T1, T2, T3, and T4 are summed in an adder, a left shift of 12 bit positions of T4 relative to T2 and T3 is required for proper alignment.
Some computer programs require more single-precision floating point and integer multiply operations than double-precision floating point operations. Machines running these programs can provide fast single precision and integer multiply operations with slower but adequate double precision multiply using a sequence of four multiply operations in each double-precision floating-point multiply instruction.
The Booth recoding algorithm is commonly used in large multiplier arrays to hold down the number of partial products that must be added during a multiply. While a classic thirty two bit Wallace-tree binary array multiplier generates and adds thirty two partial products, an array multiplier using two-bit Booth recoding need generate and adds only half as many partial products, although the logic required for generating each partial product is somewhat more complex. Reducing the number of partial products not only can reduce the number of gates in the array, but produces a faster multiplier by reducing the number of gate delays in the worst-case critical path through the array.
In the basic binary two-bit Booth algorithm, a pair of bits of the multiplier are considered for each partial product. If those bits are zero, the partial product is zero. If those bits are one, the partial product is the multiplicand. If those bits are two, the partial product is a single-bit-shifted (multiplied by two) multiplicand. If those bits are three, the partial product is minus the multiplicand, with one added in the next partial productxe2x80x94giving a net partial product term of four times the multiplicand minus the multiplicand equaling three times the multiplicand.
A common version of the Booth multiply is the modified booth recoding multiply. In this version, the multiplier is recoded from a binary number, where each digit is a 1 or a 0, to a number having fewer digits each in the range {xe2x88x922, xe2x88x921, 0, +1, +2}. Each bit pair {Bn+1,Bn} of the multiplier is transformed by a three-input digit-encoder circuit that considers bit {Bnxe2x88x921} of next lower significance to the bit pair being encoded according to the formula:
The partial products of the multiplier (booth_digit*multiplicand) are generated by taking a zero, a left shift of the multiplicand (+2), the multiplicand (+1), the negation of the multiplicand (xe2x88x921) or the negation of a left shift of the multiplicand (xe2x88x922). One such partial product is generated for each bit pair of the multiplier, these partial products are shifted appropriately and summed to generate the product. For modern array multipliers, the product is generated by an array of carry-save adders of structure similar to a Wallace tree.
In the Booth recoding circuit described above, a term Bnxe2x88x921 is used. In all bit-pairs except the least significant bit pair, this bit is the most significant multiplier bit of the bit pair of less significance than the bit pair being encoded. In the least significant bit pair, this bit is normally a zero. In multipliers that perform a multiply of high precision from a sequence of lower-precision multiplies, it is known that this bit may be used as a carry input to the higher digits of the multiply.
A multiplier has been constructed that performs a high precision multiply by performing a sequence of four lower-precision multiplies, each of the four generating a partial product, and summing the partial products. This multiplier breaks down the a multiply into a sequence of four multiply operations.
The multiplier performs three initial cycles to compute the partial products P1, P2, and P3.
The left-shift-by-12 bit positions required in the example to align the resulting 53 bits of product is accomplished by the left shift of each of the A and C operands 6 bit positions, this places the significant bits of P4 at the appropriate inputs of the adder used to sum the partial products. The top 22 bits of partial sums P3, P2, and the effect of P1, are positioned in the 22 least significant bits of the adder by shift operations.
The multiplier used for all four cycles of the multiply is Booth encoded. Each two-bit booth recoder cell requires three inputs, (Bn+1, Bn, Bnxe2x88x921), so that an add of three times the multiplicand can be represented as a subtract of one times the multiplicand with an add of four times the multiplicand. It may therefore be necessary to force an extra addition when the multiplier is split into fields as in this design.
To get the correct result using the booth encoded multiplier, two techniques have been used. In the first technique the booth recoding cell that receives the raw multiplier bits 32 and 33 during computation of P4 must also receive multiplier bit 31. This is the same booth recoding cell that receives multiplier bits 6 and 7 as {Bn, Bn+1} during computation of the partial product P4. A multiplexor has been found necessary such that the Bnxe2x88x921 input of this recoding cell may either be bit 5 of the multiplier or bit 31 of the multiplier; in this way the correct result is attained. Bit 31 of the multiplier may be obtained for this multiplexor directly from the multiplier register or by latching bit 31 during the preceding, P3, cycle.
In an alternative technique, an effectively 34-bit booth encoder is used during computation of P1, P2, and P3, having its two most significant input bits (Bn+1 and Bn) driven to zero. The Bnxe2x88x921 bit of the top encoder cell is the most significant bit (Bit 31) of the operand. The multiplier array has a partial stage corresponding to this top encoder cell, wherein any necessary correction term is generated and added.
It has also been found that each of the partial products P2, and P3 may be zero extended as they are summed. P1, however, must be conditionally extended with 0""s if there is an explicit carry out of the multiplier array, and extended with 1""s if there is no carry out. Then, prior to adding P4, the sum of P1, P2, and P3 is extended with one of four possible constants depending on the number of explicit carry out signals generated by the multiplier during computation of P1 and the summing adder during computation of P2 and P3. These constants are 03FDH, 03FEH, 03FFH, and 0000H, where 0FFDH is used if no explicit carry out signals occurred and 0000H is used if three carry out signals occurred.