U.S. Pat. No. 5,457,804 discloses a multiplication circuit in which multiplication time for multiplications with doubled precision is shortened, whereby either a higher-value half or a lower-value half of an input word of the multiplication circuit is selectively supplied to a standard Booth decoder via a multiplexer.
There are many algorithms for the accelerated calculation of a multiplication. In the classical Booth's algorithm, two bits of the multiplier are always examined simultaneously, and from them a factor is formed by which the multiplicand is multiplied. The resulting product is added to the partial product in each step. This method works with n uniform shift operations by one bit position to the right (division by 2) and n/2 additions, where n is the word length of the multiplier. For two's complement representations, no sign corrections are required.
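The classical recoding described above can be illustrated with a minimal Python sketch. It is not taken from the cited patent: the function name `booth_radix2` and the implicit bit `prev` to the right of the LSB are assumptions of this sketch, and the weighting of each step is done by shifting the multiplicand to the left in integer arithmetic rather than by shifting the partial product to the right, which gives the same product.

```python
def booth_radix2(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's complement multiplier y, one bit pair per step."""
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit into n bits")
    # Two's complement bits of y, LSB first (Python's >> is arithmetic for negative ints).
    bits = [(y >> i) & 1 for i in range(n)]
    product = 0
    prev = 0  # assumed implicit bit to the right of the LSB
    for i in range(n):
        factor = prev - bits[i]        # 0, +1 or -1, formed from two adjacent bits
        product += factor * (x << i)   # add the weighted multiple of the multiplicand
        prev = bits[i]
    return product


if __name__ == "__main__":
    print(booth_radix2(7, -3, 4))
```

No separate sign correction appears in the loop: the factor formed at the most significant bit pair already accounts for the negative weight of the sign bit.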
A faster modified method has been proposed (O. L. MacSorley, High-speed arithmetic in binary computers, Proc. IRE, vol. 49, pp. 67-91, 1961; L. P. Rubinfield, A proof of the modified Booth's algorithm for multiplication, IEEE Trans. Computers, pp. 1014-1015, October 1975). In this method, three bits of the multiplier are examined simultaneously and decoded into one of five different factors (0; ±1; ±2; see Table 1). This algorithm requires only n/2 shift operations by two bit positions to the right (division by 4) and n/2 additions. In both methods mentioned, the decoding can begin either at the LSB (least significant bit) or at the MSB (most significant bit). A special treatment of the sign is not required. Decoding beginning with the LSB has the advantage (A. Stotzle, A. Rainer and W. Ulbrich, Parallel-serial multiplication using Booth's algorithm and Horner scheme, Proc. ISCAS '85, pp. 1389-1390, May 1985) that, through the use of the Horner scheme, the word length of the product is limited to the word length of the multiplicand. The result thus obtained agrees with the first m bits of the exact product, which is m+n bits long. In the following, the modified Booth's algorithm is specified somewhat more precisely, since it is the point of departure for the invention.
Let the multiplier Y and the multiplicand X be given by their two's complement representations: ##EQU1## with y.sub.i .epsilon. {0, 1} for i=0, 1, 2, . . . , n-1 and x.sub.j .epsilon. {0, 1} for j=0, 1, 2, . . . , m-1.
By combining two successive summation steps at a time, the multiplier Y can be transformed into ##EQU2## for i=1, 3, 5, . . . , n-1 and y.sub.n =0.
The product P of the multiplication with the multiplicand X thus results as ##EQU3## for i=1, 3, 5, . . . , n-1, with y.sub.n =0 and with ##EQU4## for i=1, 3, 5, . . . , n-1.
FIG. 1 shows the processing in principle of the multiplier Y with the modified Booth decoder. Beginning at the LSB side, three bits are always decoded simultaneously, and the decoding window is subsequently shifted two bit positions to the left. The bit y.sub.n is set to 0, and if the word length n is odd, an additional bit is produced by doubling the multiplier MSB. The corresponding decoder table can be seen in Table 1. This does not require further explanation, since it results from (A. Stotzle, A. Rainer and W. Ulbrich, Parallel-serial multiplication using Booth's algorithm and Horner scheme, Proc. ISCAS '85, pp. 1389-1390, May 1985).
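The sliding of the decoding window can be sketched in Python as follows. The sketch assumes that the multiplier bits are indexed from y.sub.0 (MSB) to y.sub.n-1 (LSB), so that the appended bit y.sub.n sits to the right of the LSB; this indexing is inferred from the index range i=1, 3, 5, . . . , n-1 and is not stated explicitly here. The function name `booth_windows` is hypothetical.

```python
def booth_windows(bits_msb_first):
    """Return the three-bit windows (y[i-1], y[i], y[i+1]), starting at the LSB side."""
    bits = list(bits_msb_first)
    if len(bits) % 2 == 1:
        bits.insert(0, bits[0])   # odd word length: double the multiplier MSB
    bits.append(0)                # y_n = 0, placed to the right of the LSB
    n = len(bits) - 1
    # The window centre i takes the values n-1, n-3, ..., 1; consecutive
    # windows are two bit positions apart and overlap by one bit.
    return [tuple(bits[i - 1:i + 2]) for i in range(n - 1, 0, -2)]


if __name__ == "__main__":
    print(booth_windows([1, 1, 0, 1]))   # 4-bit multiplier 1101
    print(booth_windows([1, 0, 1]))      # 3-bit multiplier 101, MSB doubled first
```

Both calls yield the same two windows, because doubling the MSB of 101 gives 1101 and leaves the two's complement value unchanged.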
TABLE 1
______________________________________
Multiplier bits (Y)
y.sub.i-1   y.sub.i   y.sub.i+1   Operation
______________________________________
0           0         0           PP.sub.i .rarw. 0.25 PP.sub.i+2
0           0         1           PP.sub.i .rarw. 0.25 PP.sub.i+2 + X
0           1         0           PP.sub.i .rarw. 0.25 PP.sub.i+2 + X
0           1         1           PP.sub.i .rarw. 0.25 PP.sub.i+2 + 2X
1           0         0           PP.sub.i .rarw. 0.25 PP.sub.i+2 - 2X
1           0         1           PP.sub.i .rarw. 0.25 PP.sub.i+2 - X
1           1         0           PP.sub.i .rarw. 0.25 PP.sub.i+2 - X
1           1         1           PP.sub.i .rarw. 0.25 PP.sub.i+2
______________________________________
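As an illustration, the eight rows of Table 1 can be written down as a lookup from the three multiplier bits to the factor applied to X. The names `BOOTH_FACTOR` and `decode` are hypothetical. The closed form -2·y.sub.i-1 + y.sub.i + y.sub.i+1 used in the check is read off from the table rows and is offered only as a convenient summary of them.

```python
# Factor applied to the multiplicand X for each window (y[i-1], y[i], y[i+1]),
# transcribed row by row from Table 1.
BOOTH_FACTOR = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 2,
    (1, 0, 0): -2,
    (1, 0, 1): -1,
    (1, 1, 0): -1,
    (1, 1, 1): 0,
}


def decode(y_prev: int, y_cur: int, y_next: int) -> int:
    """Return the factor (0, +-1 or +-2) for one three-bit window."""
    return BOOTH_FACTOR[(y_prev, y_cur, y_next)]


if __name__ == "__main__":
    # Every row agrees with the closed form -2*y[i-1] + y[i] + y[i+1].
    for (a, b, c), factor in BOOTH_FACTOR.items():
        assert factor == -2 * a + b + c
    print(decode(0, 1, 1), decode(1, 0, 0))
```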
Corresponding to the coding prescriptions of Table 1, control signals are produced, namely the factor signal C, the shift signal K.sub.0 and the sign signal S.sub.i. These control signals govern the execution of the individual cycles of the multiplication. FIG. 2 shows a schematic diagram of a standard Booth multiplier architecture. At the beginning of a multiplication, the multiplicand X is loaded into the register REGX, and the accumulation register ACCU is initialized to 0. Corresponding to the value of the multiplier Y and to the result of the Booth decoding according to Table 1, the individual partial products are produced in n/2 steps according to Equation 6. For this purpose, the multiplicand X must be multiplied by 0, 1 or 2 in each iteration; the shift unit SHE1 in FIG. 2 serves this purpose. Multiplication by the factor 2 means that the multiplicand must be shifted to the left by one bit position. The partial product PP is calculated in each iteration using a shift unit SHE2 that divides the partial product of the preceding iteration by 4 (shift to the right by two bit positions). After n/2 iterations, the product is available in the accumulation register ACCU. The multiplicand X must have a doubled MSB, so that no overflow can occur during the execution of the additions or subtractions.
Using the multiplier according to FIG. 2, the equations indicated in Table 1 under the heading `Operation` are thus realized. The factor signal C is used either to let the multiplicand X through to the shift unit SHE1 or to set it to 0, as is required according to the first and the last lines of Table 1. A multiplication of the multiplicand X by 2 is required according to the fourth and fifth lines of Table 1. The addition or subtraction takes place in an arithmetic logic unit ALU, dependent on the sign signal S.sub.i.
From the last column of Table 1, it can be seen when the multiplicand X must be added to or subtracted from the preceding partial product PP.sub.i+2. The partial product PP.sub.i+2 is held in the accumulation register ACCU, is multiplied by 0.25 in the shift unit SHE2, and is then supplied again to the ALU for the formation of the next partial product PP.sub.i. The allocation of the multiplier bits y.sub.i-1, y.sub.i, y.sub.i+1 to the individual operations can be found in Table 1.
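The iteration carried out by REGX, SHE1, SHE2, the ALU and ACCU can be imitated with the following Python sketch. It is only a behavioral illustration under stated assumptions: the encoding of the control signals as the tuple `(c, k0, s)` is hypothetical, the multiplier is treated as an n-bit two's complement integer with y.sub.0 as the MSB, and `Fraction` is used so that the division by 4 is exact, whereas real hardware would work with a limited word length. Under the integer interpretation, the value left in ACCU is rescaled by 4 to the power n/2 - 1 to obtain the integer product.

```python
from fractions import Fraction


def control_signals(y_prev: int, y_cur: int, y_next: int):
    """Return (c, k0, s): pass X, shift X left by one, subtract. Encoding is assumed."""
    factor = -2 * y_prev + y_cur + y_next   # value of the matching Table 1 row
    return int(factor != 0), int(abs(factor) == 2), int(factor < 0)


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's complement multiplier y in n/2 iterations."""
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit into n bits")
    if n % 2 == 1:
        n += 1                              # odd word length: double the MSB
    # Bits y_0 (MSB) ... y_{n-1} (LSB), followed by y_n = 0.
    bits = [(y >> (n - 1 - i)) & 1 for i in range(n)] + [0]
    regx = x                                # multiplicand register
    accu = Fraction(0)                      # accumulation register, initialized to 0
    for i in range(n - 1, 0, -2):           # start at the LSB side
        c, k0, s = control_signals(bits[i - 1], bits[i], bits[i + 1])
        operand = (regx << k0) if c else 0  # SHE1: 0, X or 2X
        shifted = accu / 4                  # SHE2: previous partial product times 0.25
        accu = shifted - operand if s else shifted + operand   # ALU
    product = accu * 4 ** (n // 2 - 1)      # undo the scaling of the right shifts
    assert product.denominator == 1
    return int(product)


if __name__ == "__main__":
    print(booth_multiply(7, -3, 4))
```

Python integers do not overflow, so the doubled MSB of the multiplicand mentioned above has no counterpart in this sketch.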