1. Field of the Invention
The invention relates generally to computing systems which multiply signed and unsigned binary numbers. The invention relates more specifically to digital computers which perform multiplication using a modified Booth algorithm.
2. Cross Reference to Related Patents
The following U.S. patent(s) is/are assigned to the assignee of the present application, is/are related to the present application and its/their disclosures is/are incorporated herein by reference:
(A) U.S. Pat. No. 3,840,727 issued Oct. 8, 1974 to Amdahl et al, and entitled, BINARY MULTIPLICATION BY ADDITION WITH NON-OVERLAPPING MULTIPLIER RECODING; and
(B) U.S. Pat. No. 4,761,756 issued Aug. 2, 1988 to Lee et al, and entitled, SIGNED MULTIPLIER WITH THREE PORT ADDER AND AUTOMATIC ADJUSTMENT FOR SIGNED OPERANDS.
3. Description of the Related Art
Digital computers can multiply binary numbers using a process equivalent to that used in the long hand multiplication of decimal digits.
In such a process, each bit (single digit) of the multiplier is taken by itself and applied against all the bits (digits) of the multiplicand to produce a corresponding partial product. Next, each partial product is shifted according to the power of its multiplier bit. And finally, all the shifted partial products are summed to form a complete product.
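The three steps just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `longhand_partial_products` is hypothetical, the operands 5 and 7 are arbitrary small values, and non-negative operands are assumed.

```python
def longhand_partial_products(multiplicand: int, multiplier: int) -> list[int]:
    """Return one shifted partial product per multiplier bit (non-negative inputs)."""
    rows = []
    for position in range(multiplier.bit_length()):
        # Take each multiplier bit by itself and apply it to the whole multiplicand.
        bit = (multiplier >> position) & 1
        # Shift the partial product according to the power of its multiplier bit.
        rows.append((multiplicand * bit) << position)
    return rows


rows = longhand_partial_products(0b101, 0b111)  # 5 times 7
product = sum(rows)                             # final summation step
print(rows, product)                            # prints [5, 10, 20] 35
```

One partial product is produced per multiplier bit, which is why the row count grows with the multiplier length.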
The long hand method is generally disadvantageous in computer applications because an undesirably large amount of computer circuitry and/or time is typically required to carry out multiplications involving large numbers.
As the number of bits in the multiplier and multiplicand increases, the number of partial products increases. The size of each partial product also increases. Consequently, the total number of bits to be processed in the partial product summation step increases.
Because some amount of computer circuitry and time is required for processing each bit of each partial product, the total amount of processing time and/or the overall size of the computer circuitry used to carry out multiplications by the long hand method tends to become disadvantageously large.
A number of techniques have been developed in the past for reducing this disadvantageous trend.
The Booth algorithm is a well known example. It reduces the number of partial products generated during multiplication and thereby reduces the total number of partial product bits.
The operation and advantage of the Booth algorithm can be best understood by way of a simple example.
Consider the multiplication of the number 5 (multiplicand) by the number 7 (multiplier). This may be represented in binary form as 101 × 111. Using the long hand approach, one moves right to left across the multiplier bits, and produces the following sum of partial products: (101 × 001) + (101 × 010) + (101 × 100).
It is seen that three partial products are to be generated and summed together to produce the answer.
This of itself is not difficult to do with present day computer technology. One merely needs to provide three memory areas (registers), each with a storage capacity for storing one of the partial products, and to provide a serial or parallel adding unit for summing the contents of the memory areas either serially over time or simultaneously, in parallel.
Consider what happens, however, as the number of bits in the multiplier and multiplicand progressively increases by powers of two. (Consider 101010 × 111111 as a second problem.) The length of each partial product increases by the same scaling factor and the number of partial products increases by the same scaling factor. The amount of computer circuitry and/or computer time required for carrying out the multiplication using the long hand approach increases correspondingly.
The Booth algorithm reduces the total number of partial products by taking advantage of a mathematical property which occurs whenever repetitive strings of ones are found within the multiplier.
Each continuous string of binary ones (e.g., 111) is replaced by the next highest power of two, less one. By way of example, 111=1000-1. In decimal terms this is expressed as 7=8-1.
For the first given example (5 × 7), the final product is obtained by summing the positive partial product (101 × 1000) with the negative partial product (101 × -1). The number of partial product additions is reduced (from 3 to 2 in the present example) and a savings in computation time or circuit size is realized.
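The replacement of a string of ones by a power of two less one can be sketched as follows. The function names `booth_run_terms` and `booth_multiply` are hypothetical, and a non-negative multiplier is assumed. The sketch recodes every run of ones, including an isolated single one, so it illustrates the identity rather than an optimized recoder.

```python
def booth_run_terms(multiplier: int) -> list[tuple[int, int]]:
    """Return (sign, shift) terms whose sum of sign * 2**shift equals the multiplier."""
    terms = []
    position = 0
    previous_bit = 0
    while (multiplier >> position) or previous_bit:
        bit = (multiplier >> position) & 1
        if bit and not previous_bit:
            # A run of ones starts here: subtract 2**start.
            terms.append((-1, position))
        elif previous_bit and not bit:
            # One position past the end of the run: add the next power of two.
            terms.append((+1, position))
        previous_bit = bit
        position += 1
    return terms


def booth_multiply(multiplicand: int, multiplier: int) -> int:
    """Sum the signed, shifted partial products produced by the recoding."""
    return sum(sign * (multiplicand << shift)
               for sign, shift in booth_run_terms(multiplier))


print(booth_run_terms(0b111))    # prints [(-1, 0), (1, 3)], i.e. 7 = 8 - 1
print(booth_multiply(5, 7))      # prints 35
```

For the multiplier 111 only two partial products are formed, against three in the long hand approach.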
Many variations to the Booth algorithm have been devised over the years.
One common variation is referred to as the "three-bit modified Booth algorithm". The number of partial products created by this method is approximately (L_M + 1)/2 where L_M is the number of bits in the multiplier.
The method is summarized with reference to the below TABLES 1 and 2. A dummy zero is appended to the right of the least significant bit in the multiplier and an encoding window is defined for processing the dummy appended multiplier, 3 bits at a time.
The window starts with the rightmost triplet of bits (the dummy plus bits 0 and 1 of the multiplier) and shifts left two positions for each iteration. One bit is shared between two successive iterations, serving as the leftmost window bit in a first iteration and the rightmost window bit in a second iteration. The bit position at the center of the window is considered the active bit position.
For each iteration, the encoding scheme of the below Table-1 is applied. C represents the multiplicand. Each output of the encoding scheme (C × m, where m = -2, -1, 0, +1 or +2) is deposited in a successively lower row of a summation array with the rightmost bit of the output located in the active bit position of the encoding window.
For negative outputs, a one's complement of C is formed and a "hot carry" bit ("1") is added to an appropriate bit position of a next lower row of the summation array to thereby effectively form a two's complement of C. If the output is C × -2, the one's complement of C is also shifted left one position within its array row.
TABLE-1
______________________________________
WINDOW INPUTS      OUTPUT
______________________________________
0 0 0              C × 0
0 0 1              C × 1
0 1 0              C × 1
0 1 1              C × 2
1 0 0              C × -2
1 0 1              C × -1
1 1 0              C × -1
1 1 1              C × 0
______________________________________
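The windowing procedure and the Table-1 encoding can be sketched as follows. The names `BOOTH_TABLE` and `booth_digits` are hypothetical. The sketch assumes an unsigned 16-bit multiplier and runs one extra window over leading zeros so that the topmost multiplier bit is not misread as a sign bit.

```python
# Table-1: (left bit, center bit, right bit) of the window -> multiple m of C.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(multiplier: int, bits: int = 16) -> list[int]:
    """Encode an unsigned multiplier into digits m, least significant window first."""
    padded = multiplier << 1  # dummy zero appended to the right of bit 0
    digits = []
    for low in range(0, bits + 1, 2):  # the window shifts left two positions each time
        window = ((padded >> (low + 2)) & 1,
                  (padded >> (low + 1)) & 1,
                  (padded >> low) & 1)
        digits.append(BOOTH_TABLE[window])
    return digits


digits = booth_digits(12)
print(digits)                                         # prints [0, -1, 1, 0, 0, 0, 0, 0, 0]
print(sum(m * 4 ** i for i, m in enumerate(digits)))  # prints 12
```

Each digit carries a weight of 4 raised to its window index, so nine digits cover a 16-bit unsigned multiplier.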
Table-2 shows an example in which the decimal problem 5 × 12 is carried out by the 3-bit modified Booth algorithm. In Table-2, the binary form of the multiplicand (5), the multiplier (12) and part of the resulting summation array are shown in top to bottom order with bit position numbers 0 to 12 (C_h in hexadecimal notation) and so forth being aligned vertically on top.
TABLE-2
__________________________________________________________________________
bit position:                 . . . C B A 9 8 7 6 5 4 3 2 1 0
multiplicand:                 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
multiplier: x                 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 (0)
C × 0 →               . . . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0      pp1
C × -1 →              . . 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0          pp2
C × +1 →              0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1  (1)         pp3
              . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0                  pp4
                  . . . . . . . . . . . . . . . . . . . . . .
                  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0      sum
__________________________________________________________________________
The multiplicand and multiplier fields are each 16 bits long here. Although not shown, it is to be understood that the length of the sum field (L_S) is equal to that of the multiplier (L_M) plus that of the multiplicand (L_C). So, L_S = 32 bits in the illustrated case. The sum field extends from the rightmost bit position 0 to the leftmost bit position 31 (1F_h in hexadecimal notation).
The dummy zero is shown to the right of bit position 0 in the multiplier row. Partial product rows are respectively labelled pp1, pp2, pp3, etc.
Let N represent the number of partial products. N is approximately equal to (L_M + 1)/2 which, in the above example, becomes (16 + 1)/2 = 8.5, or 9 after rounding up.
If all the partial product rows, pp1, pp2, . . . , ppN had been written out in the above Table-2, it would be seen that the shape of the summation array turns out to be roughly trapezoidal (if one ignores the details of a staircase border at the right side of the array).
The top and bottom sides of the trapezoidal-shaped array are parallel to one another. The right side of the trapezoidal shape slopes to the left as one moves in the top to bottom direction. The left side of the trapezoidal shape extends vertically.
More specifically, the right side of each successive partial-product row (pp1, pp2, pp3, etc.) aligns two positions to the left of a previous row, thereby creating a leftward sloping staircase border at the right side of the trapezoidal shape. The left side of each successive partial-product row (pp1, pp2, pp3, etc.) extends to and aligns vertically with the leftmost bit position of the total sum field (position 31) because sign-extension bits have to be provided in each row to take care of carry bits, as will be explained in more detail shortly.
The top side of the trapezoid is equal in length to that of the sum field (L.sub.S) located just below the base of the trapezoid. The base of the trapezoid is roughly half as long as the sum field.
The height of the trapezoidal shape is equal to N, which is the total number of rows (pp1, pp2, . . . , ppN) in the summation array. As explained earlier, N is approximately equal to the number of bits in the multiplier plus one, divided by two (N ≈ [L_M + 1]/2). N can vary slightly around this norm depending on the modulo-3 value of the multiplier bit length, L_M.
For L_M = L_C = L_S/2, the total number of bits in the [...] created when the encoding window covers the dummy zero (0) plus multiplier bit positions 0 and 1. The value of the encoding output, C × 0, is written across partial product row pp1 with the least significant bit aligned [...] most significant bit of the C × 0 encoding output is written into bit position 15 (F_h in hex notation) of row pp1. However, because the sum line is 32 bits long, a string of sixteen "sign-extension" bits has to be further written into bit positions 16-31 of row pp1.
The encoding output written into the next lower row, pp2, is C × -1 since the window covers multiplier bits 3, 2 and 1 (input pattern 110) for that row. To form a negative version of C, the one's complement of C is written across the pp2 row with the hot carry set in bit position 2 of next lower row pp3 as indicated by the symbol "(1)".
Note that the value, C × -1, can be written across just sixteen bit positions (2 through 17) of row pp2, but sign extension bits have to be filled in across the remainder of row pp2, from bit position 18 through bit position 31.
Row pp3 is further filled with C × 1 as the 3-bit encoding window covers bit positions 5, 4 and 3 (input pattern 001).
For remaining operations of the encoding window on the more leftward parts of the multiplier (00 . . . 01100), the output is always zero as illustrated in the fourth row, pp4, and indicated to continue into the remaining rows (pp5, pp6, . . . , ppN) below it.
It is to be noted that although the length L_C of the multiplicand, C, is only 16 bits, and its corresponding partial product C × m is therefore only 16 or 17 bits long (depending on whether m is a +/-2 or not), the sign extension bits have to be replicated from the leftmost position of each partial product C × m (where m = -2, -1, 0, 1, or 2), to the leftmost bit position of the sum field (bit position 31) to assure that proper summation takes place.
The sign extension bits are all ones ("111 . . .") if the C × m output is negative and all zeroes ("000 . . .") if the C × m output is positive.
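The row construction walked through above (one's complement, hot carry and sign extension out to bit position 31) can be sketched as follows. The names `summation_array` and `sum_rows` are hypothetical. The sketch assumes unsigned 16-bit operands and a 32-bit sum field, and for simplicity it stores each hot carry next to the row it belongs to rather than in the next lower row.

```python
SUM_BITS = 32
MASK = (1 << SUM_BITS) - 1
# Table-1 lookup, indexed by the 3-bit window value.
DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def summation_array(multiplicand: int, multiplier: int, bits: int = 16):
    """Return (row, hot_carry) pairs, each row sign-extended to the 32-bit sum field."""
    padded = multiplier << 1  # dummy zero to the right of bit 0
    rows = []
    for shift in range(0, bits + 1, 2):
        m = DIGIT[(padded >> shift) & 0b111]
        magnitude = multiplicand * abs(m)
        if m < 0:
            # The one's complement fills the row with ones up to bit 31
            # (sign extension); the hot carry completes the two's complement.
            row = ((~magnitude) << shift) & MASK
            hot_carry = 1 << shift
        else:
            row = (magnitude << shift) & MASK  # sign extension is all zeroes
            hot_carry = 0
        rows.append((row, hot_carry))
    return rows


def sum_rows(rows) -> int:
    """Add all rows and hot carries, keeping only the 32-bit sum field."""
    return sum(row + carry for row, carry in rows) & MASK


rows = summation_array(5, 12)
print(hex(rows[1][0]), rows[1][1])  # prints 0xffffffe8 4
print(sum_rows(rows))               # prints 60
```

Row pp2 of the 5 × 12 example appears as `0xffffffe8` with a hot carry at bit position 2. All of its bits from position 18 up to 31 are sign-extension ones that must be written and summed even though they carry no new information.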
A variety of software and/or hardware techniques may be used for writing all the sign extension bits into the sign-extension bit positions of each row, pp1, pp2, . . . , ppN. A variety of software or hardware techniques may also be used for carrying out the partial product generation and summation operations of the modified Booth algorithm. Partial product summation can be carried out either as one massively parallel operation or as a few less massively-parallel operations or as a sequential series of smaller operations whose final effect is to produce the sum field (the complete product).
Regardless of the technique chosen (software versus hardware, parallel versus serial), a common problem develops as one attempts to scale upwardly from 16-bit by 16-bit multiplication operations, to 32 × 32 bit operations, to 64 × 64 operations, and so forth.
The amount of circuitry and/or time needed for writing the sign-extension bits into each row grows with scaling. The overall size of the summation array (pp1, . . . , ppN) grows in both height and width at a rate proportional to the scaling factor. The cost for preparing the sign extension bits and processing the numbers within the summation array, as measured in terms of either hardware resources (e.g., number of logic gates) or time for completion, grows in proportion to the area of the summation array.
Since area is a function of height times width, and these parameters respectively grow in rough proportion to the number of bits in the multiplier, costs increase as the square of the multiplier field length.
More specifically, for the case where there are L_M bits in each of the multiplier and multiplicand, the area of the trapezoidal-shaped summation array is approximately 0.75 times (L_M^2 + L_M) and the cost for implementing such an array is proportional to this scaling factor.
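The 0.75 factor appears to follow from the trapezoid dimensions given earlier, taking the top side as L_S = 2 L_M, the base as roughly L_S/2 = L_M and the height as N ≈ (L_M + 1)/2. The following is a sketch of that arithmetic under those approximations:

```latex
A \;\approx\; \frac{\text{top} + \text{base}}{2} \times N
  \;=\; \frac{2L_M + L_M}{2} \cdot \frac{L_M + 1}{2}
  \;=\; \frac{3}{4}\, L_M \,(L_M + 1)
  \;=\; 0.75\,\bigl(L_M^{2} + L_M\bigr)
```

For L_M = 16 this gives about 204 bit positions, and for L_M = 32 about 792, so doubling the multiplier length nearly quadruples the array area.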
Despite the speed and circuit size savings obtained from use of the 3-bit modified Booth algorithm, hardware and/or software costs nonetheless become prohibitively large as one tries to construct multipliers with larger and larger multiplier fields. A method for further reducing costs is needed.