Many digital electronic devices need to perform various arithmetic functions involving multiplication. Such hardware multipliers are an indispensable component of computer systems such as audio and video systems, simulators, computer games, speech and pattern recognition systems and image processing systems. The overall speed of such systems depends heavily on the speed of the internal hardware multipliers.
A multiplication of two numbers P and Q can be performed using a digital electronic device by calculating and summing a series of partial products. The numbers P and Q are commonly represented as two's complement binary numbers, because this makes it straightforward to represent negative numbers.
In many applications it is necessary for a digital circuit to support a variety of data formats, for example, 32-bit data and 64-bit data. It is thus desirable to have an efficient algorithm for supporting such a variety of formats in a digital electronic circuit.
A variety of data formats may be supported by enabling the circuit to perform a process known as partitioned parallel multiplication. In a circuit that is designed to input two N-bit numbers and multiply them together, partitioned parallel multiplication involves splitting each N-bit input into two or more parts, and performing simultaneous independent multiplications of the first part of the first number times the first part of the second number, and the second part of the first number times the second part of the second number, etc. It is possible to use the same multiplication circuitry as for the N-bit×N-bit multiplication, except that all cross terms involving multiplications which do not relate to the same parts of both of the inputs, e.g. the first part of the first number times the second part of the second number, must be forced to zero to give the correct answer.
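By way of illustration only, the effect of forcing the cross terms to zero may be sketched in Python as follows. The sketch assumes unsigned inputs split into two equal parts, which is a simplification of the two's complement case; the function name `partitioned_multiply` and its parameters are hypothetical and do not appear in the cited art.

```python
def partitioned_multiply(p: int, q: int, n: int = 16) -> tuple[int, int]:
    """Multiply the two halves of n-bit unsigned p and q independently.

    Returns (high product, low product).
    """
    half = n // 2
    mask = (1 << half) - 1
    p_lo, p_hi = p & mask, p >> half
    q_lo, q_hi = q & mask, q >> half

    # The full product is
    #   (p_hi*q_hi << n) + ((p_hi*q_lo + p_lo*q_hi) << half) + p_lo*q_lo.
    # In partitioned mode the two cross terms are forced to zero.
    cross = 0
    result = ((p_hi * q_hi) << n) + (cross << half) + (p_lo * q_lo)

    # Each half-width product fits in n bits, so the two results do not overlap.
    return result >> n, result & ((1 << n) - 1)
```

With the cross terms removed, the upper and lower halves of the 2N-bit result hold the two independent products.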
A circuit can be configured to allow partitioned parallel multiplication of input numbers in a single partitioned size or in one of a selection of different partitioned sizes, as well as allowing multiplication of input numbers of the full, non-partitioned size. For example, one circuit can be configured to give a choice between one 128 bit×128 bit multiplication, or in partitioned mode, two 64 bit×64 bit multiplications. Another circuit can be configured to give a choice between one 128 bit×128 bit multiplication, or in partitioned mode, a choice of two 64 bit×64 bit multiplications or four 32 bit×32 bit multiplications.
A general method of hardware multiplication is shown in the flowchart of FIG. 1. Two numbers P and Q, which are to be multiplied together, are input to an array generator at step S100. The array generator calculates an array of partial products, which is passed to an array reduction stage at step S101. In the array reduction stage, the partial products are added together to produce a sum of partial products and appropriate carry bits. At step S102, the carry bits are added to the sum of partial products, outputting the final product of the multiplication of P×Q. A network of carry save adders, such as a Wallace tree, may be used to add the partial products together.
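The three steps of FIG. 1 may be modelled, as a minimal sketch under stated assumptions, in Python. The sketch assumes unsigned operands and one partial product per bit of Q, and it reduces the array with a simple chain of carry save adders rather than a Wallace tree; the names `carry_save_add` and `multiply` are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-wide 3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def multiply(p: int, q: int, n: int = 8) -> int:
    """Multiply unsigned n-bit p and q following steps S100 to S102."""
    # S100: array generation, one partial product per bit of q.
    rows = [(p << i) if (q >> i) & 1 else 0 for i in range(n)]

    # S101: array reduction, with no carry propagation between columns.
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c))

    # S102: a single carry-propagate addition of the sum and carry words.
    return sum(rows)
```

Only the final step propagates carries across the full word, which is why the reduction stage can be made fast in hardware.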
FIG. 2 is a schematic diagram of an array of partial products generated when P and Q are partitioned into two. FIG. 3 is a schematic diagram of an array of partial products generated when P and Q are partitioned into four. As can be seen, only the diagonal partial products in the array are required and the off-diagonal or cross terms are set to zero.
Although it is possible to find the final product of P×Q by calculating a partial product for each binary digit of Q, and adding these partial products together, this can involve a very large number of partial products. It is time consuming to calculate and sum these partial products.
Booth encoding is a known technique that is used to reduce the number of partial products to be calculated and summed. In Booth encoding, the number Q is transformed into a format which is easier to multiply. Q is split into overlapping groups of adjacent bits, otherwise known as “multiplier groups”, and each multiplier group is transformed into a format which can be used directly in a multiplexer circuit to select an appropriate partial product.
Two Bit Booth Encoding
In two-bit Booth encoding (which is also known as radix-4 Booth encoding), Q is encoded in overlapping groups of three adjacent bits, where the two outer bits of each group are shared with neighbouring groups, and the middle bit of each group is unique to that group. For example, for a 64 bit number Q[0:63], the first multiplier group is Q[1], Q[0], 0; the second is Q[3], Q[2], Q[1]; the third is Q[5], Q[4], Q[3], etc.; and the last is Q[63], Q[62], Q[61], giving a total of 32 multiplier groups. The Booth encoding involves transforming the three bits of each multiplier group into a form which is easier to multiply. The partial product is calculated for each of the multiplier groups, and the partial products are then summed.
Without Booth encoding, a partial product would be obtained for each bit of Q (e.g. 64 partial products for a 64 bit Q). Each partial product is either zero or one times P, where the multiplication is P×Q. However, in two-bit Booth encoding, only half as many partial products are obtained, because each group of three bits brings in two new bits of Q (and re-uses one bit from the preceding group), i.e., the depth of the array illustrated in FIGS. 2 and 3 is halved by compressing the partial products into more complex partial products. The trade-off for obtaining only half the number of partial products is that each partial product is more complex. In two-bit Booth encoding, each partial product involves a multiply by −2, −1, 0, 1 or 2 operation, rather than being limited to a multiply by 0 or 1 operation.
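The grouping described above may be illustrated by the following Python sketch, which recodes a two's complement number into one digit per multiplier group. It assumes that Q[0] is the least significant bit and that the bit width is even; the function name `booth2_digits` is hypothetical.

```python
def booth2_digits(q: int, n: int) -> list[int]:
    """Recode an n-bit two's complement q (n even) into n/2 digits in -2..2."""
    # bits[0] is the 0 appended below Q[0]; bits[i + 1] is Q[i].
    bits = [0] + [(q >> i) & 1 for i in range(n)]
    digits = []
    for i in range(0, n, 2):
        m0, m1, m2 = bits[i], bits[i + 1], bits[i + 2]
        # The top bit of the group carries weight -2, the other two weight +1.
        digits.append(m0 + m1 - 2 * m2)
    return digits
```

The digit of group j has weight 4 to the power j, so the weighted sum of the digits reproduces Q, and each digit selects one partial product of P.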
A Booth encoder may be configured with one output corresponding to each partial product which may be selected. The appropriate output is set to high when that particular partial product is selected, and is set to low otherwise. These outputs are then passed to the Booth multiplexer to obtain the partial product. For example, in two-bit Booth encoding where the possible partial products are multiplications by 2, 1, 0, −1 or −2, these may be represented by outputs of 2M, M, 0, −M, and −2M respectively.
Alternatively, a Booth multiplexer can be controlled using fewer outputs than one per partial product. For example, outputs from the Booth encoder may be in the form of M, 2M and S, where M represents multiplication by 1 or −1; 2M represents multiplication by 2 or −2; and S is a sign bit. For multiplication by zero, both M and 2M are set to zero. A first embodiment of the present invention is particularly suited to a Booth encoder with three outputs, A, S and X2, which are defined below, and which differ slightly from M, 2M and S. However, the present invention is not limited to a particular set of Booth encoder outputs, and may be used with any set of outputs which is appropriate.
FIG. 4 shows a hardware implementation, using logic gates, of a two-bit Booth encoder 110 (BENC) and a Booth multiplexer 120. The Booth encoder has inputs M2, M1 and M0, and outputs X2, A and S. The three inputs are for the three bits making up a single multiplier group. The outputs are defined as follows:
X2 = (M0 ⊕ M1)ᶜ
A = M2 + (M0 + M1)ᶜ
S = M2ᶜ + (M0 ● M1)
where + signifies an OR operation, ● signifies an AND operation, ⊕ signifies an XOR operation, and a superscript c signifies a NOT operation.
X2 is the output of XNOR gate 115, which has inputs M0 and M1. A is the output of OR gate 114, which has two inputs: M2 and the output of NOR gate 113. NOR gate 113 has inputs M0 and M1. S is the output of NAND gate 112, which has inputs of M2 and the output of NAND gate 111. NAND gate 111 has inputs M0 and M1.
The purpose of the outputs is to control a multiplexer to select a correct partial product for each multiplier group. For example, for a multiplication of P×Q, the multiplier Q is divided into multiplier groups and each group is inputted into the M0, M1, M2 inputs of a Booth encoder. The outputs of the Booth encoder are then used to select between partial products of P (“multiply by one”), 2P (“multiply by two”), −P (“multiply by minus one”), −2P (“multiply by minus two”) and zero (“multiply by zero”).
The X2 output acts as an indicator of whether the selected partial product involves multiplication by an odd or even number. For those values of M2, M1, M0 that result in a “multiply by 2” or “multiply by −2” partial product, X2 will be 1. For those values of M2, M1, M0 that result in a “multiply by 1” or “multiply by −1” partial product, X2 will be 0.
The S output acts as an indicator of the sign of the partial product. For a partial product selection of “multiply by 0”, “multiply by 1” or “multiply by 2”, then S is set to 1. However, for a partial product selection of “multiply by −1” or “multiply by −2”, then S is 0.
The A output acts together with S as an indicator of whether the selected partial product is “multiply by 0”. If the partial product selected is “multiply by 0”, then A and S are both 1. However, if the partial product selected is not “multiply by 0” (or “multiply by −0”), then A is the complement of S, i.e. when S=1, then A=0, and when S=0, then A=1.
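The encoder equations given with FIG. 4 may be checked against the behaviour described above using a short Python sketch. The function name `booth2_encode` is hypothetical, and single-bit values are modelled as the integers 0 and 1.

```python
def booth2_encode(m2: int, m1: int, m0: int) -> tuple[int, int, int]:
    """Return (X2, A, S) for one multiplier group of a two-bit Booth encoder."""
    x2 = 1 - (m0 ^ m1)             # XNOR gate 115
    a = m2 | (1 - (m0 | m1))       # OR gate 114 fed by NOR gate 113
    s = (1 - m2) | (m0 & m1)       # NAND gate 112 fed by NAND gate 111
    return x2, a, s


if __name__ == "__main__":
    # List the outputs for all eight input combinations, with the digit
    # (multiple of P) that each combination selects.
    for g in range(8):
        m2, m1, m0 = (g >> 2) & 1, (g >> 1) & 1, g & 1
        print(m2, m1, m0, m0 + m1 - 2 * m2, booth2_encode(m2, m1, m0))
```

For the two zero groups (000 and 111) the sketch yields A=S=1, and for every other group A and S are complementary, in agreement with the description.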
The Booth multiplexer 120 has five inputs and one output. Three of the five inputs are X2, A and S, which are the outputs of the Booth encoder 110. The other two inputs N0 and N1 are two consecutive bits of the number to be multiplied. In a multiplication of P×Q, where Q is input to the Booth encoder, the inputs N0 and N1 of the Booth multiplexer are two consecutive bits P[k], P[k−1] of the number P.
The output of the Booth multiplexer is a single bit PP of the partial product. Logically,
PP = X2 ● [(N0 ● Aᶜ) + (N0ᶜ ● Sᶜ)] + X2ᶜ ● [(N1 ● Aᶜ) + (N1ᶜ ● Sᶜ)]
To calculate all of the bits of a partial product, overlapping pairs of bits of P can be fed sequentially into the Booth multiplexer 120, and the output bit PP for each can be stored. Alternatively, a plurality of Booth multiplexers can be provided for each Booth encoder, preferably one for each bit of the partial product to be obtained. All of the bits of the partial product may then be calculated in parallel, which is much faster than calculating them sequentially.
As shown in FIG. 4, the first stage of the Booth multiplexer 120 is a 2:1 multiplexer 121 with inputs N0 and N1. The address line is X2, and the output is Z, which is an intermediate output and is used as an input to the second stage of the Booth multiplexer 120. If X2 is 0 (i.e. multiplication by 1 or −1), the multiplexer 121 selects and outputs the N1 bit. If X2 is 1 (i.e. multiplication by an even number), the multiplexer 121 selects and outputs the N0 bit. Thus when a multiplication by 2 or −2 is to be performed, the first multiplexer stage 121 of the Booth multiplexer 120 performs a bit shift, which is the simplest way of performing a two times multiplication in binary. When multiplication by 1 or −1 is to be performed, the multiplexer 121 selects the N1 bit, and no such bit shift is performed.
The second stage of the Booth multiplexer 120 is a 2:1 multiplexer 122 with inputs A and S, and the output Z from the first stage 121 as an address line. The value of A is selected and inverted by the inverter 123 when Z=1 and the value of S is selected and inverted by the inverter 123 when Z=0. If multiplication by zero is required, then both A and S are set to 1, so the output PP will be 0, regardless of the value of Z.
If multiplication by 1, 2, −1 or −2 is required, then A and S will have complementary values, i.e. for multiplication by 1 or 2, A=0 and S=1, and for multiplication by −1 or −2, A=1 and S=0. Thus, for a positive multiplication, the selected value A or S, once inverted by the inverter 123, is equal to the value of Z, and for a negative multiplication, it is equal to NOT Z. Thus the output PP corresponds to multiplication by the correct sign. For negative numbers, the output from the Booth multiplexer is the complement of the corresponding positive number. To convert this to two's complement format, it is necessary to add one to the complete partial product.
In addition, for negative partial products, the most significant bits should be sign extended to allow correct array reduction, due to the two's complement format of the binary numbers.
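The operation of the two multiplexer stages, together with the added one and the sign extension, may be sketched in Python as follows. The sketch assumes that N1 is the bit P[k] and N0 is the bit P[k−1] for output bit k, which is inferred from the bit shift described for a multiplication by two, and that the bit below P[0] is zero; the function names are hypothetical.

```python
def booth2_encode(m2, m1, m0):
    """Encoder outputs (X2, A, S) from the equations given with FIG. 4."""
    return 1 - (m0 ^ m1), m2 | (1 - (m0 | m1)), (1 - m2) | (m0 & m1)


def booth2_mux(x2, a, s, n1, n0):
    """One Booth multiplexer: returns the partial product bit PP."""
    z = n0 if x2 else n1           # first stage 121: X2 selects the shifted bit
    return 1 - (a if z else s)     # second stage 122 followed by inverter 123


def partial_product(p, m2, m1, m0, width):
    """Assemble one partial product of two's complement p, modulo 2**width."""
    x2, a, s = booth2_encode(m2, m1, m0)
    # Python's >> sign-extends negative integers, which models sign extension.
    bit = lambda k: (p >> k) & 1 if k >= 0 else 0
    pp = 0
    for k in range(width):
        pp |= booth2_mux(x2, a, s, bit(k), bit(k - 1)) << k
    negative = 1 if (a == 1 and s == 0) else 0
    return (pp + negative) % (1 << width)   # add one for negative multiples
```

In hardware, one multiplexer per output bit would evaluate the loop body in parallel; the loop here stands in for that array.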
FIG. 5 is a table of inputs and outputs of a two-bit Booth encoder. A list of all possible combinations of the input values, M2, M1 and M0, is shown in the first three columns on the left hand side of the table. For a multiplication of P×Q, the bits in these first three columns of the table will correspond to the possible bit combinations of each multiplier group of Q. The next column of the table describes, for each combination of input values M2, M1, M0, the action which must be performed on the number P to obtain the correct partial product. The next three columns of the table show the values of the outputs A, S and X2 of the Booth encoder 110. The last column of the table shows the values of the single bit partial product output PP of the Booth multiplexer 120, when the input to the multiplexer 120 includes the N0, N1 bits of P.
Each Booth encoder 110 is used to calculate a single partial product. To calculate the total product of P×Q, a plurality of Booth encoders may be provided, with one for every multiplier group of Q.
Three Bit Booth Encoding
In three-bit Booth encoding (which is also known as radix-8 Booth encoding), a number Q is encoded in overlapping groups of four bits, where two of the four bits are shared with adjacent groups, and the other two of the four bits are unique to one group. This reduces the number of partial products by a factor of three.
However, this reduction in the number of partial products comes at a price, because in practice it is not straightforward to calculate the “multiply by 3” partial product.
Negative partial products can easily be calculated by inverting the values of the bits and adding 1. “Multiply by 2” and “multiply by 4” partial products can easily be calculated by performing a bit shift. However, the “multiply by 3” partial product cannot be calculated by such straightforward methods, and can only be calculated directly by using a carry propagate adder arrangement. The carry propagate adder increases the latency, due to the long wires that are required for propagating carries from less significant to more significant bits, hence also increasing the time needed to perform a calculation.
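The three-bit grouping, and the appearance of the awkward multiples of three, may be illustrated by the following Python sketch. It assumes that Q[0] is the least significant bit and that the bit width is a multiple of three; the function name `booth3_digits` is hypothetical.

```python
def booth3_digits(q: int, n: int) -> list[int]:
    """Recode an n-bit two's complement q (n a multiple of 3) into n/3 digits in -4..4."""
    # bits[0] is the 0 appended below Q[0]; bits[i + 1] is Q[i].
    bits = [0] + [(q >> i) & 1 for i in range(n)]
    digits = []
    for i in range(0, n, 3):
        m0, m1, m2, m3 = bits[i:i + 4]
        # Weights within a group: -4 for the top bit, then +2, +1 and +1.
        digits.append(m0 + m1 + 2 * m2 - 4 * m3)
    return digits
```

Any group whose digit is 3 or −3 requires the “multiply by 3” partial product discussed above.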
A known solution to this problem is to use a modified form of 3-bit Booth encoding, such as fully redundant 3-bit Booth encoding or partially redundant 3-bit Booth encoding.
FIG. 6 illustrates the methods of redundant and partially redundant 3-bit Booth encoding. Firstly, the fully redundant form of a 3N (i.e. a “multiply by 3”) partial product of a 16 bit number is shown. In the fully redundant form, the 3N partial product consists of a sum of two partial products which are easier to calculate, i.e. a sum of the N (i.e. “multiply by 1”) and 2N (i.e. “multiply by 2”) partial products. The 2N partial product can easily be found by performing a bit shift, and the N partial product requires no transformation. FIG. 6 shows a representation of each bit as a dot, and the fully redundant form is represented by two rows of 16 dots, where one row is shifted one place with respect to the other row. The complete partial product is represented by the sum of the two rows.
Of course, the trade-off to avoiding direct calculation of the 3N partial product is that there are now twice as many partial products to be summed. A compromise is provided by using partially redundant Booth encoding. The partially redundant form is obtained from the fully redundant form by using a series of small length adders to sum the 2N and N values, but keeping the carry bits between adders, rather than propagating these carry bits from one adder to the next.
In FIG. 6, the small length adders are represented by boxes surrounding groups of dots to be added. The output of the adders is represented as a row of 16 dots, and the carry bits between adders are represented as a further 3 dots which are positioned in the proper columns, to be added to the output of the adders. In this example, 4-bit adders are used. The adders are small to avoid carry propagation over large distances, because this would introduce a significant time delay. However, the use of the adders reduces the number of bits needing to be summed. In the example shown, the adders reduce the number of bits to be summed from 32 to 19 (which is one 16 bit number plus 3 carry bits).
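The principle of the partially redundant form may be sketched in Python as follows. The sketch assumes an unsigned 16 bit number and 4-bit adders, and it keeps every carry-out, including any at the top of the word, so it does not reproduce the exact dot layout of FIG. 6; the function name `partially_redundant_3n` is hypothetical.

```python
def partially_redundant_3n(n: int, width: int = 16, chunk: int = 4) -> tuple[int, int]:
    """Return (sum_row, carry_row) with sum_row + carry_row == 3 * n, n unsigned."""
    a, b = n, n << 1                     # N and 2N (a one-place shift)
    mask = (1 << chunk) - 1
    sum_row = 0
    carry_row = 0
    # One extra chunk absorbs the bit shifted out of the top of 2N.
    for lo in range(0, width + chunk, chunk):
        total = ((a >> lo) & mask) + ((b >> lo) & mask)   # one small adder
        sum_row |= (total & mask) << lo                   # adder output bits
        # The carry-out is kept in its own column instead of being propagated.
        carry_row |= (total >> chunk) << (lo + chunk)
    return sum_row, carry_row
```

The two returned rows are what would be passed to the array reduction stage in place of a fully resolved 3N value.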
However, this partially redundant representation does not take the same form for both positive and negative multiples. Thus, difficulties can arise when dealing with negative numbers. The problem can be solved using biasing. Each partial product has a bias constant added to it before being summed to form the final product. The bias constant is the same for both positive and negative multiples, but it may be different for different partial products. As shown in the figure, a biasing constant which effectively adds 1 to each carry bit is chosen, i.e. the bias constant is 1000100010000. Each carry bit, plus corresponding bias constant bit, plus corresponding bit of the 16 bit adder output is then summed, and the result can be represented in biased partially redundant form as a row of 16 bits plus three correctly positioned carry bits. In FIG. 6, these sums of three bits are represented by a ring around each set of three bits to be summed.
As shown in FIG. 6, the inverse of the biased partially redundant form can easily be represented, by complementing all of the non-blank bits of the biased partially redundant form, and adding 1. This is the same procedure which is used to obtain the negative of a number in its non-redundant form.
FIG. 7 shows a circuit for three-bit Booth encoding. The circuit has a Booth encoder 210 and a Booth multiplexer 220. The Booth encoder 210 has inputs M0, M1, M2 and M3 and outputs A, S, X01, X12 and X23. The four inputs are for the four bits making up a single multiplier group. The outputs are defined as follows:
X01 = M0 ⊕ M1
X12 = M1 ⊕ M2
X23 = M2 ⊕ M3
A = M3 + (M0 + M1 + M2)ᶜ
S = M3ᶜ + (M0 ● M1 ● M2)
Output X01 is the output of an XOR gate 217, which has inputs M0 and M1. Output X12 is the output of an XOR gate 216, which has inputs M1 and M2. Output X23 is the output of an XOR gate 215, which has inputs M2 and M3. Output A is the output of OR gate 214, which has inputs of M3 and the output of NOR gate 213. NOR gate 213 has inputs M0, M1 and M2. Output S is the output of NAND gate 212, which has inputs of M3 and the output of NAND gate 211. NAND gate 211 has M0, M1 and M2 as inputs.
The outputs X01, X12 and X23 indicate what magnitude of multiple is needed to calculate the partial product. X01 indicates whether a bit shift is needed at all (i.e. a 2N or 4N partial product is selected), or whether a selection is to be made between the N and 3N partial products. X23 selects between the N and 3N partial products when X01=1, and is ignored when X01=0. X12 selects between the 2N and 4N partial products when X01=0, and is ignored when X01=1. The outputs S and A have a similar purpose to that in the two-bit Booth encoder, i.e. indicating the sign of the partial product, and whether the partial product is zero.
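The roles of the five encoder outputs may be checked with the following Python sketch, which evaluates the equations given with FIG. 7 and decodes the selected magnitude according to the description above. The function names `booth3_encode` and `selected_magnitude` are hypothetical.

```python
def booth3_encode(m3: int, m2: int, m1: int, m0: int) -> tuple[int, int, int, int, int]:
    """Return (A, S, X01, X12, X23) for one four-bit multiplier group."""
    x01 = m0 ^ m1
    x12 = m1 ^ m2
    x23 = m2 ^ m3
    a = m3 | (1 - (m0 | m1 | m2))      # M3 OR NOR(M0, M1, M2)
    s = (1 - m3) | (m0 & m1 & m2)      # NOT M3 OR AND(M0, M1, M2)
    return a, s, x01, x12, x23


def selected_magnitude(x01: int, x12: int, x23: int) -> int:
    """Magnitude of the multiple chosen by the X outputs (ignoring A and S)."""
    if x01:
        return 3 if x23 else 1         # no shift: choose between N and 3N
    return 2 if x12 else 4             # shift: choose between 2N and 4N
```

For the two zero groups (0000 and 1111) the X outputs select 4N, but A=S=1 forces the partial product to zero regardless.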
The Booth multiplexer 220 has nine inputs and one output. Five of the nine inputs are X01, X12, X23, A and S, which are the outputs of the Booth encoder 210. Three of the other four inputs, N0, N1 and N2, are three adjacent bits of the number to be multiplied. The last input, Nx, is for a “multiply by 3” type of partial product.
The first stage of the Booth multiplexer 220 comprises a 4:1 multiplexer circuit 221. The inputs are N0, N1, Nx and N2, and the output is Z, which is an intermediate output and is used as an input to the second stage of the Booth multiplexer 220. For a multiplication by 1, no bit shift is needed, and N2 is selected and output. For a multiplication by 2, a single bit shift is needed, and N1 is selected and output. For a multiplication by 4, a double bit shift is needed and N0 is selected and output. For a multiplication by 3, Nx is selected and output.
The second stage of the Booth multiplexer 220 comprises a 2:1 multiplexer circuit 222. This operates in a similar way to that described for the two-bit Booth encoding, where the output bit PP is either complemented or not, depending on the sign of the partial product, and is set to 0 if the partial product is zero. The multiplexers 221 and 222 and the inverter 223 operate in a similar manner to the multiplexers 121 and 122 and the inverter 123 in the embodiment of FIG. 4.
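The two stages of the Booth multiplexer 220 may be sketched in Python as follows. The sketch assumes that N2, N1 and N0 are the bits P[k], P[k−1] and P[k−2] for output bit k, that bits below P[0] are zero, and that Nx is taken from a fully resolved 3P rather than from a redundant form; the function names are hypothetical.

```python
def booth3_encode(m3, m2, m1, m0):
    """Outputs (A, S, X01, X12, X23) from the equations given with FIG. 7."""
    a = m3 | (1 - (m0 | m1 | m2))
    s = (1 - m3) | (m0 & m1 & m2)
    return a, s, m0 ^ m1, m1 ^ m2, m2 ^ m3


def booth3_mux(a, s, x01, x12, x23, n2, n1, n0, nx):
    """One Booth multiplexer: returns the partial product bit PP."""
    if x01:
        z = nx if x23 else n2      # odd multiple: 3N or N
    else:
        z = n1 if x12 else n0      # even multiple: 2N or 4N
    return 1 - (a if z else s)     # A/S selection and inversion, as in FIG. 4


def partial_product(p, group, width):
    """Assemble one partial product of two's complement p, modulo 2**width."""
    m3, m2, m1, m0 = group
    a, s, x01, x12, x23 = booth3_encode(m3, m2, m1, m0)
    # Python's >> sign-extends negative integers, which models sign extension.
    bit = lambda v, k: (v >> k) & 1 if k >= 0 else 0
    pp = 0
    for k in range(width):
        pp |= booth3_mux(a, s, x01, x12, x23,
                         bit(p, k), bit(p, k - 1), bit(p, k - 2),
                         bit(3 * p, k)) << k
    return (pp + (a & (1 - s))) % (1 << width)   # add one for negative multiples
```

As in the two-bit case, the loop stands in for an array of identical multiplexers, one per bit of the partial product.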
FIG. 8 is a table of inputs and outputs for three-bit Booth encoding. The first four columns show a list of all possible values of the inputs M3, M2, M1, M0. The next column shows the action to be performed on the number to be multiplied, in order to obtain the partial product, e.g. multiply by 0. The next column shows the value of the single bit partial product output PP of the Booth multiplexer 220, when the inputs to the multiplexer 220 include the three consecutive bits N2, N1, N0 of the number to be multiplied, and the multiply by three bit Nx. The last five columns show the values of the outputs A, S, X01, X12 and X23 of the Booth encoder 210.
As discussed above, in a circuit capable of carrying out partitioned parallel multiplication, it is necessary to be able to force all cross product bits to zero (as illustrated in FIGS. 2 and 3), to prevent one set of numbers to be multiplied together from affecting the result of the other multiplication(s). FIGS. 9a, 9b and 9c show prior art schemes for forcing these cross product terms to zero during partitioned parallel multiplication. In each case, a control signal T is used to select between normal multiplication and partitioned parallel multiplication.
FIG. 9a shows the scheme adopted by U.S. Pat. No. 5,943,250, U.S. Pat. No. 6,035,318 and U.S. Pat. No. 6,223,198. These three patents disclose a process for Booth encoding and Booth multiplexing in the usual way, to obtain a sequence of partial product bits. Once the partial product bits have been obtained from the Booth multiplexer, all bits corresponding to cross product terms (e.g. a first part of the first number times a second part of the second number, or a second part of the first number times a first part of the second number) are passed through an AND gate with a NOT T control signal on the other input. The AND gate turns all of the cross product bits low when T is high. Therefore the final output gives the product of the two (or more) independent multiplications. However, this scheme has the considerable disadvantage that it is necessary to add an extra AND gate to every Booth multiplexer. This is very expensive in area, and it also introduces extra time delay by adding extra logic to the critical path.
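The gating principle of FIG. 9a may be illustrated, in much simplified form, by the following Python sketch. It assumes unsigned operands, one partial product bit per pair of input bits (no Booth encoding) and a split into two equal parts, so it models only the AND with NOT T and not the circuits of the cited patents; the function name `gated_multiply` is hypothetical.

```python
def gated_multiply(p: int, q: int, t: int, n: int = 8) -> int:
    """Multiply unsigned n-bit p and q; when t == 1, cross product bits are gated off."""
    half = n // 2
    result = 0
    for i in range(n):                 # row: bit i of q
        for k in range(n):             # column: bit k of p
            pp_bit = (p >> k) & (q >> i) & 1
            # A bit is a cross term when its two inputs come from different parts.
            is_cross = (i >= half) != (k >= half)
            if is_cross:
                pp_bit &= 1 - t        # AND gate with NOT T on the other input
            result += pp_bit << (i + k)
    return result
```

With T=0 the function returns the full product; with T=1 the low and high halves of the result hold the two independent half-width products. The sketch also shows why the scheme is costly: the gate sits on every cross product bit.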
FIGS. 9b and 9c show an alternative prior art scheme, which is described in U.S. Pat. No. 6,353,843. As shown in FIG. 9b, two Booth encoders 310, 311 are provided in parallel with each other. One has a set of inputs appropriate to T=0, and the other has a set of inputs appropriate to T=1. A multiplexer 312 is then used to select between the two Booth encoders 310, 311, using T as the address bit to make the selection. Again, this solution involves adding logic to the critical path, which adds a time delay. A further problem is that if more than one partitioning is to be enabled, then it is necessary to use extra Booth encoders. FIG. 9c shows an example in which the input number may be partitioned into two parts or four parts, or may remain unpartitioned. Separate Booth encoders 320, 321, 322 are provided for each of these partitioning options. Also, a 3:1 multiplexer 323 is needed, rather than a simpler 2:1 multiplexer. This all adds to the cost and complexity of the circuit.
The prior art firstly determines the cross product multiples, and then forces them to zero after the Booth encoding.