The arithmetic unit is one of the most important components of any integrated electronic data processing system. A high precision multiplier is a fundamental part of the arithmetic unit as multiplication is one of the most frequently performed arithmetic operations. Multipliers are large and relatively slow blocks in most processors, and for this reason their implementation has a critical impact on cycle time and processor area.
Without limiting the scope of the invention, this background covers the terminology associated with multipliers for arithmetic units that employ what are typically termed higher radix Booth recodings of the multiplier.
For a bit-string $a = a[p-1:0] = (a[p-1], \ldots, a[0]) = (a_{p-1}, \ldots, a_0) \in \{0,1\}^p$ we denote by
$$\langle a \rangle = \sum_{i=0}^{p-1} a[i] \cdot 2^i$$
the binary number represented by a.
A p′×p-multiplier is a circuit with p′ inputs a=a[p′−1:0], p inputs b=b[p−1:0], and p′+p outputs c=c[p′+p−1:0], such that 〈a〉·〈b〉=〈c〉 holds.
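The notation above can be made concrete with a minimal behavioral sketch. The function names `value`, `to_bits`, and `reference_multiplier` are illustrative assumptions and do not appear in the source; bit lists are stored least significant bit first, so that index i corresponds to a[i].

```python
def value(bits):
    """Return <a> = sum of a[i] * 2**i, where bits[i] is a[i] (LSB first)."""
    return sum(bit << i for i, bit in enumerate(bits))


def to_bits(n, width):
    """Return the width-bit string of the non-negative integer n, LSB first."""
    return [(n >> i) & 1 for i in range(width)]


def reference_multiplier(a_bits, b_bits):
    """Behavioral model of a p' x p multiplier (hypothetical helper).

    Takes p' multiplicand bits and p multiplier bits and returns the
    p' + p output bits c with <a> * <b> == <c>.
    """
    product = value(a_bits) * value(b_bits)
    return to_bits(product, len(a_bits) + len(b_bits))


if __name__ == "__main__":
    a = to_bits(13, 4)   # p' = 4
    b = to_bits(11, 5)   # p = 5
    c = reference_multiplier(a, b)
    print(len(c), value(c))
```

Such a model only states the input/output contract; it says nothing about how the product is formed in hardware.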
For binary multiplication, in general, binary p′×p-multipliers are implemented with a first step of generating the product in a carry-save representation and a second step of compressing the carry-save representation to a binary representation of the product. We focus on implementation of the first step. In the simplest form the product computation is based on the sum:
$$\langle c \rangle = \sum_{i=0}^{p-1} \langle a \rangle \cdot b[i] \cdot 2^i \qquad (1)$$
with the partial products $S_i = \langle a \rangle \cdot b[i] \cdot 2^i$. These partial products have to be generated and compressed to a carry-save representation. The generation of the partial products corresponding to (1) simply consists of logical AND-gates. Except for optimizing the logical and physical implementation of the partial product reduction, the main approach to decrease the delay and size of the partial product reduction is to decrease the number of partial products in (1) by representing one of the operands in a higher radix.
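The radix-2 scheme of equation (1) can be sketched as follows. This is a behavioral illustration only: the names `partial_product_row` and `multiply_radix2` are assumptions, and the plain integer addition stands in for the carry-save compression and final addition, which are not modeled.

```python
def partial_product_row(a_bits, b_bit):
    """One row of AND gates: a[j] AND b[i] for every multiplicand bit a[j]."""
    return [a_j & b_bit for a_j in a_bits]


def multiply_radix2(a_bits, b_bits):
    """Sum the partial products S_i = <a> * b[i] * 2**i of equation (1).

    Bit lists are LSB first. Integer addition replaces the carry-save
    compression used in a real multiplier.
    """
    total = 0
    for i, b_i in enumerate(b_bits):
        row = partial_product_row(a_bits, b_i)
        # Weight the row by 2**i to obtain S_i.
        s_i = sum(bit << j for j, bit in enumerate(row)) << i
        total += s_i
    return total


if __name__ == "__main__":
    a_bits = [1, 0, 1, 1]  # <a> = 13
    b_bits = [1, 1, 0, 1]  # <b> = 11
    print(multiply_radix2(a_bits, b_bits))
```

One partial product is produced per multiplier bit, which is the count that higher radix recoding aims to reduce.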
For higher radix partial product generation, let a p-digit string in radix β denote the radix polynomial $(d_{p-1}, d_{p-2}, \ldots, d_0)_\beta \equiv d_{p-1}\beta^{p-1} + d_{p-2}\beta^{p-2} + \ldots + d_0 \in P[\beta, D]$. Here, β ≥ 2 is the radix, D is the digit set with $d_i \in D$ for 0 ≤ i ≤ p−1, and the radix system P[β,D] denotes the set of all radix polynomials with radix β and digits from D.
For the product A·B of the p-bit integer multiplicand $A = \langle a \rangle$ and the p-bit integer multiplier $B = \langle b \rangle$, let the multiplier be represented by a
$$p' = \left\lceil \frac{p+1}{k} \right\rceil$$
digit polynomial in the balanced minimally redundant (Booth digit) system $P[2^k, \{-2^{k-1}, -2^{k-1}+1, \ldots, 2^{k-1}\}]$. Each term of the product
$$A \cdot B = \sum_{i=0}^{p'-1} A \cdot d_i \cdot 2^{ki}$$
is a higher radix partial product of the form
$$A \cdot d_i \cdot 2^{ki} = (-1)^s \cdot 2^e \cdot (jA), \quad j \in \{1, 3, \ldots, 2^{k-1}-1\}.$$
TABLE 1
Complexity of partial product generation for various Booth recodings (p = 64).

Radix   Primitive Part.   # Partial   PPG     Fanout           Total PPG
        Prods (jA)        Products    Fanin   Primitive PP's   Fanin
2       1                 64          1       64               64
4       1                 33          1       33               33
8       2                 22          2       22               44
16      4                 17          4       17               68
32      8                 13          8       13               124
This allows each higher radix partial product to be created from a set of $2^{k-2}$ primitive partial products (jA) by a conditional shift and/or complement. Five metrics are provided in Table 1 for comparing the consequences of employing higher radix Booth recodings on a 64-bit operand. The number of primitive partial products that must be computed and routed to each partial product generator (PPG) grows linearly with the base β while the number of partial products that must be driven to each PPG decreases inversely with lg β. A measure of multiplier circuit routing complexity is the total PPG-fanin given by the number of primitive partial products that must be routed into each PPG summed over all PPGs. The necessity of routing each primitive partial product (jA) to each of the PPGs causes the total PPG-fanin to grow for β ≥ 4.
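The recoding and the factorization of each digit into a sign, a shift, and an odd primitive multiple can be illustrated with a short sketch. It assumes an unsigned p-bit multiplier, k ≥ 2 for the multiplication routine, and the common overlapped-group formulation of Booth recoding; the names `booth_recode`, `factor_digit`, and `booth_multiply` are hypothetical and not taken from the source.

```python
def booth_recode(b, p, k):
    """Recode the unsigned p-bit integer b into ceil((p+1)/k) digits of
    radix 2**k, each in {-2**(k-1), ..., 2**(k-1)}.

    Digit i is built from bits b[k*i+k-1 : k*i] plus the overlap bit
    b[k*i-1]; the top bit of each group carries negative weight.
    """
    n_digits = -(-(p + 1) // k)          # ceiling division
    digits = []
    overlap = 0                          # b[-1] is taken as 0
    for i in range(n_digits):
        group = (b >> (k * i)) & ((1 << k) - 1)
        top = group >> (k - 1)
        digits.append(group - (top << k) + overlap)
        overlap = top
    return digits


def factor_digit(d, i, k):
    """Return (s, e, j) with d * 2**(k*i) == (-1)**s * 2**e * j and j odd,
    or None when d == 0 (no partial product is needed)."""
    if d == 0:
        return None
    s = 1 if d < 0 else 0
    m = abs(d)
    trailing_zeros = (m & -m).bit_length() - 1
    return s, k * i + trailing_zeros, m >> trailing_zeros


def booth_multiply(a, b, p, k):
    """Multiply using only the primitive multiples jA, j odd (assumes k >= 2)."""
    primitives = {j: j * a for j in range(1, 2 ** (k - 1), 2)}
    total = 0
    for i, d in enumerate(booth_recode(b, p, k)):
        factors = factor_digit(d, i, k)
        if factors is None:
            continue
        s, e, j = factors
        pp = primitives[j] << e          # conditional shift
        total += -pp if s else pp        # conditional complement
    return total


if __name__ == "__main__":
    print(booth_recode(0b10110111, 8, 3))
    print(booth_multiply(201, 183, 8, 3))
```

For a given k the dictionary `primitives` holds $2^{k-2}$ entries, which corresponds to the primitive partial product count discussed above.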
Radix-4 has a clear advantage in these metrics over the host radix-2. The reduction by one half in the number of partial products and total PPG-fanin is obtained simply by the facility of the PPG units to conditionally complement and/or perform a one bit shift.
Moving to radix-8 further reduces the number of partial products by a third while adding the complexity and delay of a 2-1 add to precompute an additional primitive partial product (3A). This tradeoff is more acceptable for higher precisions in terms of adder complexity, but the routing of the two primitive values to each PPG increases the total PPG fanin.
Moving to radices 16 and 32 only marginally reduces the number of partial products while greatly increasing the partial product complexity both in number of primitive partial products and total fanin to the PPGs.
With this background we then note that prior art implementations of multipliers generally employ either radix-4 or radix-8, and it is helpful to focus on the features and disadvantages of these systems. FIG. 1A provides a block diagram of a prior art Booth radix-4 multiplier and FIG. 1B provides a block diagram of a prior art Booth radix-8 multiplier for comparing features.
Implementation of a (p′×p)-bit multiplier involves the accumulation of p partial products of the p′-bit multiplicand. Booth recoding of the multiplier to radix 4 with digits {−2,−1,0,1,2} reduces the number of partial products to $\lceil (p+1)/2 \rceil$, where each partial product is obtained by shifting and/or complementing the multiplicand for each non-zero digit. Radix 4 recoding thus reduces the multiplier size by about 50% and provides more flexibility in reducing the cycle time of an implementation.
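Radix-4 Booth recoding is commonly described as a lookup on overlapping bit triplets of the multiplier. The sketch below follows that textbook formulation for an unsigned p-bit multiplier; the table `RADIX4_DIGIT` and the function names are illustrative assumptions, and integer negation stands in for the hardware complement.

```python
# Digit for the triplet (b[2i+1], b[2i], b[2i-1]), read as a 3-bit number:
# d = -2*b[2i+1] + b[2i] + b[2i-1].
RADIX4_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def radix4_digits(b, p):
    """Return the ceil((p+1)/2) radix-4 Booth digits of the unsigned p-bit b."""
    n_digits = (p + 2) // 2      # equals ceil((p+1)/2)
    padded = b << 1              # appends the overlap bit b[-1] = 0
    return [RADIX4_DIGIT[(padded >> (2 * i)) & 0b111] for i in range(n_digits)]


def multiply_radix4(a, b, p):
    """Accumulate one partial product per non-zero radix-4 digit."""
    total = 0
    for i, d in enumerate(radix4_digits(b, p)):
        if d == 0:
            continue
        pp = a << (abs(d) >> 1)  # one-bit shift when |d| == 2
        if d < 0:
            pp = -pp             # complement for negative digits
        total += pp << (2 * i)
    return total


if __name__ == "__main__":
    print(radix4_digits(11, 4))
    print(multiply_radix4(13, 11, 4))
```

Only the multiplicand itself is needed as input to each partial product generator, since every non-zero digit is ±1 or ±2.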
Booth radix 8 multiplier recoding with digits {−4,−3,−2,−1,0,1,2,3,4} reduces the number of partial products to $\lceil (p+1)/3 \rceil$, realizing a further 33% reduction in the number of partial products. However, radix 8 recoded multipliers have several inherent disadvantages over radix 4 recoded multipliers. In particular:
(i) Multiplier precomputation: Radix 8 recoding requires the area and delay of a carry propagate adder to precompute a 3×-multiplicand partial product.
(ii) Multiplicand routing: Both the 1×-multiplicand and 3×-multiplicand are high precision operands that must be routed to substantially all partial product generators (PPGs).
(iii) Partial product generation: The partial product generator must select either the 1× or 3× multiplicand as well as shift and/or complement the term for each non-zero digit.
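The three points above can be seen in a behavioral sketch of radix-8 recoding. It assumes an unsigned p-bit multiplier and the usual overlapped four-bit window; the function names are hypothetical, and the integer addition that forms 3A stands in for the carry propagate adder of item (i).

```python
def radix8_digits(b, p):
    """Return the ceil((p+1)/3) radix-8 Booth digits of the unsigned p-bit b.

    Digit i is d = -4*b[3i+2] + 2*b[3i+1] + b[3i] + b[3i-1].
    """
    n_digits = -(-(p + 1) // 3)  # ceiling division
    padded = b << 1              # appends the overlap bit b[-1] = 0
    digits = []
    for i in range(n_digits):
        w = (padded >> (3 * i)) & 0b1111
        d = -4 * ((w >> 3) & 1) + 2 * ((w >> 2) & 1) + ((w >> 1) & 1) + (w & 1)
        digits.append(d)
    return digits


def multiply_radix8(a, b, p):
    """Accumulate radix-8 partial products built from 1A and 3A."""
    a1 = a
    a3 = (a << 1) + a            # (i) precomputation of the 3x multiplicand
    shift_for = {1: 0, 2: 1, 3: 0, 4: 2}
    total = 0
    for i, d in enumerate(radix8_digits(b, p)):
        if d == 0:
            continue
        m = abs(d)
        primitive = a3 if m == 3 else a1   # (ii), (iii) both multiples reach each PPG
        pp = primitive << shift_for[m]     # conditional shift
        if d < 0:
            pp = -pp                       # conditional complement
        total += pp << (3 * i)
    return total


if __name__ == "__main__":
    print(radix8_digits(0b101101, 6))
    print(multiply_radix8(13, 45, 6))
```

The selection between `a1` and `a3` inside the loop is the extra step that radix-4 generation does not need.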
Accordingly, a need has arisen for a multiplier method and design reducing the number of partial products to be accumulated below $\lceil (p+1)/2 \rceil$ while avoiding the precomputation adder area and delay of obtaining a 3× multiplicand. A further need has arisen to avoid the multiplicand routing and PPG selection complexity of sending two distinct multiples of the multiplicand to each PPG.