Fast parallel multipliers are important for high-speed, low-power signal processing systems, and much effort has been devoted to their construction. Many of today's processors (e.g., central processing units (CPUs), graphics processing units (GPUs), and the like) include an execution (EX) unit that implements Booth's multiplication algorithm to multiply two signed binary numbers (i.e., a multiplicand and a multiplier) in two's complement notation.
A partial product (PP) is a product formed by multiplying the multiplicand by one digit of the multiplier when the multiplier has more than one digit. PPs are used as intermediate steps in calculating larger products. For example, the product of 67 and 12 may be calculated as the sum of two PPs, 134 (67×2) + 670 (67×10), or 804. A usual way of multiplying a 64-bit number by a 64-bit number is to generate 33 PPs using Booth encoding and add them together to get the final result.
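The PP count above can be illustrated with a minimal sketch of radix-4 Booth recoding. It assumes the 64-bit multiplier is treated as unsigned and zero-extended, which is one common way to arrive at 33 PPs; the text does not state which variant is meant. The function names `booth_radix4_digits` and `partial_products` are hypothetical, and Python integers stand in for hardware bit vectors.

```python
def booth_radix4_digits(multiplier: int, width: int = 64) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}. Assumes `width` is even and the
    multiplier is zero-extended, giving width // 2 + 1 digits (33 for 64 bits).
    """
    if not 0 <= multiplier < (1 << width):
        raise ValueError("multiplier does not fit in the given width")
    n_digits = width // 2 + 1
    padded = multiplier << 1  # append the implicit bit y[-1] = 0
    digits = []
    for i in range(n_digits):
        # Overlapping 3-bit window: y[2i+1], y[2i], y[2i-1]
        window = (padded >> (2 * i)) & 0b111
        y_prev = window & 1
        y_cur = (window >> 1) & 1
        y_next = (window >> 2) & 1
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def partial_products(multiplicand: int, multiplier: int, width: int = 64) -> list[int]:
    """One PP per Booth digit: digit * multiplicand, weighted by 4**i."""
    digits = booth_radix4_digits(multiplier, width)
    return [(d * multiplicand) << (2 * i) for i, d in enumerate(digits)]


if __name__ == "__main__":
    pps = partial_products(67, 12)
    print(len(pps), sum(pps))
```

For the decimal example in the text, `partial_products(67, 12)` yields 33 PPs whose sum is 804. Most of those PPs are zero for such a small multiplier; in hardware all 33 rows would still exist and be summed.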
In a high-performance processor, on-chip power density plays a dominant role in both static and dynamic conditions due to shrinking device features. The consumed power is usually dissipated as heat, affecting the performance and reliability of the chip. A complex multiplier is an arithmetic circuit that is extensively used by a processor. For large bit-width multiplications (e.g., a 64-bit multiplier and a 64-bit multiplicand), a parallel multiplier circuit including a large number of compressors may be used to compress PP stages. Higher order compressors may be configured to permit the reduction of the vertical critical paths in the parallel multiplier circuit, resulting in a product that is generated in a faster and more power-efficient manner.
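As a rough illustration of PP compression, the sketch below models a 3:2 compressor (a carry-save adder) functionally and applies it level by level until only two rows remain for a final carry-propagate addition. This is a behavioral model on Python integers, not a circuit description, and the names `compress_3_2` and `reduce_to_two` are hypothetical. It uses only 3:2 compressors; higher order compressors such as those mentioned above would absorb more rows per level, which is how they shorten the vertical critical path.

```python
def compress_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 compressor: returns (sum_row, carry_row) with
    a + b + c == sum_row + carry_row, and no carry propagation across bits."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def reduce_to_two(operands) -> tuple[list[int], int]:
    """Compress rows with 3:2 compressors until at most two rows remain.

    Returns the remaining rows and the number of compressor levels used,
    which serves here as a stand-in for the vertical critical path length.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3  # rows that fit into complete groups of three
        for i in range(0, full, 3):
            next_rows.extend(compress_3_2(*rows[i:i + 3]))
        next_rows.extend(rows[full:])  # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    return rows, levels


if __name__ == "__main__":
    rows, levels = reduce_to_two(range(1, 34))  # 33 arbitrary example rows
    print(len(rows), levels, sum(rows))
```

Under this grouping scheme, 33 rows take 8 levels of 3:2 compressors to reach two rows, and the sum of the remaining two rows equals the sum of the original 33.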