High-speed digital multiplication circuits typically multiply an "n" bit multiplier with an "m" bit multiplicand by generating n partial products. The partial products are reduced to a final product by adding them together at successive stages of the circuit.
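The partial-product step described above can be illustrated in software. The following is a minimal sketch, assuming unsigned operands and a simple shift-and-AND scheme; the function names `partial_products` and `multiply` are hypothetical and are not taken from any circuit described here.

```python
def partial_products(multiplier: int, multiplicand: int, n: int) -> list[int]:
    """Return n partial products, one per bit of the n-bit multiplier.

    Assumption: operands are unsigned. Row i is the multiplicand shifted
    left by i positions if multiplier bit i is 1, and zero otherwise.
    """
    rows = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        rows.append((multiplicand << i) if bit else 0)
    return rows


def multiply(multiplier: int, multiplicand: int, n: int) -> int:
    """Reduce the partial products to the final product by plain addition."""
    return sum(partial_products(multiplier, multiplicand, n))
```

For example, `partial_products(0b101, 0b11, 3)` gives three rows, one of which is zero because the middle multiplier bit is 0. A hardware multiplier performs the same reduction with adder stages rather than a sequential sum.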
Because the summation within each stage is done in parallel, the time required for the multiplication is the sum of the delays of the individual stages, that is, the number of stages times the per-stage delay when every stage has the same delay. Multiplication can therefore be accelerated if the number of stages can be reduced without increasing the delay at each stage.
Traditionally, full carry save adders (CSAs) arranged in a Wallace tree are used to produce the partial sums. As shown in FIG. 1, each full CSA 100 takes three input bits (A, B, C) 101-103 and produces an S (sum) bit 104 and a C (carry) bit 105 as output. The S bit 104 is produced by an XOR gate 110, and the C bit 105 is produced by three AND gates 120 and an OR gate 130. The carry bit can be propagated to the next column.
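The logic of the full CSA can be sketched in software as follows. This is an illustrative model only: the names `full_csa`, `csa_rows`, and `wallace_stages` are hypothetical, and the stage count assumes that rows are grouped in threes at every stage, with any leftover rows passed through unchanged.

```python
def full_csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Model one full CSA on single bits: three inputs, sum and carry out."""
    s = a ^ b ^ c                          # sum bit: XOR of the three inputs
    carry = (a & b) | (a & c) | (b & c)    # carry bit: three ANDs, then an OR
    return s, carry


def csa_rows(x: int, y: int, z: int) -> tuple[int, int]:
    """Apply the full CSA to every column of three rows at once.

    The carry row is shifted left by one because each carry bit
    belongs to the next column.
    """
    s = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return s, carry


def wallace_stages(rows: int) -> int:
    """Count 3-to-2 reduction stages until at most two rows remain.

    Assumption: each stage turns every full group of three rows into two
    rows and passes leftover rows through unchanged.
    """
    stages = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages
```

The two rows returned by `csa_rows` always add up to the same value as the three input rows, which is why a final conventional adder is still needed to combine the last two rows into the product.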