This invention relates to a method and apparatus for digital multiplication, and in particular to a method and apparatus for implementing the carry-save adder in a binary multiplier.
The binary multiplier is a key element in digital computers used for computationally intensive calculations. A fast multiply function requires complex circuitry and can therefore be a bottleneck to speed. Thus, performance improvements in the binary multiplier directly affect computer performance in computationally intensive applications. Typical binary multipliers incorporate the carry-save adder as a basic building block. The Wallace-Tree binary adder (WTA) is one form of carry-save adder, and is an integral element in the efficient implementation of high-speed binary multipliers. The WTA performs the intermediate column addition, taking the preliminary product results of the multiplier and generating the partial sum and partial carry associated with the columnar data. Each WTA produces one partial sum and partial carry pair, and one WTA is required per input data column. Accordingly, an M-bit by N-bit multiplier requires N+M-1 such WTAs, with up to N bits of input per WTA. The WTA employs the one-bit full adder (FA) as its basic building block. In the one-bit full adder, three input data bits yield two output data bits, the sum and the carry.
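The full adder behavior and the column count described above can be illustrated with a minimal software sketch. This is a behavioral model only, not a circuit description; the function names `full_adder` and `column_heights` are hypothetical and chosen for illustration, and the model assumes a plain array of preliminary product bits in which bit i of one operand and bit j of the other contribute to column i+j.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: three input bits in, (sum, carry) out."""
    s = a ^ b ^ c                          # sum is the parity of the inputs
    carry = (a & b) | (a & c) | (b & c)    # carry is the majority of the inputs
    return s, carry


def column_heights(m_bits: int, n_bits: int) -> list[int]:
    """Number of preliminary product bits in each column of an
    m_bits-by-n_bits multiplier (illustrative assumption: simple
    array of AND terms, no operand recoding)."""
    heights = [0] * (m_bits + n_bits - 1)  # N+M-1 columns, one WTA each
    for i in range(m_bits):
        for j in range(n_bits):
            heights[i + j] += 1            # product bit (i, j) lands in column i+j
    return heights
```

Under these assumptions, `column_heights(4, 4)` gives seven columns with heights 1, 2, 3, 4, 3, 2, 1, so the tallest column presents N = 4 bits to its WTA.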
The WTA comprises an array of FAs configured in a series of stages. It reduces the column data from its initial size (N bits) down to the required pair of bits, the partial sum and partial carry. The bit reduction characteristic of the FA (i.e., three-to-two) determines the number of FA stages required in a WTA. Because the number of stages required in a given computation directly impacts overall speed, the implementation of the WTA is key to throughput speed.
Because of the three-to-two bit reduction characteristic of the FA, the number of FA stages in the WTA is approximately proportional to the logarithm of the number of input bits. For example, six bits require three stages, thirty-two bits require eight stages, and sixty-four bits require ten stages. The number of gate delays per FA is implementation dependent. Nevertheless, as the number of bits grows, the number of FA stages, and therefore the net delay through the multiplier, grows with it. Thus, the number of input bits materially affects WTA speed and, as a consequence, processor speed. Therefore, any reduction in the number of FA stages required to implement a WTA would materially improve the throughput speed of a given binary multiplier.
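The stage counts quoted above can be reproduced with a short sketch. It assumes that, in each stage, every group of three bits in a column passes through one FA and any leftover bits pass through unchanged, and that the carries arriving from the neighboring column equal in number the carries leaving, so each FA effectively replaces three bits with two. The function name `fa_stages` is hypothetical.

```python
def fa_stages(n_bits: int) -> int:
    """Count the FA stages needed to reduce n_bits column bits to two
    (partial sum and partial carry) under the assumptions stated above."""
    stages = 0
    while n_bits > 2:
        # Each full group of three bits becomes two; leftovers pass through.
        n_bits = 2 * (n_bits // 3) + n_bits % 3
        stages += 1
    return stages


for n in (6, 32, 64):
    print(n, "bits:", fa_stages(n), "stages")
```

For 6, 32, and 64 input bits this model yields 3, 8, and 10 stages respectively, matching the figures given in the text.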