A typical binary multiplier for multiplying two binary numbers together comprises a series of processing stages: an operand encoder, a partial product generator, a product term compressor, and a final term addition stage.
The operand encoder encodes the first operand and reduces the number of terms representing the operand. Thus, for example, a 32-bit number may be reduced using a Booth code to 17 terms or fewer.
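By way of illustration only, the reduction from 32 bits to 17 terms can be reproduced with a radix-4 Booth recoding. The text above names only "a Booth code", so the radix-4 scheme, the unsigned treatment of the operand, and the function name `booth_radix4` are assumptions made for this sketch.

```python
def booth_radix4(x, width=32):
    """Recode an unsigned `width`-bit integer into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and digit i carries weight 4**i.
    Assumption: the operand is unsigned, so it is zero-extended by two bits,
    which gives width // 2 + 1 digits (17 for a 32-bit operand).
    """
    if not 0 <= x < (1 << width):
        raise ValueError("operand does not fit in the given width")
    shifted = x << 1  # append the implicit bit b[-1] = 0 below the LSB
    digits = []
    for i in range(width // 2 + 1):
        window = (shifted >> (2 * i)) & 0b111  # bits b[2i+1], b[2i], b[2i-1]
        b_hi = (window >> 2) & 1
        b_mid = (window >> 1) & 1
        b_lo = window & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


digits = booth_radix4(0xFFFFFFFF)
# 17 digits: -1, then fifteen zeros, then 1, i.e. 4**16 - 1
value = sum(d * 4**i for i, d in enumerate(digits))
```

In this sketch an all-ones operand collapses to only two non-zero terms, which is one way of reading the phrase "17 terms or fewer".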
The partial product generator multiplies the second operand by each of the encoded terms to produce a partial product term. Thus, for a 32-bit multiplier where the first operand is encoded as 17 terms, a total of 17 partial product terms are produced.
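A minimal sketch of this step, assuming the encoded terms are radix-4 Booth digits in the range -2 to 2 with digit i weighted by 4^i (an assumption, since the text above does not fix the code). The function name `partial_products` and the small digit list are hypothetical and chosen only to keep the example readable.

```python
def partial_products(digits, multiplicand):
    """Return one partial product per encoded digit.

    Assumption: `digits` are radix-4 Booth digits in {-2, -1, 0, 1, 2}.
    Each product is then a select, shift, or negate of the multiplicand,
    so no general multiplication is needed per term.
    """
    choices = {
        0: 0,
        1: multiplicand,
        2: multiplicand << 1,
        -1: -multiplicand,
        -2: -(multiplicand << 1),
    }
    # Digit i has weight 4**i, i.e. a left shift by 2*i bit positions.
    return [choices[d] << (2 * i) for i, d in enumerate(digits)]


# The digits [-1, 2, 0] encode the value 7 (-1 + 2*4 + 0*16).
terms = partial_products([-1, 2, 0], 5)   # [-5, 40, 0]
product = sum(terms)                      # 35, i.e. 7 * 5
```

With 17 digits for a 32-bit first operand, the same function would return the 17 partial product terms described above.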
The product term compressor adds together (or, as it is otherwise known, compresses) the many partial products to form a pair of terms.
Finally, the final term addition stage adds the pair of terms together to form the final product value.
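The compression and final addition stages can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular circuit: it uses 3:2 carry-save adders applied to whole 64-bit words, groups the terms in simple list order, and works modulo 2^64. The names `csa`, `compress_to_two` and `final_add` are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit datapath


def csa(a, b, c):
    """3:2 carry-save adder: three terms in, a sum word and a carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK64, carry & MASK64


def compress_to_two(terms):
    """Reduce a list of terms to at most two, counting the stages used."""
    terms = [t & MASK64 for t in terms]  # two's complement, 64 bits wide
    stages = 0
    while len(terms) > 2:
        full = len(terms) // 3           # complete groups of three
        nxt = []
        for g in range(full):
            nxt.extend(csa(*terms[3 * g:3 * g + 3]))
        nxt.extend(terms[3 * full:])     # leftover terms pass through
        terms = nxt
        stages += 1
    return terms, stages


def final_add(pair):
    """Final term addition: one carry-propagating add of the remaining pair."""
    return sum(pair) & MASK64


terms = [((-1) ** i * (i + 1) * 0x123456789) << (2 * i) for i in range(17)]
pair, stages = compress_to_two(terms)
result = final_add(pair)  # equals sum(terms) modulo 2**64
```

Under this grouping, 17 terms shrink as 17, 12, 8, 6, 4, 3, 2, so six carry-save stages precede the single carry-propagating addition.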
FIG. 1 shows a typical 32-bit multiplier structure whereby first and second operands 151 and 153 are input into a “Booth recoding” stage 101, which carries out the operand encoding stage and the partial product generation stage to generate 17 partial product terms 157.
The 17 partial product terms 157 are fed into the compression circuitry, shown as the “17 terms to 2” compressor 103, which outputs two 64-bit terms 159.
The 64-bit output terms 159 are passed to the final term addition stage, the “add64” block 105, to produce a final product value 161.
Compression circuitry, such as the “17 terms to 2” compressor 103 shown in FIG. 1, has typically been designed to reduce all the possible partial product terms generated by the operand to form 2 terms within the smallest number of consecutive stages. Conventionally, the compression circuitry is arranged in terms of columns of compression stages. Each compression stage column operates by combining term bits having the same binary weighting (i.e. 2^n), each column compressing up to 17 terms.
The typical compression column is designed in such a way that it is input-insensitive and therefore capable of handling any of the 17 terms in any order. However, this type of design is problematic in that an element of redundancy has to be built into the compression column to allow for every possibility.
For example, if any retiming of the product terms is carried out to allow pipelining of the compression elements, the memory elements (such as flip-flops) that store the terms for pipeline retiming have to be capable of handling the full partial product width of 64 bits for every one of the terms. Such a design is wasteful in terms of circuitry.
Furthermore, this input-insensitive design, in which each compression stage is designed to handle all of the input terms, is wasteful in terms of the number of compression elements required within the stage. The conventional compression column introduces compression cells in which not all of the inputs are attached to terms.
Furthermore, not only are the conventional designs wasteful in terms of circuitry, but they also force the user to implement over-cautious input value timing constraints requiring the previous partial product generation stages to generate all of the partial products substantially at the same time.