1. Field of the Invention
This invention relates generally to a pipeline multiplier. More particularly, this invention relates to a high speed pipeline multiplier which implements the partial carry save technique.
2. Description of the Prior Art
Because carrying out multiplication with electronic circuits is a time-consuming and complex operation, two of the most common objectives in the design of a multiplication circuit are to achieve high speed performance and to reduce circuit complexity so that the circuit occupies less integrated circuit (IC) chip area. The demand to achieve these two design objectives becomes even greater as the progress of IC technology pushes electronic circuits to become smaller in size while operating at ever higher speeds. Since multiplication is one of the most basic and most frequently performed operations in almost every data handling system, such as computers and digital signal processors, improvements made to the circuit designs and algorithms used to perform multiplication are of significant importance to a wide variety of applications where electronic processing techniques are employed.
Taking advantage of the very large scale integration (VLSI) techniques in making electronic devices on a single IC chip, highly systematic and modular configurations have been employed for the design of multiplication circuits with pipelined adder-rows wherein each row of adders is further divided into cells or blocks comprising many adders. The major consideration in carrying out either additions or multiplications by use of rows of pipelined adders is the technique used to manage the carry from the least significant bits to the most significant bits.
Typically, there are three general categories of techniques for handling carry propagation. The first is the `add-shift technique`, in which carry propagation is based on a ripple-through process: the carry is propagated by `rippling` from the less significant bit to the more significant bit. The circuit design for this type of multiplier is relatively simple. However, it has the disadvantage that the computation is slow, especially for multiplication of operands with a larger number of bits. The second type is the `carry-save` multiplier, wherein the carry from each of a series of full adders is saved and received by a next full adder for the next more significant bit. A flip-flop circuit is required for each full adder, which significantly increases the overhead area occupied by the multiplier and results in higher power consumption, even though this type of multiplier has a faster computational speed. The third type is the `carry-lookahead` multiplier, wherein the carry-bit input of each adder is generated by taking into account the computation results of a plurality of preceding stages, e.g., K stages. The carry-bit input to the current stage, i.e., C_i of the i-th stage, is generated from the computational results of the K preceding stages, i.e., the (i+1)-th, (i+2)-th, ..., (i+K)-th stages, by considering two output signals of each stage, i.e., a carry generate signal g_{i+m} and a carry propagate signal p_{i+m}. Mathematically, it can be represented by an equation as: ##EQU1##
where F is a linear function of g_{i+m} and p_{i+m}. Adders and multipliers of this type have the advantage that the speed of computation may be increased by looking ahead and anticipating the carry, with a selection made once sufficient information is ready. Theoretically, this technique would be useful for adders and multipliers that process longer strings of binary operands. However, a multiplier of this type would be too complex for actual implementation, particularly for a multiplier that carries out multiplications using multiple stages of pipelined adders.
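The three carry-handling approaches described above can be contrasted with a small software sketch. It is illustrative only and does not reproduce any circuit from the cited art. The function names are hypothetical, and the bit indexing follows the common convention in which bit 0 is the least significant bit, which may differ from the stage numbering used in the equation above.

```python
def ripple_add(a, b, width):
    """Add two unsigned integers bit by bit, rippling the carry upward."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry  # carry is the carry out of the top bit


def carry_save_add(a, b, c):
    """Reduce three operands to a sum word and a carry word.

    No carry travels between bit positions here; the carries are
    'saved' in a separate word to be added in a later step.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def lookahead_carry(a, b, k, carry_in=0):
    """Carry into bit k, computed directly from the generate (g) and
    propagate (p) signals of bits 0..k-1 instead of waiting for a ripple."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(k)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(k)]
    carry = 0
    for j in range(k):
        term = g[j]              # a carry generated at bit j ...
        for m in range(j + 1, k):
            term &= p[m]         # ... must be propagated by every bit above it
        carry |= term
    term = carry_in
    for m in range(k):
        term &= p[m]             # the incoming carry must pass through all k bits
    return carry | term
```

For example, `ripple_add(13, 11, 4)` yields the 4-bit sum 8 with a carry out of 1, and `lookahead_carry(13, 11, 4)` arrives at the same carry out without stepping through the intermediate carries. The nested loops in `lookahead_carry` hint at why the lookahead logic grows quickly with the number of stages considered.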
Letteney et al. disclose in U.S. Pat. No. 4,228,520 entitled `High Speed Multiplier Using Carry-Save/Propagate Pipeline with Sparse Carries` (Issued on Oct. 14, 1980) a multiplier with a configuration which enables the multiplication to be carried out by iteratively adding four multiples of a multiplicand in a stage of 4-2 carry save adders, which then feed four-bit parallel adders, each of which has four sum outputs and a carry output from the highest order bit position. The sum outputs are latched and then fed to a carry propagate adder on each iteration for addition to the previous partial products. Only the single carry output from each of the 4-bit parallel adders needs to be latched and then fed to another 4-bit parallel adder.
By the use of this multiplier configuration, Letteney et al. are able to reduce the latches and the input and output (I/O) pin requirements, since the multiplier generates only a single carry output from each 4-bit parallel adder. However, the multiplier as disclosed by Letteney et al. still has the problem that a single multiplication has to be partitioned into many iterations: for a multiplication which requires more than four multiples of the multiplicand, each iteration processes only four multiples. Even though this multiplier discloses a configuration for reducing the number of latches for carry propagation, there is added circuit overhead in requiring one 4-bit parallel adder, one 4-bit carry propagation adder and one 4-bit register. Furthermore, the multiplier as disclosed by Letteney et al. may not be suitable for higher speed operation because it requires a carry propagation from a 4-2 carry save adder to the carry output of the 4-bit parallel adder, which will cause each iteration to take up two clock cycles. This requirement reduces the usefulness of this multiplier in modern electronic systems and devices where high speed operation is required.
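The idea of processing four multiples of the multiplicand per iteration can be sketched in software as follows. This is a simplified model under stated assumptions, not the Letteney et al. circuit: the 4-2 reduction is modeled as two cascaded 3-2 carry-save steps, the ordinary `+` operator stands in for the carry propagate adder, and all function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3-2 reduction: three operands in, a sum word and a carry word out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def reduce_4_to_2(w, x, y, z):
    """4-2 reduction modeled as two cascaded 3-2 carry-save steps."""
    s1, c1 = carry_save_add(w, x, y)
    return carry_save_add(s1, c1, z)


def multiply_by_groups(multiplicand, multiplier, width):
    """Multiply unsigned integers, consuming four multiplier bits per iteration."""
    product = 0
    for base in range(0, width, 4):
        # Four shifted multiples of the multiplicand, gated by the multiplier bits.
        multiples = [
            (multiplicand << (base + k)) if (multiplier >> (base + k)) & 1 else 0
            for k in range(4)
        ]
        s, c = reduce_4_to_2(*multiples)
        # '+' stands in for the carry propagate addition to the running product.
        product += s + c
    return product
```

With `width` set to 8, the loop runs twice, which mirrors the observation above that a single multiplication is partitioned into several iterations once more than four multiples are involved.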
Cash et al. disclose in U.S. Pat. No. 4,887,233 entitled `Pipeline Arithmetic Adder and Multiplier` a pipelined multiplier design which uses a plurality of pipelined adder rows. The multiplier comprises a pipelined para-multiplication subsection, a pipelined adder subsection and a synchronization register subsection. The multiplication is performed by employing a plurality of one-bit registered half adders to realize the addition operations for each row, wherein the number of half adder stages in the first row is equal to the number of bits in the incoming binary words to be added. In each successive row, the least significant registered half adder is replaced with a one-bit register until no adders are left. The number of rows is equal to the number of half adders in the first row. In a modified embodiment, the adders receive two carry inputs and develop two carry outputs, which reduces the number of adder cells to approximately half.
Even though the multiplier as disclosed in this patent has the advantage of reducing the ripple-through carry delays, it is still limited in size and cost because the carry-save technique is used throughout the entire multiplier circuit, which requires a large number of added registers. Additionally, the carry delay may raise concerns about clock latencies, which may unduly add to the complexity of designing and fabricating this multiplier.
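The row-by-row half adder scheme can be modeled in a few lines. The sketch below is an interpretation for illustration, not the circuit of Cash et al.: each loop iteration stands for one registered row, the function names are hypothetical, and the final carry out of the top bit is simply discarded.

```python
def half_adder_row(sum_word, carry_word, width):
    """One row of half adders: XOR gives the new sum bits, AND gives the
    carries, which move up one bit position for the next row."""
    mask = (1 << width) - 1
    new_sum = sum_word ^ carry_word
    new_carry = ((sum_word & carry_word) << 1) & mask
    return new_sum, new_carry


def pipelined_add(a, b, width):
    """Add two width-bit words using `width` rows of half adders."""
    mask = (1 << width) - 1
    s, c = a & mask, b & mask
    for _ in range(width):
        s, c = half_adder_row(s, c, width)
    # After k rows the low k bits of the carry word are zero, so those
    # positions no longer need an adder, only a register to hold the sum bit.
    return s
```

The comment in `pipelined_add` corresponds to the replacement of one least significant half adder per row described above, and the loop count corresponds to the number of rows being equal to the number of bits. It also makes the register cost visible: every bit of every row holds state.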
Nathan discloses in U.S. Pat. No. 4,644,488 entitled "Pipeline Active Filter Utilizing A Booth Type Multiplier" (Issued on Feb. 17, 1987) a pipeline active filter which employs multiplier units combining a modified Booth decoder with carry-save adders. Each multiplier unit uses a modified Booth decoder and only one row of carry save adders, and the results are transferred to less significant multiplier positions for addition in subsequent operations for multiplication of bits by weight. Each carry save adder accepts a sum signal and a carry signal from more significant bit multiplier positions without having to add the carry signal it receives to obtain the correct sum. This particular configuration and procedure of operation may be suitable for the specific application to a filter which accepts weighting factors in a time sequential manner. However, the structure of the multiplier as disclosed in this patent presents a particular problem for its application as a regular carry save multiplier. Specifically, the number of circuits and the size of the multiplier increase non-linearly, almost exponentially, either when the number of bits to be processed by the multiplier is increased or when a higher multiplication speed is required. Besides the limitations caused by the large number of circuits and the size, the multiplier as disclosed by Nathan is also difficult to test as the number of circuits becomes very large.
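For readers unfamiliar with modified Booth decoding, the sketch below shows the standard radix-4 recoding of a two's-complement multiplier into digits from the set {-2, -1, 0, 1, 2}, which halves the number of partial products. It is a generic textbook formulation with hypothetical function names, assuming an even bit width, and is not taken from the Nathan patent.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a two's-complement multiplier of even `width` bits into
    radix-4 digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    multiplier &= (1 << width) - 1
    digits = []
    prev = 0  # implicit zero bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Sum one partial product per radix-4 digit."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))
```

As a usage example, `booth_radix4_digits(7, 4)` gives `[-1, 2]`, since 7 = -1 + 2*4, so a 4-bit multiplier needs only two partial products rather than four.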
Wong et al. disclose in U.S. Pat. No. 4,953,119 entitled `Multiplier Circuit with Selectively Interconnected Pipelined Multipliers for Selectively Multiplication of Fixed and Floating Point Numbers` (Issued on Aug. 28, 1990) a multiplier which performs fixed point and floating point multiplications of a plurality of input words with predetermined word length. The multiplier has an input logic means which receives a plurality of input data words having a predetermined data word length and provides first and second output words comprising the least and most significant portions of the input data words. A multiplier unit coupled to the input logic means has first and second selectively interconnected parallel pipelined multiplier paths configured to implement a modified Booth algorithm. The two multiplier paths process the least significant and the most significant words of the first and the second data words respectively. Various recoders and pipelined registers are interposed between these multiplier paths to obtain the final products from the multiplication apparatus. The invention disclosed by Wong et al. may be suitable for a special purpose processor where multiplications of fixed point data as well as floating point data are required. However, the disclosed multiplier, which comprises interleaved and parallel architectures with coupled recoders and registers, is very complex and may become too expensive and inconvenient for general purpose applications.
Therefore, there is still a demand in the art of design and manufacture of multiplication apparatus for a multiplier which provides flexibility in design and configuration, such that the structure of the multiplier can be adjusted and optimized depending on the speed requirement and on IC design constraints such as the available IC chip area and the power consumption limitations. The structure and operation of this multiplier should be systematic and modular such that the difficulties encountered in the prior art may be overcome.