The invention pertains to the field of bit serial multipliers, and, more particularly, to the field of bit serial multipliers of improved throughput. Such improved multipliers utilize parallel load shift registers to store each of the partial product bits and carry bits from a first multiplication to free the multiplier to begin a new multiplication while the carry bits and partial product bits from the previous multiplication are being combined to arrive at the final result.
Bit serial multipliers are known. Such multipliers load a multiplicand in parallel into a plurality of adder cells with one bit in each cell. A multiplier is then shifted into the machine one bit at a time. Each multiplier bit is used to multiply the multiplicand to generate a partial product. Each partial product is shifted to the proper position to be added with the next partial product generated from the next multiplier bit. Each such addition can generate carries at one or more bit positions. These carries must be propagated to the proper bit position, i.e., the next bit position to the left, for use in the partial product addition operation.
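By way of illustration only, the shift-and-add procedure described above may be sketched in Python as follows. The function name `shift_add_multiply` and its parameters are hypothetical and are not taken from any actual device. Operands are assumed to be unsigned, and ordinary integer addition stands in for the carry propagation between adder cells.

```python
def shift_add_multiply(x, y, n):
    """Multiply unsigned n-bit x (multiplicand) by unsigned n-bit y (multiplier).

    One multiplier bit is consumed per loop iteration, mimicking one clock cycle.
    """
    acc = 0   # running partial product held in the adder cells
    low = 0   # product bits already shifted out of the machine
    for t in range(n):
        if (y >> t) & 1:
            acc += x                  # add the multiplicand; carries ripple here
        low |= (acc & 1) << t         # the least significant bit leaves the machine
        acc >>= 1                     # shift the partial product one place right
    # The bits still held in the cells form the upper half of the product.
    return (acc << n) | low


print(shift_add_multiply(13, 11, 4))  # 143
```

In this sketch each pass through the loop completes its addition, carries included, before the next multiplier bit is accepted.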
Carry propagation takes valuable time, so a variation called a stored carry machine was developed. A stored carry machine takes advantage of the fact that the partial product bits are shifted right by one place for each new multiplier bit in preparation for the next partial product addition. In such machines, instead of shifting the carry bit from each stage to the left for use in the partial product addition being performed by the cell on the left, the carry bit is stored for one multiplier bit cycle time in the cell in which the carry was generated. When the partial product bit arrives from the cell on the left for the next partial product addition, the carry bit is input to the full adder of the same cell in which the carry was generated for use in the addition. This storage of the carry for one cycle time while the partial product is shifted right by one bit position has the effect of a left shift by one bit position of the carries generated in each cell.
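The stored carry arrangement may be illustrated by the following Python sketch, which models each cell as a full adder with one latch for the partial product bit and one latch for the carry. It is a simplified behavioral model under stated assumptions (unsigned operands, no Booth recoding), not a description of any particular device, and the name `stored_carry_multiply` is hypothetical.

```python
def stored_carry_multiply(x, y, n):
    """Unsigned n-bit multiply on a simulated array of n stored-carry cells."""
    x_bits = [(x >> i) & 1 for i in range(n)]  # cell i holds multiplicand bit i
    s = [0] * (n + 1)  # s[i]: partial product bit latched in cell i (s[n] is always 0)
    c = [0] * n        # c[i]: carry kept in the cell that generated it
    product = 0
    for t in range(2 * n):                     # n cycles to load, n cycles to flush
        y_bit = (y >> t) & 1 if t < n else 0   # zeros are fed in after the last bit
        new_s = [0] * (n + 1)
        for i in range(n):
            a = x_bits[i] & y_bit              # this cell's partial product term
            b = s[i + 1]                       # bit arriving from the cell on the left
            cin = c[i]                         # carry stored here one cycle earlier
            new_s[i] = a ^ b ^ cin             # full adder sum
            c[i] = (a & b) | (a & cin) | (b & cin)  # full adder carry, kept in place
        s = new_s
        product |= s[0] << t                   # rightmost cell emits one product bit
    return product


print(stored_carry_multiply(13, 11, 4))  # 143
```

No carry crosses a cell boundary in this model. Because the partial product bits move one place to the right each cycle while each carry stays where it was generated, the carry meets the bit of the next higher weight on the following cycle.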
Another variation on the operation of bit serial multipliers uses the Booth algorithm. This algorithm is well known and involves examining both the current multiplier bit and the previous multiplier bit and then performing one of four operations depending upon the logic states of those two bits. These operations are given in Table I below:
TABLE I
______________________________________
Yn   Yn-1   Function
______________________________________
0    0      No arithmetic operation. Shift partial product relative to multiplier.
0    1      Add multiplicand to partial product, S, and shift new partial product one place to the right relative to the multiplier.
1    0      Subtract multiplicand from partial product, S, and shift new partial product.
1    1      No arithmetic operation (perform correction by executing both add and subtract). Shift partial product relative to multiplier.
______________________________________
As used in Table I and elsewhere herein, Yn is the current multiplier bit, and Yn-1 is the previous multiplier bit.
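The four cases of Table I may be illustrated by the following Python sketch for two's complement operands. The function name `booth_multiply` is hypothetical. The sketch assumes that Yn-1 is taken to be 0 when the first multiplier bit is examined, which is the usual convention for the Booth algorithm but is not stated in Table I.

```python
def booth_multiply(x, y, n):
    """Multiply signed n-bit two's complement integers x and y per Table I."""
    y_bits = y & ((1 << n) - 1)   # two's complement bit pattern of the multiplier
    acc = 0                       # signed partial product S
    low = 0                       # product bits already shifted out
    prev = 0                      # Yn-1, assumed 0 before the first multiplier bit
    for t in range(n):
        cur = (y_bits >> t) & 1   # Yn, the current multiplier bit
        if (cur, prev) == (0, 1):
            acc += x              # add multiplicand to partial product
        elif (cur, prev) == (1, 0):
            acc -= x              # subtract multiplicand from partial product
        # (0, 0) and (1, 1): no arithmetic operation
        low |= (acc & 1) << t     # least significant bit leaves the machine
        acc >>= 1                 # arithmetic right shift keeps the sign
        prev = cur
    return (acc << n) | low


print(booth_multiply(3, -2, 4))  # -6
```

The sketch relies on Python integers behaving as sign-extended two's complement values under `&` and `>>`, so that a negative partial product is shifted arithmetically.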
The Am25LS14 Serial/Parallel Multiplier manufactured by Advanced Micro Devices of Sunnyvale, Calif., is one example of a bit serial multiplier that stores carries and uses Booth's algorithm. This device is described in detail in an application note entitled "Mechanization of the Am25LS14 Serial/Parallel Multiplier" by John R. Mick, which is hereby incorporated by reference.
One problem with the class of bit serial multipliers represented by the AMD Am25LS14 is that, for an N bit multiplier, it takes 2N clock cycles to get all the bits of the final result shifted out of the machine. The multiplier bits are shifted in, one per clock cycle, during the first N clock cycles, and all the partial products and all the carries at each bit position are generated. The least significant N bits of the final product are shifted out during these first N clock cycles. However, the most significant bits of the final product are still in the machine, stored as the partial product bits and the carries in the individual cells of the multiplier. The next N clock cycles must be consumed shifting out the partial product bits and the carries and combining them to derive the most significant final product bits. During these second N clock cycles, the multiplier cells are essentially idle, being occupied only with the process of shifting and combining the partial product and carry bits. This is an inefficient use of the multiplier. Accordingly, a need has arisen for a way to increase the throughput of bit serial multipliers.
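The state of such a machine after the first N clock cycles may be illustrated by the following Python sketch. It assumes unsigned operands and a simplified stored carry cell with no Booth recoding. The names `first_pass` and `pending_high` are hypothetical and do not correspond to any actual device.

```python
def first_pass(x, y, n):
    """Run only the first n clock cycles of an unsigned stored-carry multiply.

    Returns the low product bits already shifted out, plus the partial product
    bits and carries left behind in the cells.
    """
    s = [0] * (n + 1)   # s[i]: partial product bit latched in cell i
    c = [0] * n         # c[i]: carry stored in cell i
    low = 0
    for t in range(n):
        y_bit = (y >> t) & 1
        new_s = [0] * (n + 1)
        for i in range(n):
            a = ((x >> i) & 1) & y_bit
            b, cin = s[i + 1], c[i]
            new_s[i] = a ^ b ^ cin
            c[i] = (a & b) | (a & cin) | (b & cin)
        s = new_s
        low |= s[0] << t
    return low, s, c


def pending_high(s, c, n):
    """Value of the upper product half still held as stored sums and carries."""
    sums = sum(s[i] << (i - 1) for i in range(1, n))
    carries = sum(c[i] << i for i in range(n))
    return sums + carries


low, s, c = first_pass(13, 11, 4)
print(low, pending_high(s, c, 4))  # 15 8
```

For 13 times 11 with N equal to 4, the lower half of the product 143 has already left the machine after four cycles, while the upper half exists only as uncombined partial product bits and carries. Combining them is the work that occupies the second N clock cycles.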