Appendices A and B, which are part of the present disclosure, are included in a microfiche appendix consisting of three (3) sheets of microfiche having a total of one hundred eighty-nine (189) frames, and the microfiche appendix is incorporated herein by reference in its entirety. Appendices A and B are listings of computer programs including source code in the language VERILOG for a structural embodiment and a behavioral embodiment, respectively, of a multiplication accumulation circuit (also called “MAC”) in accordance with the invention as described more completely below.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to a multiplication accumulation circuit (also referred to as “MAC”) that can perform two multiplications and accumulations simultaneously, or alternatively a single multiplication and accumulation of double words.
Multiplication of two operands (typically called “multiplicand” and “multiplier”) to generate a product is well known. In a paper and pencil method taught in grammar school, the digits of the multiplier are taken one at a time from the right to the left, each digit is multiplied by the multiplicand, and the resulting product (also called “intermediate product”) is placed at an appropriate place, e.g. shifted left depending on the position of the multiplier’s digit being used. After all digits of the multiplier are multiplied, all the intermediate products are added to generate the product. The following example (in binary) illustrates the paper and pencil method:
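The paper and pencil method can also be sketched in code. The following Python sketch is illustrative only: the function name `pencil_multiply` and the operands 1011 and 1101 are assumptions chosen here, and the sketch handles non-negative operands only.

```python
def pencil_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative integers by binary shift-and-add."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product = 0
    position = 0  # position of the multiplier digit currently in use
    while multiplier >> position:
        digit = (multiplier >> position) & 1    # take digits right to left
        intermediate = digit * multiplicand     # the intermediate product
        product += intermediate << position     # shift left, then add
        position += 1
    return product


# 1011 (eleven) times 1101 (thirteen)
print(bin(pencil_multiply(0b1011, 0b1101)))  # 0b10001111 (143)
```

Each pass through the loop corresponds to one row of the written method: a zero digit contributes a row of zeros, and a one digit contributes the multiplicand shifted left by the digit's position.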
Computers use another method, known as “Booth’s algorithm”, that uses just addition, subtraction and shift operations based on examining a pair of adjacent bits in the multiplier, as illustrated by the following table:
wherein α_i refers to bit i in multiplier α, b is the multiplicand, and 1≤i≤4; α_0 being assumed to be 0.
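The pairwise rule can be sketched in Python. This is a minimal sketch that assumes the standard radix-2 Booth recoding (the bit pair 01 adds the shifted multiplicand, 10 subtracts it, 00 and 11 do nothing) and a two's-complement multiplier. The function name `booth_multiply` and the `bits` parameter are hypothetical names introduced for illustration.

```python
def booth_multiply(multiplier: int, b: int, bits: int = 4) -> int:
    """Radix-2 Booth multiplication of a `bits`-wide signed multiplier by b."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= multiplier <= highest:
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    product = 0
    previous = 0  # bit 0 of the multiplier is assumed to be 0
    for i in range(1, bits + 1):              # 1 <= i <= bits
        current = (pattern >> (i - 1)) & 1    # bit i, counted from the right
        if (current, previous) == (0, 1):
            product += b << (i - 1)           # add the shifted multiplicand
        elif (current, previous) == (1, 0):
            product -= b << (i - 1)           # subtract the shifted multiplicand
        previous = current                    # pairs 00 and 11: shift only
    return product


print(booth_multiply(6, 3))    # 18
print(booth_multiply(-3, 5))   # -15
```

A run of consecutive ones in the multiplier costs one subtraction at its start and one addition at its end, which is why only addition, subtraction and shifting are needed.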
Booth’s algorithm has been used in a multiplier that “can perform one series of multiplication of (one word) X (one word) or can simultaneously execute two series of multiplications of (half word) X (half word) under the control of a division control signal . . . ” (col. 2, lines 47-52 in U.S. Pat. No. 4,825,401 granted to Ikumi). See also U.S. Pat. No. 5,586,070 granted to Purcell for another circuit “which performs selectable multiplication operations on a first word having an upper byte and a lower byte and a second word having an upper byte and a lower byte” (abstract).
A multiplication accumulation circuit (also called “MAC”) in accordance with the invention has at least two modes and, depending on the mode, performs at least one of the following multiplication operations in a single cycle: (1) multiplication of two pairs of single words (in a “dual mode”) or (2) multiplication of one pair of double words (in a “double mode”). The MAC normally operates in the double mode (also called “default mode”) and goes into the dual mode when a control signal (also called “dual mode signal”) is active. The dual mode signal, when active, enables a circuit (hereinafter “shifting circuit”) that is included in the MAC and that is used to shift bits of an intermediate product, as described below.
Moreover, in the same cycle the MAC also optionally adds to the resulting product (or products) another operand (e.g. value of a previous accumulation) if another control signal (also called “accumulate signal”) is active.
In one embodiment, the MAC has five input buses that carry signals for operands A, B, C, D and E, a control bus that carries signals for controlling the operations performed on the just-described operands, and an output bus that carries a signal generated by the MAC.
Operands A and B can be, respectively, the upper and lower halves of a first double word [A,B] to be used as a multiplicand. Similarly, operands C and D can be the upper and lower halves of a second double word [C,D] to be used as a multiplier. In this case, the four operands A, B, C and D are to be used as follows by the MAC: (1) to perform a single multiplication of the first double word with the second double word (in an operation called “double multiply”), and (2) to perform an addition of the product of the double multiply operation and the fifth operand E, e.g. to generate on the output bus a signal of value [A,B]*[C,D]+E (in an operation called “double word MAC operation”).
When the accumulate signal is inactive, the MAC does not add the fifth operand, thereby providing the result of the double word multiply operation. In the double word multiply operation, the two double words [A,B] and [C,D] can be written as 2^n*A+B and 2^n*C+D, where n is the number of bits in an operand, so that the product has the value 2^(2n)*A*C+2^n*(A*D+B*C)+B*D.
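The expansion can be checked numerically. The Python sketch below is illustrative only: the names `join` and `double_multiply`, the word width n = 16 and the unsigned operand values are all assumptions made here, not details of the embodiment.

```python
def join(upper: int, lower: int, n: int = 16) -> int:
    """Form the double word [upper, lower] = 2^n * upper + lower."""
    return (upper << n) + lower


def double_multiply(a: int, b: int, c: int, d: int, n: int = 16) -> int:
    """[A,B] * [C,D] as 2^(2n)*A*C + 2^n*(A*D + B*C) + B*D."""
    return ((a * c) << (2 * n)) + ((a * d + b * c) << n) + b * d


a, b, c, d = 0x1234, 0x5678, 0x9ABC, 0xDEF0   # hypothetical 16-bit operands
direct = join(a, b) * join(c, d)              # multiply the double words directly
print(double_multiply(a, b, c, d) == direct)  # True
```

The four single-word products A*C, A*D, B*C and B*D differ only in how far each is shifted before the final addition.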
When the dual mode signal goes active, the MAC (1) uses operands A and C to perform a first multiplication, (2) uses operands B and D to perform a second multiplication simultaneously with the first multiplication, and (3) performs an addition of the products of the two multiplications (in an operation called “dual multiply”). Optionally (if the accumulate signal is active) the MAC adds to the two products the fifth operand E, e.g. to generate on the output bus a signal of value A*C+B*D+E (in an operation called “dual MAC operation”). If the accumulate signal is inactive, the MAC disregards the fifth operand E, and therefore generates on the output bus a signal of value A*C+B*D (in the operation called “dual multiply”).
In one embodiment, the MAC has a hardware circuit (in the form of, for example, complementary metal oxide semiconductor (CMOS) logic gates) that performs the double word MAC operation, and the MAC uses the same hardware circuit with the shifting circuit to perform the dual MAC operation. Prior to addition of two intermediate products in the MAC, the shifting circuit right shifts the bits in one of the two products so that the shifted bits are appropriately aligned with bits in the other of the two products. In an alternative embodiment, certain hardware in the MAC performs the dual MAC operation, and the MAC uses the same hardware with the shifting circuit to perform the double word MAC operation. In the alternative embodiment, the shifting circuit left shifts the bits in one of the two products prior to addition.
Use of a shifting circuit to implement a double word MAC operation (or alternatively a dual MAC operation) as described herein requires fewer gates than in the prior art, because one of the two separate adders otherwise required in the prior art to add the two intermediate products in the two different ways is eliminated. Moreover, the MAC can perform two multiplications and two additions of single word operands in a single cycle, as compared to the two or more cycles otherwise required in the prior art (e.g. for adding the two intermediate products). Therefore, use of a shifting circuit as described herein allows the MAC to perform the double word MAC and dual MAC operations faster while using minimal hardware, as compared to a prior art device.
In one embodiment, the MAC is implemented by two multiplier units, each of which uses operands A and B as a double word multiplicand, or alternatively uses a selected one of the two operands A, B as a single word multiplicand. Moreover, each of the two multiplier units uses a selected one of the two operands C and D as a single word multiplier. Each multiplier unit multiplies either the double word multiplicand or the selected single word multiplicand with the selected single word multiplier, and generates a signal for one of the two intermediate products that are summed by an adding circuit.
Depending on the implementation, the shifting circuit is coupled to a selected one of the multiplier units (the other of the multiplier units being referred to as the “unselected multiplier unit”). The adding circuit is coupled to receive signals from the shifting circuit and the unselected multiplier unit. The adding circuit sums the received signals with the fifth operand E, and passes the resulting signal to the output bus.
In a first embodiment, the shifting circuit is coupled to a first multiplier unit (also called “lower multiplier unit”) that uses operand D as the single word multiplier. Furthermore, depending on the operation being performed, the shifting circuit either generates a shifted version of the signal received from the first multiplier unit, or simply passes the signal from the first multiplier unit. For example, during a double word MAC operation, the shifting circuit simply passes the signal received from the first multiplier unit directly (i.e. without shifting) to the adding circuit.
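The double word data path of the first embodiment can be modeled behaviorally. The Python sketch below is a rough model under stated assumptions, not the circuit of the appendices: it assumes unsigned operands, a word width of 16 bits, that the unselected multiplier unit uses operand C, and that the adding circuit aligns that unit's product n bits to the left of the lower unit's product. All function and variable names are hypothetical.

```python
N = 16  # assumed number of bits in a single word


def multiplier_unit(a: int, b: int, multiplier_word: int, n: int = N) -> int:
    """One unit: double word multiplicand [A,B] times a single word multiplier."""
    return ((a << n) + b) * multiplier_word


def double_word_mac(a, b, c, d, e=0, accumulate=True, n=N):
    """Model of [A,B]*[C,D]+E built from two intermediate products."""
    upper = multiplier_unit(a, b, c, n)  # assumption: this unit uses operand C
    lower = multiplier_unit(a, b, d, n)  # lower multiplier unit uses operand D
    passed = lower                       # shifting circuit passes the signal unshifted
    total = (upper << n) + passed        # assumption: fixed n-bit alignment in the adder
    return total + e if accumulate else total


a, b, c, d, e = 0x1234, 0x5678, 0x9ABC, 0xDEF0, 7  # hypothetical operands
print(double_word_mac(a, b, c, d, e) == 0x12345678 * 0x9ABCDEF0 + e)  # True
```

The model reflects the identity [A,B]*[C,D] = 2^n*([A,B]*C) + [A,B]*D, so only two intermediate products reach the adding circuit.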
In the dual mode, the MAC also performs any one of the following two accumulate operations (in addition to the dual MAC operation) in response to appropriate control signals on the control bus. In a first accumulate operation, also referred to as “dual negative MAC operation”, the MAC accumulates the difference of the two intermediate products and the fifth operand E, e.g. generates the value A*C−B*D+E. When the accumulate signal is inactive, the MAC does not add the fifth operand, thereby providing the result of the “dual negative multiply” operation. In a second accumulate operation, also referred to as “dual cross MAC operation”, the MAC accumulates the sum of two other intermediate products, e.g. generates the value A*D+B*C+E. In this case as well, when the accumulate signal is inactive, the MAC does not add the fifth operand, thereby providing the result of the “dual cross multiply” operation.
The dual negative MAC operation and the dual cross MAC operation can be used successively to configure the MAC in the dual mode, to perform a complex operation X*Y+E in just two cycles, where each of X, Y and E is a complex number. Specifically, Er and Ei represent the real and imaginary portions of operand E, A and B represent the real and imaginary portions of operand X, and C and D represent the real and imaginary portions of operand Y, so that the final output is (A+jB)*(C+jD)+(Er+jEi), which can be rewritten as (AC−BD)+j(AD+BC)+(Er+jEi), and further rewritten as (AC−BD+Er)+j(AD+BC+Ei). In one cycle, the MAC computes the output signal’s real portion, e.g. of value A*C−B*D+Er. In another cycle (either succeeding or preceding), the MAC computes the output signal’s imaginary portion, e.g. of value A*D+B*C+Ei. If necessary, the real portion and the imaginary portion can be generated simultaneously in a single cycle by use of two MACs of the type described herein.
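The two-cycle complex operation can be sketched functionally in Python. This is an illustration under assumptions: the function names are hypothetical, integers stand in for the operand signals, and each function call stands for one cycle of the MAC.

```python
def dual_negative_mac(a, b, c, d, e=0, accumulate=True):
    """A*C - B*D, plus E when the accumulate signal is active."""
    result = a * c - b * d
    return result + e if accumulate else result


def dual_cross_mac(a, b, c, d, e=0, accumulate=True):
    """A*D + B*C, plus E when the accumulate signal is active."""
    result = a * d + b * c
    return result + e if accumulate else result


def complex_mac(x, y, e):
    """X*Y+E for (real, imaginary) integer pairs, one operation per cycle."""
    (a, b), (c, d), (er, ei) = x, y, e
    real = dual_negative_mac(a, b, c, d, er)  # one cycle: A*C - B*D + Er
    imag = dual_cross_mac(a, b, c, d, ei)     # another cycle: A*D + B*C + Ei
    return real, imag


# (3+4j)*(5-2j)+(1+6j), with hypothetical operand values
print(complex_mac((3, 4), (5, -2), (1, 6)))  # (24, 20)
```

The order of the two calls does not matter, which matches the statement that the imaginary portion can be computed in either the succeeding or the preceding cycle.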
In addition to the above-described two modes (double mode and dual mode), the MAC of this embodiment also has a single mode wherein the MAC disregards two of the operands, e.g. operands A and C. In the single mode, the MAC can perform the following two accumulate operations, or the following two multiplications if the accumulate signal is inactive. In a first accumulate operation, also referred to as “single MAC operation”, the MAC accumulates only one product, e.g. generates a signal of the value B*D+E. The MAC does not perform the addition when the accumulate signal is inactive, thereby providing the result of a single multiply operation. In a second accumulate operation, also referred to as “single negative MAC operation”, the MAC accumulates a negated product, e.g. generates a signal of the value −B*D+E. Again, the MAC does not perform the addition when the accumulate signal is inactive, thereby providing the result of a single negative multiply operation.
Therefore, using the same hardware, a total of six accumulate operations, or, if the accumulate signal is inactive, six multiply operations, can be performed by the MAC of this embodiment. The twelve operations are each performed in a single cycle. The single cycle performance of any one of twelve different operations by the same hardware provides an order of magnitude greater flexibility in computations to be performed by an integrated circuit (IC) chip having the above-described MAC, as compared to prior art IC chips. Moreover, as noted above, an IC chip having the above-described MAC performs each of the six accumulate operations with less hardware and faster than IC chips of the prior art.
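The twelve operations can be summarized in a small behavioral model. The Python sketch below is illustrative only: the `Op` enumeration, the `mac` function, the default word width of 16 bits and the use of unbounded integers are assumptions made here, and the model says nothing about how the control bus encodes the operations.

```python
from enum import Enum


class Op(Enum):
    DOUBLE = "double word"
    DUAL = "dual"
    DUAL_NEGATIVE = "dual negative"
    DUAL_CROSS = "dual cross"
    SINGLE = "single"
    SINGLE_NEGATIVE = "single negative"


def mac(op, a, b, c, d, e=0, accumulate=True, n=16):
    """Six products; each is a MAC or a plain multiply per the accumulate flag."""
    if op is Op.DOUBLE:
        product = ((a << n) + b) * ((c << n) + d)  # [A,B]*[C,D]
    elif op is Op.DUAL:
        product = a * c + b * d
    elif op is Op.DUAL_NEGATIVE:
        product = a * c - b * d
    elif op is Op.DUAL_CROSS:
        product = a * d + b * c
    elif op is Op.SINGLE:
        product = b * d                            # operands A and C disregarded
    elif op is Op.SINGLE_NEGATIVE:
        product = -(b * d)
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return product + e if accumulate else product


for op in Op:
    print(op.value, mac(op, 2, 3, 4, 5, e=10), mac(op, 2, 3, 4, 5, accumulate=False))
```

Six operation kinds combined with the two states of the accumulate signal give the twelve operations counted above.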