1. Technical Field
This invention relates to digital signal processors, and has particular relation to multiply-accumulate (MAC) units.
2. Background Art
Digital signal processors (DSPs) are microprocessors specifically tailored to execute mathematical computations very rapidly. DSPs are found in a variety of applications, including compact disk players, PC disk drives, telecommunication modem banks, and cellular telephones.
In the cellular telephone context, the demand for DSP computation capability continues to grow, driven by the increasing needs of applications such as GPS position location, voice recognition, low-bit rate speech and audio coding, image and video processing, and 3G cellular modem processing. To meet these processing demands, there is a need for improved digital signal processor architectures that can process computations more efficiently.
Considerable work has been done in these areas. Applicant Sih is also an applicant in the following applications for U.S. patents:
"Multiple Bus Architecture in a Digital Signal Processor", Ser. No. 09/044,087, filed Mar. 18, 1998, now abandoned;
"Digital Signal Processor Having Multiple Access Register", Ser. No. 09/044,088, filed Mar. 18, 1998, now U.S. Pat. No. 6,496,920;
"Memory Efficient Instruction Storage", Ser. No. 09/044,089, filed Mar. 18, 1998, now abandoned;
"Highly Parallel Variable Length Instructions for Controlling a Digital Signal Processor", Ser. No. 09/044,104, filed Mar. 18, 1998, now abandoned;
"Variable Length Instruction Decoder", Ser. No. 09/044,086, filed Mar. 18, 1998, now U.S. Pat. No. 6,425,070; and
"Digital Signal Processor with Shiftable Multiply Accumulate Unit", Ser. No. 09/044,108, filed Mar. 18, 1998, now abandoned.
The disclosure of these applications is incorporated herein by reference.
In many signal processing algorithms, the computation (B*C)±(D*E) is prominent, where B, C, D, and E are 16-bit integers. This computation is invoked when performing single-pole IIR filtering, computing the magnitude of a complex quantity, computing the dot product or cross product of two vectors, and interpolating. It is also used in extended-precision operations (e.g., a 32×32 multiply). Because this operation is so ubiquitous, it is desirable for a digital signal processor to complete it in one cycle.
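The kernel and several of the uses listed above can be sketched in Python. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, arithmetic uses unbounded Python integers rather than 16-bit hardware operands, and the IIR step omits the fixed-point scaling a real filter would apply. In `mul32_unsigned` the 16-bit halves are unsigned, so hardware would need a multiplier that accepts unsigned operands.

```python
# Illustrative helpers; all function names here are hypothetical.

def mac2(b: int, c: int, d: int, e: int, subtract: bool = False) -> int:
    """(B*C) + (D*E), or (B*C) - (D*E) when subtract is True."""
    return b * c - d * e if subtract else b * c + d * e


def iir_step(a: int, x: int, k: int, y_prev: int) -> int:
    """Single-pole IIR update y[n] = a*x[n] + k*y[n-1] (integer, unscaled)."""
    return mac2(a, x, k, y_prev)


def complex_mag_squared(re: int, im: int) -> int:
    """|re + j*im|^2; the magnitude is the square root of this value."""
    return mac2(re, re, im, im)


def dot2(ux: int, uy: int, vx: int, vy: int) -> int:
    """Dot product of two 2-D vectors."""
    return mac2(ux, vx, uy, vy)


def cross2(ux: int, uy: int, vx: int, vy: int) -> int:
    """z-component of the cross product of two 2-D vectors."""
    return mac2(ux, vy, uy, vx, subtract=True)


def mul32_unsigned(a: int, b: int) -> int:
    """32x32 unsigned multiply built from 16x16 pieces. The middle term
    ah*bl + al*bh is itself a (B*C)+(D*E) computation."""
    ah, al = a >> 16, a & 0xFFFF
    bh, bl = b >> 16, b & 0xFFFF
    return ((ah * bh) << 32) + (mac2(ah, bl, al, bh) << 16) + al * bl
```

Only the cross-term of the 32×32 multiply maps onto the two-product kernel; the high and low partial products are single multiplies.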
Although DSPs with two multiply-accumulate (MAC) units are available (e.g. Lucent DSP16000, TI C6x), they cannot compute the desired quantity in one cycle because their MAC units are separate. If we let R1, R2, R3, and R4 be general purpose 16-bit registers containing B, C, D, and E respectively, and let L1, L2, and L3 be 40-bit result registers, then a single invocation of the computation
(B*C)+(D*E)
could be written in pseudocode on these existing processors as:
L1=R1*R2, L2=R3*R4; L3=L1+L2;
This computation therefore takes two cycles on these processors: the two products are formed in parallel in the first cycle, and their sum is formed in the second.
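To make the cycle count concrete, here is a minimal Python sketch that executes the pseudocode above on a hypothetical model of two separate MAC units. It assumes that operations separated by a comma issue in the same cycle, that a semicolon ends a cycle, and that 40-bit results wrap on overflow; none of these details is taken from a particular processor's documentation.

```python
# Hypothetical simulator for two separate MAC units executing
#   L1=R1*R2, L2=R3*R4; L3=L1+L2;
# Operations joined by "," issue in the same cycle; ";" closes a cycle.

def wrap40(x: int) -> int:
    """Wrap to a signed 40-bit value (assumed overflow behavior)."""
    return ((x + (1 << 39)) % (1 << 40)) - (1 << 39)


def run_separate_macs(b: int, c: int, d: int, e: int) -> tuple:
    """Return (L3, cycle count) for (B*C)+(D*E) on two independent MACs."""
    regs = {"R1": b, "R2": c, "R3": d, "R4": e}
    cycles = 0

    # Cycle 1: each MAC forms one product into its own 40-bit result register.
    regs["L1"] = wrap40(regs["R1"] * regs["R2"])
    regs["L2"] = wrap40(regs["R3"] * regs["R4"])
    cycles += 1

    # Cycle 2: L3 depends on both L1 and L2, which sit in different MACs'
    # result registers, so the sum needs its own instruction.
    regs["L3"] = wrap40(regs["L1"] + regs["L2"])
    cycles += 1

    return regs["L3"], cycles


print(run_separate_macs(3, 4, 5, 6))   # (42, 2)
```

The second cycle exists only because of the data dependency between the two result registers, not because either multiplier is busy.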
FIG. 1 is a block diagram of a conventional MAC unit (100). A register file (102) has an input port PI1 and three output ports, PO1, PO2, and PO3. The register file is connected to a memory (104). The output ports PO2 and PO3 are applied to a multiplier (106), which multiplies the two signals together and applies the product to one input of an adder (108). The adder receives its other input from PO1 of the register file. The sum is fed back to PI1 of the register file.
In the first clock cycle, nothing is applied to PO1, and R1 and R2 are applied to ports PO2 and PO3, respectively. Their product, L1, is fed back to the register file and placed in a temporary register attached to PO1. In the second clock cycle, R3 and R4 are applied to ports PO2 and PO3, respectively, and their product emerges from the multiplier as L2. The adder combines L2 from the multiplier with L1 from PO1 to produce L3, which it feeds back to the register file via PI1. Once L3 is in the register file, it can be made available to the memory.
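The two-cycle sequence through the FIG. 1 datapath can be traced with a small Python model. The class and method names are hypothetical, the register file is modeled as a dictionary, and 40-bit wraparound is assumed for overflow; the sketch mirrors only the port connections described above.

```python
from typing import Dict, Optional


def wrap40(x: int) -> int:
    """Wrap to a signed 40-bit value (assumed overflow behavior)."""
    return ((x + (1 << 39)) % (1 << 40)) - (1 << 39)


class Fig1Mac:
    """Hypothetical model of the FIG. 1 MAC unit (100).

    Each clock reads PO1, PO2 and PO3 from the register file (102),
    multiplies PO2 by PO3 in the multiplier (106), adds PO1 in the
    adder (108), and writes the sum back through PI1.
    """

    def __init__(self, regs: Dict[str, int]):
        self.regs = dict(regs)   # register file contents
        self.cycles = 0

    def clock(self, po1: Optional[str], po2: str, po3: str, pi1: str) -> int:
        # po1=None models "nothing applied to PO1": the adder sees zero.
        addend = 0 if po1 is None else self.regs[po1]
        product = self.regs[po2] * self.regs[po3]   # multiplier output
        self.regs[pi1] = wrap40(addend + product)   # adder output via PI1
        self.cycles += 1
        return self.regs[pi1]


mac = Fig1Mac({"R1": 3, "R2": 4, "R3": 5, "R4": 6})
mac.clock(None, "R1", "R2", "L1")   # cycle 1: L1 = R1*R2
mac.clock("L1", "R3", "R4", "L3")   # cycle 2: L3 = L1 + (R3*R4), i.e. L1 + L2
print(mac.regs["L3"], mac.cycles)   # 42 2
```

As in the text, L2 is never written to the register file; it exists only as the multiplier output in the second cycle.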
A 40-bit adder and a 17×17-bit multiplier are shown. These widths are conventional, but any convenient number of bits may be used.
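The bit widths can be checked with a few lines of arithmetic. This sketch assumes the 17×17 multiplier takes 17-bit two's-complement operands (wide enough to hold either a signed or an unsigned 16-bit value) and that the 40-bit accumulator is two's complement; it computes how many bits one product and one (B*C)+(D*E) sum need, and how many worst-case positive products fit before overflow.

```python
def signed_bits(lo: int, hi: int) -> int:
    """Smallest n such that every integer in [lo, hi] fits in n-bit two's complement."""
    n = 1
    while lo < -(1 << (n - 1)) or hi > (1 << (n - 1)) - 1:
        n += 1
    return n


ACC_BITS = 40
OP_LO, OP_HI = -(1 << 16), (1 << 16) - 1   # assumed 17-bit signed operand range

# Extremes of one 17x17 product: most negative is OP_LO*OP_HI,
# most positive is OP_LO*OP_LO.
PROD_LO, PROD_HI = OP_LO * OP_HI, OP_LO * OP_LO
prod_bits = signed_bits(PROD_LO, PROD_HI)          # one product
sum_bits = signed_bits(2 * PROD_LO, 2 * PROD_HI)   # (B*C)+(D*E)

# Worst-case positive products a 40-bit accumulator can absorb
max_terms = ((1 << (ACC_BITS - 1)) - 1) // PROD_HI

print(prod_bits, sum_bits, max_terms)   # 34 35 127
```

Under these assumptions a 40-bit accumulator keeps 5 guard bits above a two-product sum.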
Two multiply-accumulate units are coupled together so that the computation (B*C)+(D*E) can be completed in one cycle. An adder adds together the products of the two multipliers. The sum is applied to the first accumulator. Preferably, the second product is also applied to the second accumulator, and a multiplexer applies either a zero or the second product to the adder. If two unrelated computations are to be executed simultaneously, then the zero is applied, and the output of the second accumulator is fed back to the register file. If a single (B*C)+(D*E) computation is to be executed, then the second product is applied to the adder, and the output of the second accumulator is disregarded.
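The coupled arrangement can be sketched as a one-cycle Python function. It is a behavioral model under assumptions, not the patented circuit: the function and parameter names are hypothetical, each accumulator is assumed to add its input to a value read from the register file (`acc1_in`, `acc2_in`), and 40-bit results are assumed to wrap rather than saturate.

```python
def wrap40(x: int) -> int:
    """Wrap to signed 40 bits (assumed; hardware may saturate instead)."""
    return ((x + (1 << 39)) % (1 << 40)) - (1 << 39)


def coupled_macs(b: int, c: int, d: int, e: int,
                 acc1_in: int = 0, acc2_in: int = 0, combine: bool = True):
    """One cycle of two coupled MAC units (hypothetical behavioral model).

    combine=True:  the multiplexer passes the second product to the adder,
                   so the first accumulator receives (B*C)+(D*E); the second
                   accumulator's output is disregarded (returned as None).
    combine=False: the multiplexer passes zero, so the two MACs compute
                   unrelated results and both outputs are written back.
    """
    p1 = b * c                             # first multiplier
    p2 = d * e                             # second multiplier
    mux_out = p2 if combine else 0         # multiplexer: second product or zero
    acc1 = wrap40(acc1_in + p1 + mux_out)  # adder output -> first accumulator
    acc2 = wrap40(acc2_in + p2)            # second product -> second accumulator
    return acc1, (None if combine else acc2)


print(coupled_macs(3, 4, 5, 6))                 # (42, None)
print(coupled_macs(3, 4, 5, 6, combine=False))  # (12, 30)
```

With `combine=True`, a single cycle yields (B*C)+(D*E) in the first accumulator; with `combine=False`, the same hardware behaves like two independent MAC units.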