The Multiplier-And-Accumulator (MAC) is the core processing unit of a digital signal processor. Applications of programmable digital signal processors, such as video, audio, voice, and telecommunication, frequently use finite impulse response filters, infinite impulse response filters, matched filters, correlation coefficient operations, convolution operations, transformations between the time domain and the frequency domain, and so on. Performing high-dimensional vector product accumulation at high speed has therefore become a significant requirement for digital signal processors.
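As an illustration of why vector product accumulation dominates these workloads, the following minimal sketch expresses an inner product and one finite impulse response filter output as repeated multiply-and-accumulate steps. The function names (`mac`, `dot_mac`, `fir_output`) and the zero-padding convention for samples before index 0 are assumptions made for this example, not part of any particular processor.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot_mac(x, h):
    """Inner product of two equal-length vectors as a chain of MAC steps."""
    if len(x) != len(h):
        raise ValueError("vectors must have the same length")
    acc = 0
    for a, b in zip(x, h):
        acc = mac(acc, a, b)
    return acc


def fir_output(samples, coeffs, n):
    """FIR output y[n] = sum over k of coeffs[k] * samples[n - k].

    Assumption: samples before index 0 are treated as zero.
    """
    acc = 0
    for k, c in enumerate(coeffs):
        if n - k >= 0:
            acc = mac(acc, c, samples[n - k])
    return acc
```

Every output sample of a filter with N coefficients costs N such steps, which is why the throughput of this single operation largely determines the throughput of the whole filter.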
There are three methods of accelerating a Multiply-And-Accumulate operation. The first is to optimize the Multiply-And-Accumulate arithmetic itself; this method reduces the delay time and speeds up the operation by using different Booth multiplier architectures. The second relies on auxiliary functions of the digital signal processor. In the program sequence control unit, multiply-and-accumulate operations are often executed under a loop counter, which avoids the looping overhead otherwise needed to detect the end of the data, so that the digital signal processor can perform multiplication-addition at full speed. In addition, because the two vectors to be multiplied and accumulated often differ in length, as in a finite impulse response filter, a matched filter, and so on, the coefficient vector is read cyclically. Digital signal processors therefore usually provide cyclic addressing to accelerate access to cyclic data. Both of these techniques are traditional ways of accelerating multiplication-addition: they maximize Multiplier-And-Accumulator efficiency by eliminating or reducing extra operations in hardware or software.
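The loop counter and cyclic addressing described above can be imitated in software. The sketch below is illustrative only: the class name `CircularFir`, its fields, and the choice to keep the input samples in a circular delay line are assumptions for this example. The modulo operation stands in for the wrap-around that a hardware address generator would perform, and the fixed-count `for` loop stands in for a hardware loop counter.

```python
class CircularFir:
    """FIR filter whose coefficient and sample indices wrap cyclically."""

    def __init__(self, coeffs):
        if not coeffs:
            raise ValueError("at least one coefficient is required")
        self.coeffs = list(coeffs)
        self.n = len(self.coeffs)
        self.delay = [0] * self.n   # circular delay line of past samples
        self.newest = 0             # index of the most recent sample
        self.c_idx = 0              # cyclic read pointer into coeffs

    def step(self, sample):
        """Store one input sample and return one output sample."""
        # Overwrite the oldest sample; the index wraps instead of shifting data.
        self.newest = (self.newest - 1) % self.n
        self.delay[self.newest] = sample

        acc = 0
        d_idx = self.newest
        # Fixed trip count: no data-dependent end-of-data test inside the loop.
        for _ in range(self.n):
            acc += self.coeffs[self.c_idx] * self.delay[d_idx]
            self.c_idx = (self.c_idx + 1) % self.n  # wraps back to 0 after n reads
            d_idx = (d_idx + 1) % self.n            # newest sample first, then older
        return acc
```

Feeding a unit impulse through `CircularFir([1, 2, 3])` returns the coefficients in order, which is a quick check that the wrap-around indexing pairs each coefficient with the correct delayed sample. Note that the coefficient pointer is never explicitly reset: it returns to the start on its own after each output.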
The third method is to execute the MAC operations on a parallel Multiplier-And-Accumulator configuration. The MAC operations are accelerated by Multiplier-And-Accumulators operating in parallel, with Single Instruction Multiple Data (SIMD) as the processor architecture. However, this approach has a higher hardware cost, and operations of different precision take the same amount of time, so the hardware is not used with optimal efficiency. The so-called subword parallel digital signal processor was derived to address this. Because different applications require different signal precision, a high-precision operation can be segmented into several low-precision operations that are performed in parallel. Most designs of this kind support only simple addition, subtraction, and logic operations. In recent years, the subword parallel configuration has also been adopted in the Multiplier-And-Accumulator to accelerate multiply-and-accumulate operations. This design increases operation speed, but data accuracy is lowered. Because several low-precision data are read at one time, additional hardware or software is required for data alignment. Options for solving this problem are to add groups of alternate buffer storage, or to add a fault bit indicator for alignment before loading the data into the buffer storage for operation. In either case, each group of inputs needs extra data alignment processing.
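To make the subword parallel idea and its alignment cost concrete, the following sketch packs two signed 16-bit values into one 32-bit word and performs two multiply-and-accumulate steps per word. All names (`pack2`, `unpack2`, `dual_mac`, `pack_vector`, `subword_dot`), the 16-bit lane width, the single shared accumulator, and the zero-padding rule are assumptions chosen for illustration; they do not describe any specific design. Python integers are unbounded, so accumulator overflow, which real hardware must handle, is not modeled.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
SIGN_BIT = 1 << (LANE_BITS - 1)


def pack2(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0..15)."""
    for v in (lo, hi):
        if not -SIGN_BIT <= v < SIGN_BIT:
            raise ValueError("value does not fit in a signed 16-bit lane")
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def to_signed16(v):
    """Interpret the low 16 bits of v as a two's-complement value."""
    v &= LANE_MASK
    return v - (1 << LANE_BITS) if v & SIGN_BIT else v


def unpack2(word):
    """Split a 32-bit word into its (lo, hi) signed 16-bit lanes."""
    return to_signed16(word), to_signed16(word >> LANE_BITS)


def dual_mac(acc, word_a, word_b):
    """Two 16x16 multiplies on the lanes of one word pair, added to acc."""
    a0, a1 = unpack2(word_a)
    b0, b1 = unpack2(word_b)
    return acc + a0 * b0 + a1 * b1


def pack_vector(values):
    """Alignment step: pad an odd-length vector with a zero, then pack pairs."""
    padded = list(values)
    if len(padded) % 2:
        padded.append(0)
    return [pack2(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]


def subword_dot(x, h):
    """Inner product using one dual MAC per packed word pair."""
    if len(x) != len(h):
        raise ValueError("vectors must have the same length")
    acc = 0
    for word_a, word_b in zip(pack_vector(x), pack_vector(h)):
        acc = dual_mac(acc, word_a, word_b)
    return acc
```

The loop in `subword_dot` runs half as many iterations as a one-value-per-word version, which corresponds to the speed gain. The `pack_vector` step is the software counterpart of the extra alignment processing: every group of inputs has to be arranged into lanes before the parallel unit can consume it.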
In summary, Multiplier-And-Accumulator configurations with subword parallel operation can effectively improve signal processing efficiency in multiplication-addition, but data of different precision requires extra alignment processing.