Discrete-time convolution is one of the most common applications for a traditional digital signal processor. On a programmable digital signal processor, the convolution sum Y(n) is efficiently handled by a single repeat instruction, followed by a multiply-accumulate (MAC) instruction, nested within a block repeat process. This requires on the order of N×N multiply-accumulate operations to form a complete discrete convolution sum Y(n). In a real-time digital signal processor application where the convolution sum is performed often, this calculation accounts for a large portion of the total system cycle count. Any reduction in the cycle count of the convolution sum calculation can therefore have a large impact on system performance.
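The nested repeat structure described above can be sketched in C as follows. This is only an illustrative sketch, not the instruction sequence of any particular processor: the function name `convolve`, the integer sample type, and the argument layout are assumptions made for the example. The outer loop stands in for the block repeat process and the inner loop for the single repeat of the multiply-accumulate instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form convolution sum: y[n] = sum over k of x[k] * h[n - k].
 * x holds nx samples, h holds nh samples, and y must have room for
 * nx + nh - 1 results. Names and types are hypothetical. */
void convolve(const int *x, size_t nx, const int *h, size_t nh, long *y)
{
    assert(nx > 0 && nh > 0);

    /* Outer loop: one pass per output sample (the "block repeat"). */
    for (size_t n = 0; n < nx + nh - 1; n++) {
        long acc = 0;

        /* Restrict k so that both x[k] and h[n - k] are in range. */
        size_t k_lo = (n >= nh) ? n - nh + 1 : 0;
        size_t k_hi = (n < nx) ? n : nx - 1;

        /* Inner loop: the repeated multiply-accumulate (MAC). */
        for (size_t k = k_lo; k <= k_hi; k++)
            acc += (long)x[k] * h[n - k];

        y[n] = acc;
    }
}
```

When both sequences have length N, the inner statement executes N×N times in total, which is the operation count referred to above.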
Current algorithms for the convolution sum computation focus on minimal instruction count and fast single “repeat multiply-accumulate” operations. Overhead is kept to a minimum through the use of circular buffering and auto-increment of data pointers in the multiply-accumulate instruction. A circular buffer is one whose address pointer is automatically reset to the ‘beginning address’ when it is incremented past the ‘last address’.
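The circular addressing and pointer auto-increment described above are performed by hardware on a digital signal processor, but their effect can be modeled in software. The following C sketch is an assumption-laden illustration: the `circ_buf` structure and the `circ_read_inc` and `mac_repeat` functions are hypothetical names introduced here, not part of any real instruction set or library.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of a circular buffer (hypothetical layout). */
typedef struct {
    int   *base;  /* the 'beginning address' of the buffer        */
    size_t len;   /* number of elements; base[len - 1] is 'last'  */
    size_t pos;   /* current read index, always in [0, len)       */
} circ_buf;

/* Read the current element, then auto-increment the index.
 * Incrementing past the last element wraps back to the beginning. */
int circ_read_inc(circ_buf *cb)
{
    assert(cb->len > 0 && cb->pos < cb->len);
    int value = cb->base[cb->pos];
    cb->pos = (cb->pos + 1 == cb->len) ? 0 : cb->pos + 1;
    return value;
}

/* Model of a "repeat multiply-accumulate": count MAC steps, with the
 * data operand fetched through the circular buffer and the coefficient
 * operand fetched through a linearly incremented index. */
long mac_repeat(circ_buf *data, const int *coef, size_t count)
{
    long acc = 0;
    for (size_t i = 0; i < count; i++)
        acc += (long)circ_read_inc(data) * coef[i];
    return acc;
}
```

Because the wrap is folded into the read itself, the loop body needs no explicit bounds check on the data pointer, which is the overhead saving the hardware feature is meant to provide.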
Little attention has typically been given to whether the multiply-accumulate operation is being performed on overlapping or non-overlapping terms. For certain specific functions that use the convolution sum, such as a finite impulse response (FIR) filter, there may even exist a special instruction that exploits unique properties of that function for faster execution.