High speed digital signal processing (DSP) in communication system applications requires low power implementations of high performance DSP capabilities, especially for hand-held devices. These DSPs must be capable of efficiently performing the spectral analysis and digital filtering required for hand-held spread spectrum communication equipment, high speed modems and all-digital radios. These applications require matrix math operations, complex arithmetic operations, Fast Fourier Transform (FFT) calculations, encoding/decoding, searching and sorting. Optimization requires that the DSP arithmetic engine use many resources efficiently. For example, consider execution of the complex multiply operation:
(x1 + iy1)(x2 + iy2) = (x1x2 - y1y2) + i(x1y2 + y1x2)
Four real multiplications and two real additions are required to perform this operation in one instruction. More multipliers and adders would not improve the speed; fewer would require more instructions.
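The operation count above can be illustrated with a minimal Python sketch. The function name `complex_multiply` and its argument order are hypothetical and chosen only for illustration; the sketch shows the arithmetic, not any particular hardware implementation.

```python
def complex_multiply(x1, y1, x2, y2):
    """Return (real, imag) of (x1 + i*y1) * (x2 + i*y2)."""
    # Four real multiplications. They are independent of one another,
    # so an engine with four multipliers could compute them together.
    p1 = x1 * x2
    p2 = y1 * y2
    p3 = x1 * y2
    p4 = y1 * x2
    # Two real additions (the first is a subtraction).
    real = p1 - p2
    imag = p3 + p4
    return real, imag


# (1 + 2i)(3 + 4i) = -5 + 10i
print(complex_multiply(1, 2, 3, 4))  # prints (-5, 10)
```

Because the four products do not depend on each other, four multipliers and two adders are enough to finish in a single step, which is the balance of resources the text describes.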
What is needed is a high performance arithmetic engine that can be optimized to perform higher radix FFTs, complex multiplication and high speed data sorting. Higher radix FFTs require fewer memory accesses because intermediate results are stored in registers in the arithmetic engine. Throughput is therefore higher because the arithmetic engine can be pipelined. A radix 8, 4096 point FFT requires two-thirds fewer data fetches than a radix 2 FFT and one-third fewer than a radix 4 FFT. The complex multiply is the basis for most DSP algorithms and is therefore an important performance target. Sorting is required to do statistical filtering and to interpret results. Sorting can determine the location of peaks, depressions and statistically variant data, which are typical signal analysis objectives. Such an arithmetic engine requires many resources, which must be used efficiently because of size and power limitations.
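The fetch ratios quoted for the 4096 point FFT can be checked with a short Python sketch. It assumes, as a simplification not stated in the text, that every FFT stage reads each of the N data points from memory exactly once, so that the number of fetches is proportional to the number of stages. The helper names `fft_passes` and `data_fetches` are hypothetical.

```python
from fractions import Fraction


def fft_passes(n_points, radix):
    """Number of butterfly stages in an n_points FFT of the given radix."""
    passes = 0
    size = 1
    while size < n_points:
        size *= radix
        passes += 1
    if size != n_points:
        raise ValueError("n_points must be an exact power of radix")
    return passes


def data_fetches(n_points, radix):
    # Assumption: each stage fetches every data point once.
    return n_points * fft_passes(n_points, radix)


N = 4096
fetches = {radix: data_fetches(N, radix) for radix in (2, 4, 8)}

for radix in (2, 4):
    saving = Fraction(fetches[radix] - fetches[8], fetches[radix])
    print(f"radix 8 vs radix {radix}: {saving} fewer fetches")
```

Under this assumption a 4096 point transform takes 12 stages at radix 2, 6 at radix 4 and 4 at radix 8, which gives the two-thirds and one-third reductions.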
Typical arithmetic engines for general purpose DSPs include one multiplier and one adder, together referred to as a multiplier-accumulator (MAC). Arithmetic engines of DSPs that are more application specific can contain an array of several multipliers and adders. The arithmetic engines of the latter DSPs are typically optimized to perform a Radix 4 butterfly and cannot do high speed sorting.
Typical arithmetic engines are also not capable of efficiently handling the arithmetic computations required for all-digital communications applications. General purpose DSPs are capable of performing Radix 2 butterflies efficiently but cannot do higher radix butterflies because they do not have sufficient resources. The FFT requires many complex multiplications, and each complex multiply requires four real multiplications and two real additions. Since general purpose DSPs have only one MAC, it takes four passes through the arithmetic engine of such a DSP to perform just one complex multiply. Engines which have more than one MAC usually limit the resources to Radix 4 butterfly operations.
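The four passes needed on a single-MAC engine can be modelled with the following Python sketch. The `SingleMac` class is a hypothetical software model, not a description of any real DSP, and it assumes that the engine can accept a negated operand so that the subtraction in the real part is absorbed into the accumulation.

```python
class SingleMac:
    """Hypothetical model of an engine with one multiplier-accumulator."""

    def __init__(self):
        self.acc = 0
        self.passes = 0  # counts trips through the arithmetic engine

    def clear(self):
        self.acc = 0

    def mac(self, a, b):
        # One pass: a single multiply followed by a single accumulate.
        self.acc += a * b
        self.passes += 1


def complex_multiply_single_mac(engine, x1, y1, x2, y2):
    """Compute (x1 + i*y1) * (x2 + i*y2) using one MAC at a time."""
    engine.clear()
    engine.mac(x1, x2)    # pass 1
    engine.mac(-y1, y2)   # pass 2 (assumes a negated operand is allowed)
    real = engine.acc
    engine.clear()
    engine.mac(x1, y2)    # pass 3
    engine.mac(y1, x2)    # pass 4
    imag = engine.acc
    return real, imag


engine = SingleMac()
product = complex_multiply_single_mac(engine, 1, 2, 3, 4)
print(product, engine.passes)  # prints (-5, 10) 4
```

Each pass performs one of the four real multiplications, so the pass count cannot drop below four unless more multipliers are added.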
Typical arithmetic engines are also not capable of handling high speed data sorting. A typical data sort requires comparison of two pieces of data, which results in a condition code. The condition code is passed to the execution unit, which then causes the program to branch down one of two paths on a subsequent instruction. The next instruction after the branch moves the data to its new location based on the results of the data comparison. This method requires a minimum of three instructions and can only operate on one data pair at a time. Typical arithmetic engines do not have sufficient data path switching to perform hardware sorts on multiple data pairs.
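The contrast between the branch-based sort and a sort on multiple data pairs can be sketched in Python as follows. Both function names are hypothetical. The second function only models the result of a hardware compare-exchange applied to several pairs; Python still evaluates the pairs one after another, so it illustrates the data flow rather than the speed.

```python
def sort_pair_with_branch(a, b):
    """Sort one pair using the three steps described in the text."""
    condition = a > b      # 1. compare, producing a condition code
    if condition:          # 2. branch on the condition code
        a, b = b, a        # 3. move the data to its new location
    return a, b


def sort_pairs_together(pairs):
    """Model a compare-exchange applied to several pairs in one step.

    min() and max() stand in for the data path switching that routes
    the smaller value to one output and the larger to the other.
    """
    return [(min(a, b), max(a, b)) for a, b in pairs]


print(sort_pair_with_branch(7, 3))
# prints (3, 7)
print(sort_pairs_together([(7, 3), (1, 9), (5, 5), (8, 2)]))
# prints [(3, 7), (1, 9), (5, 5), (2, 8)]
```

The second form needs no branch: the comparison result selects the routing directly, which is why it lends itself to operating on several pairs at once when the hardware provides the necessary switching.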