1. Field of the Invention
The present invention relates to the field of digital signal processing. More particularly, the invention relates to an improved FFT/IFFT processor.
2. Background of the Invention
Fourier transforms that apply to signals that are discrete and periodic in nature are known as Discrete Fourier Transforms (DFTs). The DFT plays a key role in digital signal processing, in areas such as spectral analysis, frequency-domain filtering and polyphase transformations.
The DFT of a sequence of length N can be decomposed into successively smaller DFTs. The manner in which this principle is implemented falls into two classes. The first class is called the “decimation in time” approach and the second is called the “decimation in frequency” approach. The first derives its name from the fact that, in the process of arranging the computation into smaller transformations, the sequence “x(n)” (the index ‘n’ is often associated with time) is decomposed into successively smaller subsequences. In the second class, the sequence of DFT coefficients “X(k)” is decomposed into smaller subsequences (‘k’ denoting frequency). The present invention applies to both the “decimation in time” and the “decimation in frequency” approaches.
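As an illustration of the decimation-in-time decomposition described above, the following is a minimal sketch, not the method of the present invention. It assumes a power-of-two length and the common forward-transform convention W_N = exp(−j2π/N); the function name `fft_dit_recursive` is hypothetical. The input sequence x(n) is split into its even-indexed and odd-indexed subsequences, each half is transformed, and the two half-length DFTs are recombined.

```python
import cmath

def fft_dit_recursive(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a non-zero power of two")
    if N == 1:
        return [complex(x[0])]
    # Decimation in time: split x(n) into even- and odd-indexed subsequences.
    even = fft_dit_recursive(x[0::2])
    odd = fft_dit_recursive(x[1::2])
    X = [0j] * N
    for k in range(N // 2):
        # Assumed convention: W_N^k = exp(-j*2*pi*k/N)
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t
    return X
```

A decimation-in-frequency version would instead split the output coefficients X(k) into even-indexed and odd-indexed subsequences.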
Since the amount of data storage and processing in a numerical algorithm is proportional to the number of arithmetic operations, the number of multiplications and additions required is generally accepted as a meaningful measure of the complexity of a computational algorithm, or of the time required to execute it. The direct computation of the DFT requires $4N^2$ real multiplications and $N(4N-2)$ real additions. Since the number of computations, and thus the computation time, is approximately proportional to $N^2$, the number of arithmetic operations required to compute the DFT by the direct method becomes very large for large values of “N”. For this reason, computational procedures that reduce the number of multiplications and additions are of considerable interest. The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT.
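The operation counts quoted above can be checked with a short sketch. It assumes the usual accounting in which one complex multiplication costs 4 real multiplications and 2 real additions, and one complex addition costs 2 real additions. The function names `direct_dft` and `direct_dft_real_ops` are hypothetical, and the sign convention exp(−j2πkn/N) is an assumption.

```python
import cmath

def direct_dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def direct_dft_real_ops(N):
    """Return (real multiplications, real additions) for the direct method."""
    # Each of the N outputs needs N complex multiplications
    # (4 real multiplications and 2 real additions each) ...
    real_mults = N * (4 * N)
    # ... and N - 1 complex additions (2 real additions each).
    real_adds = N * (2 * N + 2 * (N - 1))  # equals N(4N - 2)
    return real_mults, real_adds
```

For example, under these assumptions `direct_dft_real_ops(1024)` gives 4,194,304 real multiplications, which illustrates how quickly the direct method grows with N.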
The basic computational block is called a “butterfly”, a name derived from the appearance of the flow of the computations involved in it. FIG. 1 shows a typical radix-2 butterfly computation. 1.1 represents the 2 inputs (referred to as the “odd” and “even” inputs) of the butterfly and 1.2 refers to the 2 outputs. One of the inputs (in this case the odd input) is multiplied by a complex quantity called the twiddle factor ($W_N^k$). The general equations describing the relationship between inputs and outputs are as follows:
\[ X[k] = x[n] + x[n+N/2]\,W_N^k \]
\[ X[k+N/2] = x[n] - x[n+N/2]\,W_N^k \]
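The butterfly equations above can be sketched in a few lines. This is an illustrative sketch only: the names `twiddle` and `radix2_butterfly` are hypothetical, and the twiddle factor is assumed to follow the common convention W_N^k = exp(−j2πk/N), which the text above does not state explicitly.

```python
import cmath

def twiddle(N, k):
    """Twiddle factor W_N^k, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)

def radix2_butterfly(even, odd, w):
    """Radix-2 butterfly: the odd input is scaled by the twiddle factor w,
    then added to and subtracted from the even input."""
    t = odd * w                # one complex multiplication
    return even + t, even - t  # two complex additions
```

With N = 2 and k = 0 the twiddle factor is 1, so `radix2_butterfly(3, 1, twiddle(2, 0))` reduces to the sum and difference of the two inputs, which is the 2-point DFT.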
An FFT butterfly calculation is implemented by a z-point data operation, wherein “z” is referred to as the “radix”. An “N”-point FFT employs “N/z” butterfly units per stage (block) for $\log_z N$ stages. The result of one butterfly stage is applied as an input to one or more subsequent butterfly stages.
The conventional method of implementing an FFT or Inverse Fast Fourier Transform (IFFT) uses a radix-2, radix-4 or mixed-radix approach with either a “decimation in time (DIT)” or a “decimation in frequency (DIF)” approach.
The computational complexity of an N-point FFT calculation using the radix-2 approach is $O((N/2)\log_2 N)$, where “N” is the length of the transform. There are exactly $(N/2)\log_2 N$ butterfly computations, each including 3 complex loads, 1 complex multiply, 2 complex adds and 2 complex stores. A full radix-4 implementation, on the other hand, requires several complex load/store operations.
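The butterfly count stated above can be confirmed with an iterative radix-2 sketch that counts each butterfly as it executes. This is a conventional textbook decimation-in-time arrangement with bit-reversed input ordering, given here as an assumption for illustration rather than as the arrangement of the present invention; the function name `fft_radix2_dit` is hypothetical.

```python
import cmath

def fft_radix2_dit(x):
    """Iterative radix-2 DIT FFT; returns (X, number of butterflies executed)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a non-zero power of two")
    stages = N.bit_length() - 1  # log2(N)
    # Reorder the input into bit-reversed index order.
    a = [0j] * N
    for i in range(N):
        rev = int(format(i, f"0{stages}b")[::-1], 2) if stages else 0
        a[rev] = complex(x[i])
    butterflies = 0
    span = 1  # distance between the two inputs of a butterfly
    while span < N:
        for start in range(0, N, 2 * span):
            for j in range(span):
                # 3 complex loads: twiddle factor and the two inputs
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                top = a[start + j]
                bottom = a[start + j + span]
                t = bottom * w                  # 1 complex multiply
                a[start + j] = top + t          # complex add and store
                a[start + j + span] = top - t   # complex add and store
                butterflies += 1
        span *= 2
    return a, butterflies
```

For N = 8 the sketch executes 4 butterflies in each of 3 stages, i.e. 12 in total, matching $(N/2)\log_2 N$.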
With the advancement of VLSI technology, it has become possible to incorporate several execution units, such as arithmetic and logic units (ALUs) and multipliers, in a processor core, thereby permitting computational throughput to be increased. These advancements may be utilized to enhance the performance of the FFT/IFFT in terms of the total time required to complete an FFT/IFFT of a given size. From the basic butterfly structure of FIG. 1, it is evident that the computations, i.e. the multiplications and additions/subtractions, depend on the loading of the inputs and of the twiddle factor, in the sense that the computations cannot start until these operands have been loaded from memory. The computations themselves can finish quickly because multiple execution units may function in parallel, but this in turn requires faster loading and storing of operands and results. In many processors, multiple load/store units achieve this. Another solution to this problem is to load/store the operands/results of multiple consecutive butterflies together and to use the multiple execution units to compute those butterflies almost simultaneously. This approach requires only an increase in the data bus width, which is much more economical in terms of silicon area and complexity than multiple load/store units. The necessary requirement is that the inputs/outputs of the consecutive butterflies be stored in consecutive locations in memory. However, if the butterfly structure of any stage of the FFT/IFFT other than the first is viewed from top to bottom, it is clear that consecutive butterflies cannot be computed from operands held in consecutive memory locations.
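The memory-access pattern behind this observation can be made concrete with a short sketch that lists the operand indices of each butterfly, stage by stage. It assumes the conventional in-place radix-2 decimation-in-time flow graph with bit-reversed input, which may differ in detail from the structure of FIG. 1; the function name `butterfly_operand_indices` is hypothetical.

```python
def butterfly_operand_indices(N):
    """Yield (stage, [(top, bottom), ...]) for an in-place radix-2 DIT flow graph,
    with butterflies listed from top to bottom within each stage."""
    span = 1   # distance in memory between the two inputs of a butterfly
    stage = 1
    while span < N:
        pairs = [
            (start + j, start + j + span)
            for start in range(0, N, 2 * span)
            for j in range(span)
        ]
        yield stage, pairs
        span *= 2
        stage += 1

for stage, pairs in butterfly_operand_indices(8):
    # A butterfly's operands are adjacent only when they are 1 location apart.
    adjacent = all(bottom - top == 1 for top, bottom in pairs)
    print(f"stage {stage}: {pairs} adjacent operands: {adjacent}")
```

Under these assumptions, only the first stage pairs neighbouring locations, such as (0, 1) and (2, 3). In later stages the two inputs of a butterfly are 2, 4, ... locations apart, so a single wide load of consecutive locations does not deliver complete operand pairs for consecutive butterflies.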
FIG. 2 shows a block diagram for simultaneous loading of two n-bit operands. This mechanism requires two separate load/store units in the central processing unit (CPU), each having an n-bit-wide data bus connected separately to the memory block. Because it requires multiple load/store units, this mechanism for simultaneous loading of two n-bit operands is expensive.
U.S. Pat. No. 5,293,330 describes a pipelined processor for mixed-size FFT. This and many other works have dealt with enhancing the performance of the FFT/IFFT. The performance can be further improved with the implementation of the present invention.
Our co-pending U.S. patent application Ser. No. 10/781,336, filed on Feb. 17, 2004, describes an algorithm that is suitable for use with the proposed architecture for loading/storing the inputs/outputs of multiple consecutive butterflies with only one load/store instruction.