Technical Field
The embodiments herein generally relate to Fourier analysis, and more particularly to a system and method for optimizing a mixed-radix fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT).
Description of the Related Art
The Discrete Fourier Transform (DFT) is one of the most widely used transforms for the analysis and synthesis of discrete time-domain signals. Fourier analysis converts a signal from the time domain to a representation in the frequency domain and vice versa. Consider N discrete complex numbers x[0], x[1], x[2], . . . , x[N−1]. The DFT of these numbers is defined by the formula:
$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-j 2\pi nk / N} \qquad \text{(Eq. 1)}$$

where k = 0, 1, . . . , (N−1) is the frequency index, n = 0, 1, . . . , (N−1) is the time index, and $e^{-j 2\pi nk / N}$ is the twiddle factor coefficient. Computation of the DFT using Eq. 1 requires O(N^2) operations. The Inverse Discrete Fourier Transform (IDFT) uses the same formula as Eq. 1, except that the sign of the twiddle factor exponent is reversed (and, under the usual convention, the result is scaled by 1/N). The IDFT can therefore be computed with the DFT equation by swapping the real and imaginary parts at the input and swapping them again after the DFT operation. This property holds regardless of the method used to implement the DFT. Since the IDFT has the same computational structure as the DFT, all optimizations proposed for DFT computation apply directly to the IDFT. The further discussion therefore mentions only the DFT, on the assumption that every optimization applicable to the DFT can also be used for the IDFT.
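The swap property described above can be checked numerically. The following is a minimal sketch, not part of the described system: it evaluates Eq. 1 directly and then reuses the same forward routine for the inverse. The function names (`dft`, `swap_re_im`, `idft_via_dft`) are illustrative, and the final 1/N scaling is the conventional normalization, assumed here rather than stated in the text.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of Eq. 1 for a list of complex samples."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]

def swap_re_im(z):
    """Swap the real and imaginary parts of one complex sample."""
    return complex(z.imag, z.real)

def idft_via_dft(spectrum):
    """Inverse DFT built only from the forward DFT.

    Steps: swap at the input, forward DFT, swap at the output, then apply
    the conventional 1/N scaling (an assumption of this sketch).
    """
    n_points = len(spectrum)
    transformed = dft([swap_re_im(v) for v in spectrum])
    return [swap_re_im(v) / n_points for v in transformed]
```

Applying `idft_via_dft(dft(x))` to any complex list `x` should return `x` up to floating-point rounding, whatever routine is used for the forward transform.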
A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence. The FFT computes such transformations rapidly by factorizing the DFT matrix into a product of sparse (mostly zero) factors. As a result, it reduces the complexity of computing the DFT from O(N^2), which arises if one simply applies the definition of the DFT, to O(N log N), where N is the data size, by reusing intermediate results and eliminating trivial twiddle factor multiplications. Owing to this reduction in complexity, the FFT has made real-time signal processing possible in embedded systems in domains such as digital communications, image processing, video, and audio.
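As an illustration of how the operation count falls, the following is a minimal sketch of a recursive radix-2 decimation-in-time FFT. It is one textbook instance of an FFT, shown only for orientation; the function name `fft_radix2` is illustrative and the sketch assumes the input length is a power of 2.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of 2")
    # Two half-size transforms over the even- and odd-indexed samples.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One twiddle multiplication is shared by two outputs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each of the log2(N) recursion levels performs N/2 twiddle multiplications, which is where the O(N log N) figure comes from.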
For a sequence of size N, the FFT is calculated by factorizing N = N1*N2*N3* . . . *Nn, where n is the number of FFT stages and (N1, N2, N3, . . . , Nn) are the radices of the stages. Based on the values of N1, N2, . . . , Nn used for the factorization, different types of FFT may be employed, for example (i) same-radix FFT, (ii) split-radix FFT, and (iii) mixed-radix FFT. The same-radix FFT factorizes an N-sized FFT using only one radix value. For example, N1 = N2 = . . . = Nn = 2 gives N = 2^n and results in the radix-2 FFT, which is known to one skilled in the art. Similarly, the radix-4 and radix-8 FFTs are known to one skilled in the art. The advantage of radix-2/4/8 FFTs is that most of the twiddle factors are trivial (for example, unity), and hence the number of twiddle factor multiplication operations is reduced. The split-radix FFT factorizes an N-sized FFT using a mixture of radix-2/4/8. It is used to reduce the number of stages for large FFT sizes compared to using only radix-2 for factorization. For example, N = 256 = 2*2*2*4*8 may be decomposed into N1 = N2 = N3 = 2, N4 = 4, N5 = 8, that is, N = 2^3 * 4^1 * 8^1. The mixed-radix FFT factorizes an N-sized FFT using both power-of-2 radices (such as radix-2/4/8) and non-power-of-2 radices (such as radix-3/5/7).
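The factorization of N into stage radices can be sketched as follows. This is a hypothetical greedy helper (`radix_stages` is not a name from the text) that prefers larger radices so as to reduce the number of stages. The order in which it emits the radices is only one possible radix configuration, and a greedy choice can fail for some radix sets even when a valid factorization exists.

```python
def radix_stages(n, radices=(8, 5, 4, 3, 2)):
    """Greedily factor n into stage radices drawn from `radices`.

    Larger radices are tried first to keep the stage count low.
    Raises ValueError if the greedy pass leaves a remainder.
    """
    stages = []
    remaining = n
    for r in sorted(radices, reverse=True):
        while remaining > 1 and remaining % r == 0:
            stages.append(r)
            remaining //= r
    if remaining != 1:
        raise ValueError(f"{n} does not factor over radices {radices}")
    return stages
```

With the default radix set, `radix_stages(256)` gives `[8, 8, 4]` (three stages instead of the eight a pure radix-2 decomposition needs), and `radix_stages(180)` gives `[5, 4, 3, 3]`.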
The factorization using arbitrary radices gives great flexibility in the choice of FFT size, but non-power-of-2 FFT computations require more operations than power-of-2 FFT computations. For example, N = 180 = 3*3*4*5 can be decomposed into N1 = 3, N2 = 3, N3 = 4, N4 = 5 in the case of a mixed-radix FFT. If only power-of-2 radices were used, at least a 256-point FFT would have to be taken, since 256 is the next power of 2 above N = 180. Mixed-radix FFTs have gained popularity in communication systems and video processing domains, where the FFT size is not always a power of 2. Mixed-radix decomposition allows finer granularity in FFT sizes than power-of-2 sizes alone, so the FFT can be taken over exactly the required number of samples rather than zero-padding and then taking a larger power-of-2 FFT. However, support for arbitrary radices increases the computational complexity of the FFT. For example, in the Long Term Evolution (LTE) standards, resources are allocated to a user at a granularity of 12 sub-carriers or multiples of 12, which requires support for 12-point and multiple-of-12-point FFTs and thus makes the mixed-radix FFT necessary. For LTE, the mixed-radix FFT allows fine-grained control of resource allocation depending on the bandwidth demand per user.
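The cost of falling back to a power-of-2 size can be made concrete with a small calculation. The helper names below (`next_power_of_two`, `padding_overhead`) are illustrative only.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def padding_overhead(n):
    """Return (padded_size, zero_samples_added) for a power-of-2-only FFT."""
    padded = next_power_of_two(n)
    return padded, padded - n
```

For the 180-point example, `padding_overhead(180)` returns `(256, 76)`: 76 zero samples are added, so about 30% of the transformed points carry no data. For a single 12-sub-carrier allocation, `padding_overhead(12)` returns `(16, 4)`.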
However, realizing a mixed-radix FFT solution is more challenging than realizing a power-of-2 FFT. The non-power-of-2 radices (such as 3/5/7) have a butterfly structure with internal multipliers, whereas radix-2 has no internal multipliers in its butterfly structure. Typically, the computation unit needs to support multiple radices, which have different internal structures and consume different numbers of samples depending on the radix. More importantly, keeping the computation unit continuously supplied with data becomes a bigger challenge, since the data access pattern from memory is different for every radix. The data access pattern depends on the present radix stage and also on the ordering of the radices, i.e. the radix configuration. The twiddle factor access pattern also changes at every stage, depending on the current radix and the radix configuration. Organizing data to support efficient access for a range of FFT sizes and different combinations of radix sizes is therefore an important problem to solve for an efficient realization.
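The difference in butterfly structure can be seen by writing the radix-2 and radix-3 butterflies side by side. This is a minimal sketch with illustrative names. It uses the straightforward 3-point DFT form of the radix-3 butterfly; practical designs usually restructure it to use fewer multipliers.

```python
import cmath

def radix2_butterfly(a, b):
    """Radix-2 butterfly: additions and subtractions only."""
    return a + b, a - b

# Internal constant of the radix-3 butterfly: exp(-j*2*pi/3).
W3 = cmath.exp(-2j * cmath.pi / 3)

def radix3_butterfly(a, b, c):
    """Radix-3 butterfly written as a 3-point DFT.

    Unlike radix-2, it needs multiplications by the non-trivial
    constants W3 and W3**2 inside the butterfly.
    """
    w3_sq = W3 * W3
    y0 = a + b + c
    y1 = a + W3 * b + w3_sq * c
    y2 = a + w3_sq * b + W3 * c  # uses W3**4 == W3
    return y0, y1, y2
```

The two functions also consume different numbers of samples per call (2 versus 3), which is why a computation unit supporting both needs radix-dependent data feeding.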
FIG. 1 illustrates a typical line diagram of a mixed-radix FFT solution according to the prior art. The system includes a control module 102, a data memory 104, and a computation unit 106. The data memory 104 holds the input data, the intermediate output, and the final output. The computation unit 106 performs the twiddle factor multiplication and the radix-dependent butterfly operation. The control module 102 generates the schedule for reading and writing data from the data memory 104, for the twiddle factor generator, and for the computation operation. The control module 102 also holds the radix configuration, such as which radix occurs in the first stage, the total number of stages, the size of the FFT, and the like.
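The two operations attributed to the computation unit, twiddle factor multiplication followed by a radix-dependent butterfly, can be sketched in software as a recursive mixed-radix decimation-in-time FFT. This is a functional model only and does not represent the hardware of FIG. 1. The function name `mixed_radix_fft` and the `radices` argument (standing in for the radix configuration held by the control module) are assumptions of this sketch, and the butterfly is written as a plain r-point DFT for brevity.

```python
import cmath

def mixed_radix_fft(x, radices):
    """Recursive mixed-radix DIT FFT; the product of `radices` must equal len(x)."""
    n = len(x)
    if n == 1:
        return list(x)
    if not radices or n % radices[0]:
        raise ValueError("radix configuration does not match the input length")
    r = radices[0]
    m = n // r
    # Stride-r reads: the access pattern depends on the radix of this stage.
    subs = [mixed_radix_fft(x[q::r], radices[1:]) for q in range(r)]
    out = [0j] * n
    for k in range(m):
        # Step 1: twiddle factor multiplication (the q = 0 factor is unity).
        t = [subs[q][k] * cmath.exp(-2j * cmath.pi * q * k / n) for q in range(r)]
        # Step 2: radix-r butterfly, written here as an r-point DFT.
        for p in range(r):
            out[k + p * m] = sum(
                t[q] * cmath.exp(-2j * cmath.pi * p * q / r) for q in range(r)
            )
    return out
```

For the 180-point example, a call such as `mixed_radix_fft(x, (3, 3, 4, 5))` follows the decomposition N1 = 3, N2 = 3, N3 = 4, N4 = 5. Reordering the tuple changes the read strides and twiddle values at each stage but not the result.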
In the mixed-radix FFT, the computation unit 106 supports different radices. For example, a radix-5 butterfly takes in 5 inputs and produces 5 outputs. To support radix-5, 5 inputs have to be read and 5 outputs written, for a total of 10 memory accesses. It also needs 4 twiddle factors, so 4 complex twiddle factor values must be generated. The data may be provided in parallel, which puts considerable pressure on the memory to supply multiple inputs in parallel and to write multiple outputs in parallel. In addition, the data has to be arranged in memory so that it is readily available for the next-stage butterfly. The input address access pattern changes at every stage and depends on the radix configuration.
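The per-butterfly counts above generalize to any radix r: r reads, r writes, and r − 1 twiddle factors. A small sketch of the arithmetic follows; the function names are illustrative.

```python
def butterfly_cost(radix):
    """Memory accesses and twiddle factors needed by one radix-r butterfly."""
    return {
        "reads": radix,
        "writes": radix,
        "memory_accesses": 2 * radix,
        "twiddles": radix - 1,
    }

def stage_cost(n, radix):
    """Totals for one full stage of an n-point FFT using the given radix."""
    if n % radix:
        raise ValueError("radix must divide the FFT size")
    butterflies = n // radix
    per_butterfly = butterfly_cost(radix)
    return {
        "butterflies": butterflies,
        "memory_accesses": butterflies * per_butterfly["memory_accesses"],
        "twiddles": butterflies * per_butterfly["twiddles"],
    }
```

`butterfly_cost(5)` reproduces the figures in the text (10 memory accesses and 4 twiddle factors). For a radix-5 stage of a 180-point FFT, `stage_cost(180, 5)` gives 36 butterflies, 360 memory accesses, and 144 twiddle factors.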
One of the presently known solutions for achieving parallel memory output is to partition the memory into multiple banks, i.e. to divide a single memory into multiple smaller banks whose total size equals the maximum FFT size supported. However, this increases area because of the additional overhead of the bank selection logic.
The addressing scheme for reading and writing data in a multi-bank memory becomes complex in the case of a mixed-radix FFT, because it requires modulo operations for address computation. Furthermore, even if the banking is optimized for data access and radix configuration at a single FFT size, it may not work for a different size. To support a range of radices (such as 2/3/4/5/7), the number of banks must equal the maximum radix supported.
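To illustrate why a banking scheme tuned for one access pattern can fail for another, consider a simple interleaved mapping in which the bank index is the address modulo the number of banks. This mapping is a hypothetical example chosen for this sketch, not a scheme taken from the text or from any particular prior-art design; all names are illustrative.

```python
def bank_and_row(addr, num_banks):
    """Hypothetical interleaved mapping: bank = addr mod B, row = addr div B."""
    return addr % num_banks, addr // num_banks

def butterfly_banks(base, stride, radix, num_banks):
    """Banks touched by one radix-r butterfly reading base + q*stride."""
    return [bank_and_row(base + q * stride, num_banks)[0] for q in range(radix)]

def is_conflict_free(base, stride, radix, num_banks):
    """True if all inputs of the butterfly lie in distinct banks."""
    banks = butterfly_banks(base, stride, radix, num_banks)
    return len(set(banks)) == len(banks)
```

With 5 banks, a radix-5 butterfly reading at stride 36 touches banks `[0, 1, 2, 3, 4]` and can be served in parallel, whereas the same butterfly at stride 5 touches bank 0 five times and must be serialized. Avoiding such conflicts across stages and sizes is what makes the address computation more involved.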
The computation unit 106 has to support twiddle factor multiplication and butterfly operations for multiple radices. To provide high throughput, the average number of cycles taken for computation should be approximately equal for each radix, while resource usage is kept low. In several known techniques, multiple memory banks are used for parallel access to data, which is supplied to multiple computation modules simultaneously. Addressing schemes for data ordering and access in multiple banks have been explored to meet the throughput requirements of applications such as 3GPP LTE, which uses a mixed-radix FFT in the SC-FDMA transceiver chain. However, presently known techniques do not provide a single solution that addresses all the concerns in achieving a high-throughput mixed-radix FFT, including memory address access optimization, computation unit optimization, and data ordering in memory.