This disclosure relates to systems and methods for reducing memory usage and increasing throughput in variable-size Fast Fourier Transform (FFT) architectures.
3GPP Long Term Evolution (LTE) is a wireless communication standard that supports high-speed data communications. LTE is based on Single-Carrier Frequency Division Multiplexing (SC-FDM) in the uplink and Orthogonal Frequency Division Multiplexing (OFDM) in the downlink. Both schemes rely heavily on variable-size Discrete Fourier Transforms (DFTs) and Inverse Discrete Fourier Transforms (IDFTs), which are computed efficiently with FFT algorithms.
An FFT calculation reads an input data sequence with data samples x[n], n=0, . . . , N−1, where N is the length of the input data sequence, and outputs the frequency-domain data sequence with data samples X[k], k=0, . . . , N−1. Such a calculation is conventionally called an N-point FFT. FFT algorithms use a divide-and-conquer approach to reduce the computational complexity of calculating a DFT from O(N²) to O(N log N). For example, the radix-2 Cooley-Tukey algorithm recursively decomposes the problem of calculating an N-point FFT into two sub-problems of half the size (i.e., N/2) at every intermediate pass. The number of sub-problems into which each pass splits the transform is known as the radix; in this example, the radix is 2. The same decomposition works for any radix R provided that N is a power of R. Calculating an FFT therefore involves making a number of passes (also referred to as stages) over the input data sequence and intermediate results. In general, each pass can be associated with a different radix (a mixed-radix FFT), in which case N must be the product of the radices used by the passes.
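The recursive decomposition described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not the architecture of this disclosure: the function names `dft` and `fft_mixed_radix` and the radix preference order are hypothetical choices. At each level, it picks a radix R that divides N, splits the input into R decimated sub-sequences of length N/R, transforms each one recursively, and combines the results with twiddle factors (decimation in time). Passing `radices=(2,)` reduces it to the radix-2 Cooley-Tukey algorithm for N a power of 2.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*n*k/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]

def fft_mixed_radix(x, radices=(4, 2, 3, 5)):
    """Recursive decimation-in-time Cooley-Tukey FFT (illustrative sketch).

    At each level, the first radix R in `radices` that divides N is used:
    x is split into R sub-sequences x[r::R] of length M = N/R, each is
    transformed recursively, and the results are combined with twiddles.
    """
    n_len = len(x)
    if n_len == 1:
        return list(x)
    radix = next((r for r in radices if n_len % r == 0), None)
    if radix is None:
        # No supported radix divides N: fall back to the direct DFT.
        return dft(x)
    m = n_len // radix
    subs = [fft_mixed_radix(x[r::radix], radices) for r in range(radix)]
    out = []
    for k in range(n_len):
        # X[k] = sum_r W_N^(r*k) * F_r[k mod M], with W_N = exp(-2j*pi/N)
        out.append(sum(cmath.exp(-2j * cmath.pi * r * k / n_len) * subs[r][k % m]
                       for r in range(radix)))
    return out
```

For a 60-point input, this sketch uses radix 4, then 3, then 5 (60 = 4 × 3 × 5), so each pass has a different radix. The combination step is written as a direct R-point sum for clarity rather than speed.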
FFT implementations for LTE commonly use radices R=2, 3, 4, and 5, because the DFT sizes used in LTE are products of powers of 2, 3, and 5. As an example, consider the calculation of a 64-point FFT using radix R=4 (64 = 4³, so three radix-4 passes are needed). To compute the FFT, an FFT processor conventionally processes the input data samples with their indices arranged in the following order:
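How a transform size maps onto a sequence of passes can be illustrated with a small helper. This is a hypothetical sketch: `radix_plan` and its greedy preference for radix 4 over radix 2 (which yields fewer passes) are assumptions for illustration, not a planning rule taken from the LTE specification.

```python
def radix_plan(n, radices=(4, 2, 3, 5)):
    """Return the radix used by each pass of a mixed-radix FFT of size n.

    Greedily prefers radix 4 over radix 2 to reduce the number of passes.
    Raises ValueError if n has a prime factor other than 2, 3, or 5.
    """
    if n < 1:
        raise ValueError("FFT size must be positive")
    plan = []
    while n > 1:
        radix = next((r for r in radices if n % r == 0), None)
        if radix is None:
            raise ValueError(f"unsupported prime factor (remaining size {n})")
        plan.append(radix)
        n //= radix
    return plan

print(radix_plan(64))    # three radix-4 passes
print(radix_plan(1200))  # mixed radices 4, 4, 3, 5, 5
```

The 64-point example in the text yields three radix-4 passes, while a size such as 1200 = 2⁴ × 3 × 5² requires passes of several different radices.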
00, 16, 32, 48, 01, 17, 33, 49, 02, 18, 34, 50, 03, 19, 35, 51, 04, 20, 36, 52, . . . , 15, 31, 47, 63.
This order of data samples, in which each group of R=4 consecutive entries consists of samples spaced N/R=16 apart, is referred to as a radix-reversed order. In the first pass of the FFT calculation, the data samples with indices 00, 16, 32, and 48 are used to compute a first radix-4 butterfly; the data samples with indices 01, 17, 33, and 49 are used to compute the next radix-4 butterfly; and so on. An FFT butterfly is a small R-point DFT computed on R data samples and combined with twiddle-factor multiplications; it is the basic building block into which the larger FFT calculation is decomposed.
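The first pass described above can be sketched as follows, assuming a decimation-in-frequency formulation (the text does not specify DIT or DIF). The names `radix_reversed_order`, `radix4_butterfly`, and `first_pass_radix4` are hypothetical. The sketch generates the read order listed above, applies one radix-4 butterfly (a 4-point DFT, using W₄ = −j) to each group of four samples, and multiplies butterfly output q of group g by the twiddle factor W_N^(g·q). The result is four sub-sequences s_q of length N/4 whose (N/4)-point DFTs give X[4m + q].

```python
import cmath

def radix_reversed_order(n, radix):
    """First-pass read order: groups of `radix` samples spaced n/radix apart."""
    stride = n // radix
    return [b + stride * a for b in range(stride) for a in range(radix)]

def radix4_butterfly(a, b, c, d):
    """4-point forward DFT of (a, b, c, d), using W_4 = exp(-2j*pi/4) = -1j."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)

def first_pass_radix4(x):
    """First decimation-in-frequency radix-4 pass over x (len(x) divisible by 4).

    Returns four sub-sequences s_q of length N/4 such that
    X[4*m + q] == DFT(s_q)[m] for m = 0 .. N/4 - 1.
    """
    n = len(x)
    order = radix_reversed_order(n, 4)
    subs = [[0j] * (n // 4) for _ in range(4)]
    for g in range(n // 4):
        # Group g reads x[g], x[g + N/4], x[g + N/2], x[g + 3N/4].
        group = [x[i] for i in order[4 * g: 4 * g + 4]]
        for q, y in enumerate(radix4_butterfly(*group)):
            subs[q][g] = y * cmath.exp(-2j * cmath.pi * g * q / n)  # twiddle W_N^(g*q)
    return subs

print(radix_reversed_order(64, 4)[:8])  # [0, 16, 32, 48, 1, 17, 33, 49]
```

The remaining passes would apply the same procedure to each 16-point sub-sequence, which is how the three radix-4 passes of the 64-point example arise.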
It has been noted that variable-size DFT/IDFT implementations often require large amounts of memory and suffer from low throughput. Architectures capable of performing DFT/IDFT computations efficiently are therefore beneficial; without them, DFT/IDFT computations may become a bottleneck that prevents LTE-based communication schemes from operating optimally.
Some current variable-size DFT/IDFT implementations use dual processing cores to meet LTE throughput requirements. Some additionally use double dual-port memory to meet the memory requirements. For example, some variable-size DFT/IDFT implementations use a ping-pong buffer, which includes two separate storage arrays arranged so that reading and writing of data can occur in parallel. In particular, each storage array may have an independent data bus so that, in a first time period, data can be written to the first storage array while data is being read from the second storage array. In a second time period, data may be read from the first storage array while data is written to the second storage array. In subsequent time periods, reading and writing alternate between the two storage arrays in this manner.
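The alternating behavior of a ping-pong buffer can be captured by a small behavioral model. This is an illustrative sketch only: the class `PingPongBuffer` and its `period` method are hypothetical, and the model performs the read and the write one after the other, whereas hardware with independent data buses would perform them simultaneously.

```python
class PingPongBuffer:
    """Behavioral model of a ping-pong buffer with two storage arrays (banks).

    In each time period one bank is written while the other is read, and the
    roles swap at the end of the period, so a block written in period t is
    read out in period t + 1.
    """

    def __init__(self, block_size):
        self.banks = [[None] * block_size, [None] * block_size]
        self.write_bank = 0  # bank 0 is written first; bank 1 is read first

    def period(self, incoming):
        """Run one time period: read one bank and write `incoming` to the other."""
        read_bank = 1 - self.write_bank
        outgoing = list(self.banks[read_bank])        # read from one array...
        self.banks[self.write_bank] = list(incoming)  # ...while writing the other
        self.write_bank = read_bank                   # swap roles for the next period
        return outgoing

buf = PingPongBuffer(4)
buf.period([1, 2, 3, 4])         # first period: nothing valid to read yet
print(buf.period([5, 6, 7, 8]))  # [1, 2, 3, 4]
```

Note that the model stores 2 × block_size samples, whereas a single storage array would store block_size samples.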
However, a ping-pong buffer comes at the cost of increased memory requirements, i.e., more memory is required than if a single storage array were used. It would therefore be desirable to have methods and systems for performing variable-size FFTs efficiently without increasing memory requirements.