1. Field of the Invention
The present invention generally relates to the field of Fast Fourier Transform (FFT) computation and, in particular, to a memory mapping scheme for FFT processing using butterfly operations.
2. Description of the Related Art
The Fast Fourier Transform (FFT) is considered one of the most important algorithms in signal processing and allows for efficient conversion of discrete functions from the time domain to the frequency domain and vice versa. This is essential for a wide range of applications.
The calculation of an FFT is computationally intensive and can require a substantial amount of memory space as well as memory communication bandwidth. The basic operation in the FFT is known as the butterfly operation. Computing the butterfly operation requires a total of five memory accesses, i.e. two data loads, one twiddle load and two data stores. The operations to be performed include one multiplication and two additions performed on complex numbers.
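The access pattern of a single butterfly can be illustrated with a minimal sketch. It assumes a radix-2 decimation-in-time butterfly operating in place on a list of complex numbers; the function name `butterfly` and its parameters are hypothetical and chosen only for illustration.

```python
def butterfly(data, i, j, twiddles, k):
    """Radix-2 butterfly on data[i] and data[j] using twiddles[k] (illustrative sketch)."""
    a = data[i]        # data load 1
    b = data[j]        # data load 2
    w = twiddles[k]    # twiddle load
    t = w * b          # the single complex multiplication
    data[i] = a + t    # complex addition, data store 1
    data[j] = a - t    # complex addition (of the negated product), data store 2


# Usage: a 2-point transform is one butterfly with twiddle factor 1.
samples = [1 + 2j, 3 - 1j]
butterfly(samples, 0, 1, [1 + 0j], 0)
```

The five commented memory accesses correspond to the two data loads, one twiddle load and two data stores named above; the arithmetic between them is small by comparison, which is why memory traffic dominates.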
Because processor speeds have increased at a much faster rate than memory speeds, memory accesses have become the bottleneck of such FFT operations. Hence, it is a challenge to transfer the required data to the processing unit in time to avoid stalls. In current state-of-the-art solutions, this is handled by means of prefetching, prediction and caching.
Document U.S. Pat. No. 8,364,736 B2 discloses a method for calculating an FFT computation, wherein the FFT computation is decomposed into partial FFT computations of smaller size and the original index is then transformed from one dimension into a multi-dimensional vector. By controlling the index factor, the input data can be distributed to different memory banks such that the multi-bank memory for high-radix structures can be supported simultaneously without memory conflicts.
Document U.S. Pat. No. 7,395,293 B1 discloses a method for performing an FFT computation of N input data elements using a radix-K decomposition of the FFT. N/K input data elements are written into respective ones of K addressable memory locations and (N/K)×log_K(N) passes are performed on the input data. Each pass includes reading K data elements in parallel from the K addressable memory locations using the generated addresses, wherein the K data elements are in a first order corresponding to the respective memories. The first order of K data elements is permuted into a second order of K data elements and a radix-K calculation on the second order of K data elements is performed. This results in corresponding result data elements in the second order. The second order of K result data elements is permuted into the first order and the K result data elements are written in parallel into the corresponding K addressable memory locations using the respective addresses.
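The pass count stated for the radix-K decomposition, N/K multiplied by the base-K logarithm of N, can be worked through with a short sketch. The helper name `radix_k_pass_count` is hypothetical, and the sketch assumes that N is an exact power of K; it only evaluates the arithmetic and does not reproduce the cited method.

```python
def radix_k_pass_count(n, k):
    """Return (n / k) * log_k(n), assuming n is an exact power of k (illustrative sketch)."""
    if k < 2 or n < 1:
        raise ValueError("require k >= 2 and n >= 1")
    stages = 0
    remaining = n
    while remaining > 1:
        if remaining % k != 0:
            raise ValueError("n must be an exact power of k")
        remaining //= k
        stages += 1          # one FFT stage per factor of k
    return (n // k) * stages  # n/k radix-k calculations in each stage


# Usage: a 16-point FFT with radix 4 has 2 stages of 4 calculations each.
passes = radix_k_pass_count(16, 4)
```

For N = 16 and K = 4 this gives 4 × 2 = 8 passes, each of which reads and writes K = 4 data elements in parallel.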
Document US 2005/0256917 A1 describes a method for performing an FFT computation. The method facilitates the identification of computationally efficient patterns for sequentially generating a unique set of bit-reversed address pairs.
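What is meant by a unique set of bit-reversed address pairs can be shown with a minimal sketch. It uses the straightforward approach of reversing every index and keeping each pair once; this is not the pattern-based generation of the cited document, and the names `bit_reverse` and `bit_reversed_pairs` are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the lowest remaining bit
        index >>= 1
    return result


def bit_reversed_pairs(bits):
    """List each pair (i, j) with j = bit_reverse(i) and i < j exactly once."""
    pairs = []
    for i in range(1 << bits):
        j = bit_reverse(i, bits)
        if i < j:  # skip self-paired addresses and the mirrored duplicate (j, i)
            pairs.append((i, j))
    return pairs


# Usage: address pairs to swap when reordering an 8-point FFT (3 address bits).
pairs = bit_reversed_pairs(3)
```

For 3 address bits only the pairs (1, 4) and (3, 6) remain, since addresses 0, 2, 5 and 7 are their own bit reversals.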
Document WO 2005/086020 A2 discloses an FFT circuit for performing an FFT computation, wherein the FFT circuit is implemented using a radix-4 butterfly element and a partitioned memory for storing a prescribed number of data values. The radix-4 butterfly element is configured to perform an FFT operation in a prescribed number of stages, each stage including a prescribed number of in-place computation operations relative to the prescribed number of data values. The partitioned memory has memory portions for storing parts of the data values, so that each in-place computation operation is based on the retrieval of an equal number of data values retrieved from each of the memory portions.
Document U.S. Pat. No. 7,996,453 discloses a method for performing an FFT computation wherein FFT butterfly data sets are stored in memory in a predetermined order. Such an order can allow a butterfly data set to be read from a single memory address location. The memory address is computed by an address rotary function depending on the butterfly and stage of the FFT. Addressing the memory in such a manner allows each butterfly data set of a subsequent FFT stage to be stored to a single memory location. Shuffle registers are provided to delay the writing of FFT butterfly results to the memory until most of the data corresponding to a particular butterfly operation has been computed. The shuffle registers are configured to rearrange and to combine the results of one or more butterfly operations in an order different from that in which they were computed. Combining the results in this manner allows a subsequent FFT stage to access data by addressing a single memory location.
Document Zhang Q., Han J., Han C., “A novel address mapping scheduling strategy for continuous flow parallel FFT implementation”, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications & Conference on Real-Time Computing Systems and Applications, June 2006, Volume 2, PDPTA 2006, discloses a continuous flow parallel FFT processor that uses an address mapping scheduling strategy. Four parallel butterfly computation units are provided to enhance the throughput. The address mapping scheduling strategy uses only two memory units for a continuous flow parallel FFT implementation, thereby reducing the utilization of memory resources. The non-conflict address mapping approach ensures the parallel computation of four butterfly units in one clock cycle and the execution of the address mapping scheduling strategy.
It is an object of the present invention to provide another method for performing an FFT computation, in particular, a method which allows for an efficient use of memory resources and contributes to an increased FFT computation speed.