Signal processing systems often need to convert signals between the time domain and the frequency domain. The Fast Fourier Transform (FFT) algorithm performs this conversion. Compared with other transform algorithms, the FFT has a uniform structure and requires less computation, and it has therefore been widely used in signal processing systems.
An FFT takes N data points as input and produces N data points as output. In general, a transform from the time domain (TD) to the frequency domain (FD) is called a forward transform, while a transform from the frequency domain to the time domain is called an inverse transform. TD data arranged by sampling time are said to be in “natural order,” that is, the sampling times of the data points are incremental. Likewise, FD data arranged by frequency are said to be in “natural order,” that is, the frequencies corresponding to the data points are incremental. In contrast to the natural order, “bit reversal” refers to mirror-reversing the binary representation of the index of each data point in the natural order. The resulting index is the index of the data point in the “bit-reversed order.” For example, assume that an index is represented with 3 bits. The index 3 in the natural order is 011 in binary form; after being mirror-reversed it becomes 110 in binary form, i.e., 6 in decimal form. Accordingly, 6 is the index of that data point in the bit-reversed order.
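The 3-bit example above can be reproduced with a minimal sketch. The function name `bit_reverse` and its parameters are hypothetical and chosen only for illustration; the sketch shows the definition of bit reversal, not any particular hardware or patented method.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Mirror the num_bits-bit binary representation of index."""
    if not 0 <= index < (1 << num_bits):
        raise ValueError("index does not fit in num_bits bits")
    reversed_index = 0
    for _ in range(num_bits):
        # Move the lowest remaining bit of index onto the low end of the result.
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


# Index 3 is 011 in binary; mirrored it is 110, i.e. 6.
print(bit_reverse(3, 3))
```

Because mirroring twice restores the original bit pattern, applying the function twice with the same bit width returns the original index.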
Many implementations of the FFT algorithm have been proposed, and they are generally classified into two types: TD decimation and FD decimation. FIG. 1 shows an algorithm using FD decimation. The original data 100 are in the natural order and must go through a bit reversal operation 103 to be placed in the bit-reversed order before being used for computation. The output data 102 obtained from the computation on the reversed data 101 are arranged in the natural order. In the TD decimation-type algorithm, the input data are in the natural order, while the output data are in the bit-reversed order. Therefore, the output data must be transformed into the natural order before being used for any further processing.
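The reordering step between natural order and bit-reversed order can be sketched as a simple scalar permutation. The helper name `to_bit_reversed_order` is hypothetical, and the sketch assumes the data length N is a power of two; it only illustrates the reordering itself and makes no attempt at the parallelism discussed later.

```python
def to_bit_reversed_order(data):
    """Return a copy of data with each element moved to its bit-reversed index."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    num_bits = n.bit_length() - 1  # log2(n) for a power of two
    out = [None] * n
    for i, value in enumerate(data):
        # Compute the bit-reversed index j of i, one bit at a time.
        j, tmp = 0, i
        for _ in range(num_bits):
            j = (j << 1) | (tmp & 1)
            tmp >>= 1
        out[j] = value
    return out


natural = list(range(8))
reordered = to_bit_reversed_order(natural)
print(reordered)
```

The same routine converts in both directions: applying it to data in the bit-reversed order yields the data in the natural order.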
Some patent documents have proposed methods for bit reversal of data. An example is US application publication 2003/0028571A1 (“Real-time method for bit-reversal of large size arrays”), which provides a two-step reversal method. The first step implements data bit-reversal of large size arrays between an external storage and an on-chip memory using DMA. The second step implements internal data bit-reversal of small size arrays using a processor. The second step has two problems. First, the internal sorting process needs a number of iterations; specifically, log2(N) − 1 iterations are required for data of length N. Second, the data read/write operations in each iteration can only be conducted in a scalar fashion, and thus parallel sorting of multiple data cannot be achieved. As a result, the bit-reversal method described in the document is inefficient.
U.S. Pat. No. 7,640,284B1 (“Bit Reversal Methods for a Parallel Processor”) describes nVidia's use of multiple processing cores or SIMD execution components in a graphics processing unit (GPU) to implement parallel bit reversal. Although the disclosure achieves parallel bit reversal, some problems remain. First, each processing core or execution component needs to perform lookup table accesses and shift operations many times when calculating bit reversal addresses. Thus, the address calculation alone requires multiple clock cycles, and the execution efficiency is lowered. Second, as described on page 21, line 10 of the patent document, the method can reduce, but not eliminate, memory access conflicts among the multiple processing cores or SIMD execution components. Such memory access conflicts further reduce the efficiency of the bit reversal operation.
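To give a sense of why a lookup-and-shift address calculation takes several operations per index, the following is a generic sketch of a table-based bit reversal. It is not the procedure of the cited patent; the table `BYTE_REVERSE`, the function `bit_reverse_lut`, and the 16-bit working width are assumptions made only for illustration.

```python
# Precomputed table: BYTE_REVERSE[b] is the 8-bit mirror image of byte b.
BYTE_REVERSE = [int(f"{b:08b}"[::-1], 2) for b in range(256)]


def bit_reverse_lut(index: int, num_bits: int) -> int:
    """Bit-reverse an index of up to 16 bits using table lookups and shifts."""
    if not 0 < num_bits <= 16:
        raise ValueError("num_bits must be between 1 and 16")
    if not 0 <= index < (1 << num_bits):
        raise ValueError("index does not fit in num_bits bits")
    # Two lookups and a shift reverse the full 16-bit word ...
    rev16 = (BYTE_REVERSE[index & 0xFF] << 8) | BYTE_REVERSE[(index >> 8) & 0xFF]
    # ... and a final shift aligns the result to the requested width.
    return rev16 >> (16 - num_bits)


print(bit_reverse_lut(3, 3))
```

Even in this small form, each address costs two table accesses, three shifts, two masks, and an OR, and wider indices need more lookups.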