An FFT calculation includes reading an input data sequence in the form of time-domain samples x[n], n=0, . . . , N−1, where N is the length of the input data sequence, and outputting an FFT data sequence in the form of frequency-domain components X[k], k=0, . . . , N−1. Such a calculation is conventionally called an N-point FFT. Some FFT algorithms use a divide and conquer approach to reduce the computational complexity of calculating an FFT. For example, some FFT algorithms recursively decompose the problem of calculating the FFT into two sub-problems of half the size (i.e., N/2) at every intermediate pass. The size of the FFT decomposition is known as the radix. In the above example, the radix is 2. This decomposition approach generally works for any radix R provided that N is a power of R. Thus, calculating an FFT typically involves making a number of passes. These passes may be made over the input data sequence x[n], n=0, . . . , N−1 (and intermediate results) in the time domain, in which case the algorithm is a decimation-in-time (DIT) algorithm. Alternatively, the passes may be made over the FFT data sequence X[k], k=0, . . . , N−1 (and intermediate results) in the frequency domain, in which case the algorithm is a decimation-in-frequency (DIF) algorithm. In general, each pass can be associated with the same or a different radix. An algorithm using different radix values in different passes is a mixed radix algorithm and may be useful for computing an FFT of irregular size (e.g., a size that is not a power of 4). For example, a mixed radix 4/2 FFT algorithm may use a radix R=4 in a first stage and R=2 in a second stage.
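As an illustration of the divide and conquer decomposition described above, the following Python sketch implements a textbook recursive radix-2 decimation-in-time FFT and a direct DFT to check it against. The function names `dft` and `fft_radix2` are hypothetical and chosen for this example only; the sketch is not the implementation of any architecture discussed here.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking results."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of 2; each call splits the problem into
    two sub-problems of half the size (even- and odd-indexed samples).
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # N/2-point FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # N/2-point FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_N^k applied to the odd half before combining.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For an 8-point input, the recursion makes three passes (8 → 4 → 2 → 1), and the result should agree with `dft` to within floating-point rounding.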
As an example, consider the calculation of a 64-point FFT using the radix R=4. For computing the FFT, an FFT processor conventionally processes the input data sequence with the indices of the data samples arranged in the following order:
00,16,32,48,01,17,33,49,02,18,34,50,03,19,35,51,04,20,36,52, . . . , 15,31,47,63. This order of data samples is referred to as a radix-reversed order. In the first pass of the FFT calculation, data samples corresponding to indices 00, 16, 32, and 48 are used to compute a first radix-4 bin; data samples corresponding to indices 01, 17, 33, and 49 are used to compute the next radix-4 bin; and so on. An FFT bin corresponds to a portion of the FFT calculation that breaks up the larger FFT calculation into smaller sub-transform calculations.
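The index order listed above can be reproduced with a short Python sketch. The helper names `first_pass_order` and `first_pass_groups` are hypothetical, and generalizing beyond the 64-point, radix-4 case given in the text is an assumption of this example.

```python
def first_pass_order(n_points=64, radix=4):
    """Return sample indices in the order listed in the text.

    Consecutive groups of `radix` indices are spaced n_points // radix
    apart (16 apart for the 64-point, radix-4 case).
    """
    stride = n_points // radix
    return [i + j * stride for i in range(stride) for j in range(radix)]


def first_pass_groups(n_points=64, radix=4):
    """Split the ordered indices into the groups fed to each radix bin."""
    order = first_pass_order(n_points, radix)
    return [order[g:g + radix] for g in range(0, len(order), radix)]


if __name__ == "__main__":
    groups = first_pass_groups()
    print(groups[0])   # indices used by the first radix-4 bin
    print(groups[1])   # indices used by the next radix-4 bin
    print(groups[-1])  # indices used by the last radix-4 bin
```

With the defaults, the first two groups are the index sets 00, 16, 32, 48 and 01, 17, 33, 49 named in the text, and the last group is 15, 31, 47, 63.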
Many applications use decimation-in-frequency (DIF): these applications perform an FFT algorithm on time-domain input data, process the FFT data samples, and then perform an inverse FFT (IFFT) algorithm to recover time-domain output data. Because it is generally expensive and inefficient to perform FFT and IFFT algorithms on data ordered in natural order, existing applications either reorder the data in radix-reversed order or use decimation-in-time (DIT) algorithms. These solutions suffer from low throughput and high usage of logic and memory resources. These solutions are also incapable of supporting certain radix configurations, such as a radix-2 FFT or a mixed radix 4/2 FFT.
A known FFT implementation cascades together a series of FFT stages as shown in FIG. 2. The illustrated architecture 200 is a forward radix-4 FFT architecture, and includes four stages 201, 202, 203, and 204 connected in series. Input data is fed into the first stage 201, and the output of each stage 201, 202, and 203 is directly input to the subsequent stage 202, 203, and 204, respectively. Processed data is output in radix-reversed order from the last stage 204. Each stage except the last has its own dedicated twiddle stage generator, i.e., stages 201, 202, and 203 have dedicated twiddle stage generators 232, 234, and 236, respectively. In addition, each twiddle stage generator is associated with a corresponding twiddle stride for that pass: the twiddle stride increases by a factor of 4 at each stage, so the number of generated twiddle factors decreases by a factor of 4 at each stage. In the exemplary architecture of FIG. 2, twiddle stage generator 232 generates 192 twiddle factors with a twiddle indexing of 1× for the first stage 201; twiddle stage generator 234 generates 48 twiddle factors with a twiddle indexing of 4× for the second stage 202; and twiddle stage generator 236 generates 12 twiddle factors with a twiddle indexing of 16× for the third stage 203.
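The twiddle counts quoted for FIG. 2 (192, 48, and 12) are consistent with a 256-point radix-4 DIF transform, which is the size that four radix-4 stages would compute. The following Python sketch assumes that size, which the text does not state, and assumes the usual DIF arrangement in which each butterfly output q = 1, 2, 3 is multiplied by one twiddle factor. The function names are hypothetical, and the sketch is one plausible reading of the per-stage twiddle tables rather than a description of generators 232, 234, and 236.

```python
import cmath


def stage_twiddle_exponents(n_points, radix, stage):
    """Exponents e of the twiddles W_N^e used after a given DIF stage.

    Assumption: stage s works on blocks of n_points / radix**s samples,
    and outputs q = 1 .. radix-1 of butterfly n are scaled by
    W_N^(q * n * stride), where stride = radix**s.
    """
    stride = radix ** stage
    block = n_points // stride
    return [
        q * n * stride
        for n in range(block // radix)
        for q in range(1, radix)
    ]


def stage_twiddles(n_points, radix, stage):
    """Complex twiddle factors for one stage (forward transform sign)."""
    return [
        cmath.exp(-2j * cmath.pi * e / n_points)
        for e in stage_twiddle_exponents(n_points, radix, stage)
    ]


if __name__ == "__main__":
    # Assumed size: 4 radix-4 stages -> 4**4 = 256 points.
    for stage in range(3):
        table = stage_twiddles(256, 4, stage)
        print(f"stage {stage + 1}: stride {4 ** stage}x, {len(table)} twiddles")
```

Under these assumptions the three tables hold 192, 48, and 12 entries with strides of 1×, 4×, and 16×, matching the figures given for the first three stages.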
The serial architecture of FIG. 2 has many limitations. First, the serial architecture of FIG. 2 is unidirectional, and can only perform FFT operations on data input from left to right. Second, the serial architecture of FIG. 2 can implement FFT operations in only one direction: it can perform either forward (FFT) operations or inverse (IFFT) operations, but not both. Third, each stage in the serial architecture of FIG. 2 is limited to the configuration of its associated twiddle stage generator, and as such can function only in one specific FFT pass in only one direction of the implemented FFT operation. As a result, a DIF application may require more than one instance of architecture 200 to handle both the FFT and IFFT operations involved. This has several disadvantages, such as lower performance, increased usage of logic and memory resources, and lower throughput.