The behavior of electromagnetic signals can be analyzed in the time domain (e.g., how the signal amplitude varies over time) as well as in the frequency domain (i.e., the different frequency components that make up the signal). The Fourier transform mathematically relates these two domains and, because of its ubiquity across signal-processing applications, efforts have been made to accelerate its computation; hence the many fast Fourier transform (“FFT”) approaches. In addition, a signal can be analyzed as a continuous waveform or, in digital signal processing (DSP) applications, as a large set of time-domain points. For DSP applications, an FFT “butterfly” algorithm may be used to compute a discrete Fourier transform (“DFT”) of a signal represented in digital form. The algorithm divides the points into subsets (in a process known as “decimation”), computes the DFT of each subset, and then processes the results of the DFT of each subset to produce a final result consisting of a set of frequency-domain points. The subsets may be so small that each contains only a few (or even just one) time-domain points, making the DFT of each subset trivial: the DFT of a single point is simply the point itself. For example, an initial set of 1024 time-domain points may be decimated into 1024 subsets of one point each; the subsets are then processed and combined, in a series of stages, into a 1024-point frequency-domain result.
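As a minimal illustration of decimation down to single-point subsets, the following Python sketch implements a textbook recursive radix-2 decimation-in-time FFT and checks it against a direct DFT. The function names `dft` and `fft_dit` are hypothetical, the sketch assumes the number of points is a power of two, and it is a software illustration rather than the hardware butterfly scheme discussed in this document.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n_pts = len(x)
    if n_pts == 1:
        # The DFT of a single point is simply the point itself.
        return list(x)
    even = fft_dit(x[0::2])  # DFT of the even-indexed subset
    odd = fft_dit(x[1::2])   # DFT of the odd-indexed subset
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        # Apply the twiddle factor to the odd-subset result, then combine
        # the two subset results with a radix-2 butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]
        out[k] = even[k] + t
        out[k + n_pts // 2] = even[k] - t
    return out
```

For a 1024-point input, `fft_dit` recurses through ten levels of even/odd splitting (log2(1024) = 10) before reaching one-point subsets, each of which is returned unchanged and then merged back up by butterflies.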
Most of the computational effort of the algorithm lies in the processing of the subsets. The processing occurs in a series of stages in which the subsets are first processed into intermediate results, the intermediate results are further processed, and so on, until the final set of frequency-domain points is produced. Each stage includes a plurality of parallel operations that each process n input points simultaneously to produce n output points—the value n is known as the “radix” of the FFT algorithm. Because a dataflow diagram of a radix-2 (i.e., a radix with a value of two) operation resembles a butterfly (as shown in FIG. 1A, in which points x0, x1 are processed into points y0, y1 in accordance with the equations y0=x0+x1 and y1=x0−x1), these operations are known as butterfly operations or butterflies. Operations having other radices are also known as butterfly operations (such as the radix-4 operation shown in FIG. 1B).
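The butterflies described above can be sketched in Python as follows. The radix-2 function follows the FIG. 1A equations directly; the radix-4 function is an assumption, shown as one common form (a 4-point DFT assembled from radix-2 butterflies, with no inter-stage twiddle factors) that may differ in detail from FIG. 1B. All function names are hypothetical.

```python
import cmath


def butterfly_radix2(x0, x1):
    """Radix-2 butterfly per FIG. 1A: y0 = x0 + x1, y1 = x0 - x1."""
    return x0 + x1, x0 - x1


def butterfly_radix4(x0, x1, x2, x3):
    """Hypothetical radix-4 butterfly: a 4-point DFT built from radix-2 butterflies."""
    a, b = butterfly_radix2(x0, x2)
    c, d = butterfly_radix2(x1, x3)
    d *= -1j  # multiply by W_4 = exp(-2j*pi/4) = -j
    y0, y2 = butterfly_radix2(a, c)
    y1, y3 = butterfly_radix2(b, d)
    return y0, y1, y2, y3


def dft4_direct(xs):
    """Reference 4-point DFT used to check the radix-4 butterfly."""
    return [sum(xs[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
            for k in range(4)]
```

Because the only nontrivial factor inside a 4-point DFT is -j, the radix-4 butterfly needs no general multiplications, which is one reason radix-4 stages are attractive in hardware.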
Many different variations of the above-described basic algorithm exist. For example, a decimate-in-time FFT separates the original time-domain points (and further subdivisions) into odd and even groups, while a decimate-in-frequency FFT separates the original time-domain points (and further subdivisions) into first and second halves. An in-place FFT performs the transformation using only the memory space required to hold the original samples (at the expense of more complicated routing and control logic), while a constant-geometry FFT (also known as a constant-topology FFT) requires only simple routing and control logic (at the expense of requiring additional memory space).
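A decimate-in-time FFT would split a set of points x into x[0::2] and x[1::2]. For contrast, the following Python sketch of a textbook recursive radix-2 decimate-in-frequency FFT splits each set into its first and second halves instead. The function name `fft_dif` is hypothetical and a power-of-two length is assumed.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) a power of two)."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    half = n_pts // 2
    first, second = x[:half], x[half:]
    # Each butterfly pairs point n of the first half with point n of the second half.
    sums = [first[n] + second[n] for n in range(half)]
    diffs = [(first[n] - second[n]) * cmath.exp(-2j * cmath.pi * n / n_pts)
             for n in range(half)]
    out = [0j] * n_pts
    out[0::2] = fft_dif(sums)   # even-indexed frequency points
    out[1::2] = fft_dif(diffs)  # odd-indexed frequency points
    return out
```

Here the butterflies act before the recursive DFTs, and the frequency-domain results come out interleaved: even-indexed points from the sums, odd-indexed points from the twiddled differences.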
For example, an in-place implementation that can perform an FFT on a 1024-point time-domain input requires a memory only large enough to hold the 1024 points but requires the points to be read in different patterns for each stage of the re-combination of the points. Assuming that the in-place implementation is radix-4 and decimate-in-time (though a similar analysis applies to any in-place implementation), five stages are required to re-combine the 1024 points (because $\log_4 1024 = 5$). In the first stage, the first of 256 radix-4 butterflies receives points 0, 1, 2, 3; the second radix-4 butterfly receives points 4, 5, 6, 7; and so on. The results of each butterfly operation are written back to the same memory space that held the original 1024 points. In the second stage, the first butterfly receives points 0, 4, 8, 12 and the second butterfly receives points 1, 5, 9, 13; in the third stage, the first butterfly receives points 0, 16, 32, 48 and the second butterfly receives points 1, 17, 33, 49. In general, the input for the first radix-4 butterfly for stage three and beyond is given as

$$\{0\},\ \{4^{i-1}\},\ \{2 \times 4^{i-1}\},\ \{3 \times 4^{i-1}\} \qquad (1)$$

and, for the second butterfly, is

$$\{0+1\},\ \{4^{i-1}+1\},\ \{2 \times 4^{i-1}+1\},\ \{3 \times 4^{i-1}+1\} \qquad (2)$$

and so on, wherein i is the stage number.
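The reading patterns described above can be reproduced with a small address generator. This Python sketch rests on assumptions: it numbers butterflies from 0 within each stage and extrapolates the text's "and so on" by splitting a butterfly index into a group and an offset within a stride of 4^(i-1). The function names `num_stages`, `butterfly_inputs`, and `stage_schedule` are hypothetical.

```python
def num_stages(n_points=1024, radix=4):
    """Number of radix-r stages, log_r(n_points), computed with integers only."""
    stages, size = 0, 1
    while size < n_points:
        size *= radix
        stages += 1
    return stages


def butterfly_inputs(stage, butterfly, radix=4):
    """Addresses read by one butterfly (0-based) in one stage (1-based, i.e. i)."""
    stride = radix ** (stage - 1)        # 4^(i-1): spacing between the inputs
    group, offset = divmod(butterfly, stride)
    base = group * radix * stride + offset
    return [base + k * stride for k in range(radix)]


def stage_schedule(stage, n_points=1024, radix=4):
    """Address sets for all n_points // radix butterflies of one stage."""
    return [butterfly_inputs(stage, b, radix) for b in range(n_points // radix)]
```

For instance, `butterfly_inputs(3, 1)` returns `[1, 17, 33, 49]`, matching the second butterfly of the third stage, and in each of the five stages the 256 butterflies together read every one of the 1024 addresses exactly once, consistent with writing each stage's results back in place.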
Two different approaches may be used to implement the different reading patterns required for the in-place FFT, each having its own drawbacks. In a first approach, four different memory banks may be used, each capable of storing 1024 points, so that four points may be read from the four memory banks every cycle, in accordance with Equation (1). This design, however, quadruples the amount of memory required. Another conventional approach uses a single memory bank but requires complicated hardware (e.g., logic and buffers) to support the different reading patterns. A constant-geometry approach, as noted above, similarly requires additional memory. A need therefore exists for fast, efficient FFT techniques and processors that require only a simple routing scheme and do not use additional memory space.