The Discrete Fourier Transform (DFT) is a linear transformation that maps a sequence of N input numbers X0 to XN−1 (input operands) into a corresponding set of N transformed numbers (output operands). A Fast Fourier Transform (FFT) is a scheme for computing a DFT numerically in an efficient manner. The Cooley-Tukey algorithm is probably the most widely used FFT algorithm. It transforms the input operands in a sequence of stages. Each stage is a linear transformation between a set of input operands and a corresponding set of output operands. The output operands of a given stage may be used as the input operands of the next stage, until the final output operands, i.e., the DFT of the initial input operands, are obtained. Each of these linear transformations may be represented by a sparse matrix and can therefore be carried out rapidly. The DFT can thus be represented as a product of sparse matrices.
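To make the difference between a DFT and an FFT concrete, the following minimal Python sketch compares direct evaluation of the DFT definition with a textbook recursive radix-2 Cooley-Tukey FFT. The function names `dft` and `fft_radix2` are illustrative only. The sketch assumes the common sign convention $W_N = e^{-2\pi i/N}$ and a power-of-two N; it does not reproduce the mixed radix-4/radix-2 scheme of FIG. 1.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT (len(x) must be a power of two)."""
    n_total = len(x)
    if n_total == 0 or n_total & (n_total - 1):
        raise ValueError("length must be a power of two")
    if n_total == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of the even-indexed operands
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed operands
    out = [0j] * n_total
    half = n_total // 2
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_total) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t         # radix-2 butterfly, upper output
        out[k + half] = even[k] - t  # radix-2 butterfly, lower output
    return out


if __name__ == "__main__":
    x = [complex(n % 5, -n % 3) for n in range(32)]
    err = max(abs(a - b) for a, b in zip(dft(x), fft_radix2(x)))
    print(f"max |DFT - FFT| = {err:.2e}")
```

Both functions produce the same result up to rounding, but the FFT needs O(N log N) operations instead of O(N^2), because each recursion level corresponds to one sparse stage.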
Each stage of the FFT may involve the evaluation of so-called butterflies. A radix P butterfly is a linear transformation between P input operands and P output operands. In each stage, the N input operands may be partitioned into N/P sets of input operands. Each of these sets may be transformed individually, i.e., independently of the other sets of input operands, by means of the radix P butterfly. While the butterfly may be the same for each subset of input operands and for each stage, the partitioning of the N input operands into the N/P subsets generally differs from stage to stage.
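As an illustration, a radix P butterfly can be modelled as a P-point DFT applied to each subset of a partition. The following Python sketch is a simplified model under that assumption; the helper names `radix_butterfly` and `apply_butterflies` are hypothetical, and twiddle-factor multiplications between stages are deliberately omitted.

```python
import cmath


def radix_butterfly(ops):
    """Radix-P butterfly modelled as the P-point DFT of the P operands in `ops`."""
    p = len(ops)
    return [
        sum(ops[m] * cmath.exp(-2j * cmath.pi * k * m / p) for m in range(p))
        for k in range(p)
    ]


def apply_butterflies(values, groups):
    """Apply the butterfly to each index group independently (one stage, no twiddles)."""
    covered = sorted(i for g in groups for i in g)
    if covered != list(range(len(values))):
        raise ValueError("groups must partition the operand indices")
    out = list(values)
    for g in groups:
        results = radix_butterfly([values[i] for i in g])
        for i, y in zip(g, results):
            out[i] = y  # outputs overwrite the positions of their inputs (in place)
    return out
```

For N = 8 and P = 4, for example, `apply_butterflies(x, [[0, 2, 4, 6], [1, 3, 5, 7]])` transforms the two subsets without any dependence between them, which is what allows the butterflies of one stage to be evaluated in any order.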
FIG. 1 schematically illustrates an example of the second or a subsequent stage of an FFT of order N=32, i.e., an FFT on a set of 32 input operands. In FIG. 1, the output operands of a first stage (not shown) are multiplied by a twiddle factor and then input to the second stage. Note that FIG. 1 is cut off at operand X23; the remaining operands X24-X31 are not shown.
The two columns with the heading $W_{32}^{n}$ in FIG. 1 indicate the factor by which each operand is multiplied before it is processed in the following stage. For example, a value of 0 in these columns means multiplication by the factor $W_{32}^{0} = 1$. As mentioned above, the set of input operands for a particular stage may be partitioned into N/P subsets, and a radix P butterfly may be applied to each of the subsets. In the example of FIG. 1, P equals 4 in the column “Radix 4 stage”. In this column, only two butterflies are schematically represented: the butterfly labelled ‘1’ and the butterfly labelled ‘5’.
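The twiddle factors can be computed directly from the exponent n listed in such a column. The sketch below assumes the usual definition $W_N^n = e^{-2\pi i n/N}$; the exponent lists of FIG. 1 are not reproduced here, so the values passed to the hypothetical helper `apply_twiddles` are placeholders.

```python
import cmath


def twiddle(n, big_n=32):
    """Twiddle factor W_N^n = exp(-2*pi*i*n/N) (common DFT sign convention assumed)."""
    return cmath.exp(-2j * cmath.pi * n / big_n)


def apply_twiddles(values, exponents, big_n=32):
    """Multiply each operand by W_N^n, where n is its entry in a W_N^n column."""
    return [v * twiddle(n, big_n) for v, n in zip(values, exponents)]
```

An exponent of 0 leaves the operand unchanged, since $W_{32}^{0} = 1$, whereas an exponent of 8 corresponds to a quarter turn, $W_{32}^{8} = -i$.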
Each line in FIG. 1 represents one input/output operand. Each operand may be complex valued. The values of the operands are not shown in FIG. 1 and may, of course, differ from one stage of the FFT to the next. The output operands of the RADIX4 stage illustrated in FIG. 1 may be input operands for a following RADIX4 stage or for a final RADIX2 stage of the FFT.
Each input operand may be stored in an addressable memory cell. Similarly, each output operand of the stage may be stored in an addressable memory cell. A memory cell or a buffer cell may also be referred to as a memory location or a buffer location, respectively. Conveniently, the input operands X0-X31 may be stored in input memory cells labelled 0 to 31 in the present example. Similarly, the output operands Y0 to Y31 may be written to output memory cells labelled 0 to 31. In other words, the I-th input operand (I=0 to 31) may be provided at the I-th input memory cell, and the I-th output operand (I=0 to 31) may be written to the I-th output memory cell.
The partitioning of the set of input operands into subsets corresponding to butterflies may, in general, differ between the stages of the FFT. The butterflies of a given stage are independent of one another and may therefore be executed sequentially or in parallel. In the example of FIG. 1, the N/4=32/4=8 butterflies of each RADIX4 stage are executed sequentially; alternatively, two butterflies may be executed in parallel.
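One common way to generate the index groups of a stage is to take `radix` operands spaced `stride` apart inside consecutive blocks of `radix * stride` operands. The Python sketch below assumes this layout; it is a hypothetical illustration, not necessarily the exact partitioning of FIG. 1, although with N = 32, radix 4 and stride 4 it yields the group 0, 4, 8, 12 discussed for the first butterfly of FIG. 1. The helper `schedule` shows how the 8 independent butterflies could be issued two at a time.

```python
def stage_groups(n_total, radix, stride):
    """Index groups of one stage: `radix` operands spaced `stride` apart within
    consecutive blocks of radix*stride operands (an assumed, common layout)."""
    span = radix * stride
    if n_total % span:
        raise ValueError("radix*stride must divide N")
    groups = []
    for block in range(0, n_total, span):
        for offset in range(stride):
            base = block + offset
            groups.append([base + j * stride for j in range(radix)])
    return groups


def schedule(groups, lanes=2):
    """Split a stage's butterflies into steps of `lanes` butterflies run in parallel."""
    return [groups[i:i + lanes] for i in range(0, len(groups), lanes)]


if __name__ == "__main__":
    groups = stage_groups(32, radix=4, stride=4)
    for step, batch in enumerate(schedule(groups)):
        print(step, batch)
```

Changing `stride` from stage to stage changes the partition while the butterfly itself stays the same, which mirrors the stage-dependent partitioning described above.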
In the RADIX2 stage of FIG. 1, only three butterflies are shown for simplicity. The output operands of the previous stage are multiplied by a twiddle factor $W_{32}^{0}$ and then input to the butterflies of the RADIX2 stage.
In current applications, the input operands may conveniently be stored in a memory unit (e.g., SRAM) in accordance with their numbering. In other words, the input operands 0 to N−1 may be stored at memory locations whose addresses are ordered in the same manner as the input operands. For instance, input operand 0 may be stored at address 0, input operand 1 at address 1, and so on. However, because the input operands of a butterfly are spaced apart, they may have to be read individually from non-contiguous memory locations before the respective butterfly can be applied to them. In this case, the input operands required for a certain butterfly, e.g., the input operands 0, 4, 8, and 12 for the first butterfly in the left part of FIG. 1, cannot be read from the memory unit as a block. Thus, if the operands were simply read and processed in the linear order in which they are stored in the memory unit, the throughput of an FFT processor would be negatively affected.
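The cost of non-contiguous access can be illustrated with a simple memory model. The sketch below assumes, hypothetically, a memory that returns `block_width` consecutive words per aligned access, and counts how many accesses are needed to fetch one butterfly's operands when operand I is stored at address I.

```python
def identity_address(operand_index):
    """Natural-order storage: operand I is stored at address I."""
    return operand_index


def block_reads(addresses, block_width=4):
    """Number of aligned block reads needed to fetch `addresses`, assuming a memory
    that returns `block_width` consecutive words per access (hypothetical model)."""
    return len({addr // block_width for addr in addresses})


if __name__ == "__main__":
    butterfly = [0, 4, 8, 12]  # operands of the first butterfly discussed for FIG. 1
    addresses = [identity_address(i) for i in butterfly]
    print("strided operands:", block_reads(addresses))
    print("contiguous operands:", block_reads([0, 1, 2, 3]))
```

Under this model, the operands 0, 4, 8 and 12 fall into four different blocks and require four accesses, whereas four contiguous operands could be fetched in a single access.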