In the electronic arts it is often desired to perform a Discrete Fourier Transformation (DFT) of an input signal. Such a DFT operation usually requires multiplying large matrices. Based on the properties of the DFT, there are many Fast Fourier Transformation (FFT) algorithms, as described in "Digital signal processing" by John G. Proakis and Dimitris G. Manolakis, 3rd edition, Prentice-Hall 1996. These algorithms use symmetry properties of the transformation coefficients to reduce the size of the matrices to be multiplied, at the expense of a larger number of matrix multiplications.
The general DFT is illustrated as follows. In the N-point DFT operation of an input signal $\{x(i),\ 0 \le i \le N-1\}$ with a number N of input data points, the DFT equation is:

$$X(k) = \sum_{i=0}^{N-1} x(i)\, W_N^{ik}, \qquad 0 \le k \le N-1, \qquad W_N = e^{-j 2\pi / N}.$$
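The DFT equation can be evaluated directly as one multiplication of the input vector by an N×N coefficient matrix. The following is a minimal sketch of that direct evaluation, assuming the sign convention $W_N = e^{-j 2\pi / N}$ of the cited textbook; the names `twiddle` and `direct_dft` are hypothetical and chosen only for illustration.

```python
import cmath


def twiddle(n, exponent):
    """Return W_n**exponent, assuming W_n = exp(-2j*pi/n)."""
    # The exponent is reduced modulo n because W_n**n == 1.
    return cmath.exp(-2j * cmath.pi * (exponent % n) / n)


def direct_dft(x):
    """Evaluate X(k) = sum_i x(i) * W_N**(i*k) for k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * twiddle(n, i * k) for i in range(n)) for k in range(n)]


# A unit impulse at i = 1 transforms to the twiddle factors W_4**k.
X = direct_dft([0, 1, 0, 0])
```

This direct form needs N complex multiplications for each of the N outputs, which is the cost that the factored schemes described next aim to reduce.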
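If N can be factored as a product of two integers, that is $N = LM$, then the above DFT equation can be rewritten as:

$$X(p,q) = \sum_{m=0}^{M-1} \sum_{l=0}^{L-1} x(l,m)\, W_N^{(Mp+q)(mL+l)}$$

with the elements

$$x(l,m) = x(mL+l), \qquad X(p,q) = X(Mp+q), \qquad 0 \le l,p \le L-1, \quad 0 \le m,q \le M-1.$$

But with

$$W_N^{(Mp+q)(mL+l)} = W_N^{MLmp}\, W_N^{mLq}\, W_N^{Mpl}\, W_N^{lq}$$

and $W_N^{MLmp} = 1$, $W_N^{mLq} = W_M^{mq}$, $W_N^{Mpl} = W_L^{pl}$, the DFT can be simplified as:

$$X(p,q) = \sum_{l=0}^{L-1} \left\{ W_N^{lq} \left[ \sum_{m=0}^{M-1} x(l,m)\, W_M^{mq} \right] \right\} W_L^{lp}.$$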
The above equation can be executed as three steps:
1. First step, compute the (L×M)(M×M) matrix operation:

$$F(l,q) = \sum_{m=0}^{M-1} x(l,m)\, W_M^{mq}, \qquad 0 \le l \le L-1, \quad 0 \le q \le M-1.$$

2. Second step, compute the scaled array:

$$G(l,q) = W_N^{lq}\, F(l,q), \qquad 0 \le q \le M-1, \quad 0 \le l \le L-1.$$

3. Third step, compute the (M×L)(L×L) matrix operation:

$$X(p,q) = \sum_{l=0}^{L-1} G(l,q)\, W_L^{lp}, \qquad 0 \le p \le L-1, \quad 0 \le q \le M-1.$$

According to the state of the art, these steps are performed on a computer with a single central processing unit (CPU), which computes the sums of products one after the other. It fetches the data needed for the current product by direct memory access (DMA) and then computes that product by ordinary floating-point multiplication.

If in this general DFT the ordering of data is taken into account, the following scheme, called "Flying DFT", is applied:

(a) reorder the input signal elements x(i) in a matrix $x(l,m)$, $0 \le l \le L-1$, $0 \le m \le M-1$, according to the factors L and M, i.e. store the signal column-wise,

(b) perform the M-point DFT of each row, i.e. multiply the L rows of the matrix x(l,m) by the appropriate transformation matrix $W_M$:

$$F(l,q) = \sum_{m=0}^{M-1} x(l,m)\, W_M^{mq},$$

(c) scalar multiply the resulting array by the phase factors $W_N^{lq}$:

$$G(l,q) = W_N^{lq}\, F(l,q), \qquad 0 \le q \le M-1, \quad 0 \le l \le L-1,$$

(d) perform the L-point DFT of each column, i.e. multiply the M columns of the matrix G(l,q) by the appropriate transformation matrix $W_L$:

$$X(p,q) = \sum_{l=0}^{L-1} W_L^{pl}\, G(l,q),$$

(e) reorder the resulting array X(p,q), i.e. read the resulting array row-wise.
In all these transformations the matrices $W_C$ contain the elements

$$[W_C]_{ab} = W_C^{ab} = e^{-j 2\pi ab / C}, \qquad 0 \le a,b \le C-1.$$

Because steps (b) and (d) contain complete DFTs, both steps may be replaced recursively by the whole scheme.
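Steps (a) to (e), together with the recursive replacement of steps (b) and (d), can be sketched as follows. This is an illustrative sketch only, not the implementation of any particular prior-art system. It assumes the convention $W_N = e^{-j 2\pi / N}$, and the names `twiddle`, `direct_dft`, `smallest_factor` and `factored_dft`, as well as the choice of M as the smallest factor of N, are assumptions made for the example.

```python
import cmath


def twiddle(n, exponent):
    """Return W_n**exponent, assuming W_n = exp(-2j*pi/n)."""
    return cmath.exp(-2j * cmath.pi * (exponent % n) / n)


def direct_dft(x):
    """Plain N-point DFT, used when N cannot be factored further."""
    n = len(x)
    return [sum(x[i] * twiddle(n, i * k) for i in range(n)) for k in range(n)]


def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def factored_dft(x):
    """N-point DFT with N = L*M, following steps (a) to (e)."""
    n = len(x)
    M = smallest_factor(n)  # illustrative choice of factorisation
    if M == n:              # N is prime (or N < 2): no factorisation exists
        return direct_dft(x)
    L = n // M
    # (a) column-wise storage: x(l, m) = x(m*L + l)
    rows = [[x[m * L + l] for m in range(M)] for l in range(L)]
    # (b) M-point DFT of each of the L rows, giving F(l, q)
    F = [factored_dft(row) for row in rows]
    # (c) multiply by the phase factors: G(l, q) = W_N**(l*q) * F(l, q)
    G = [[twiddle(n, l * q) * F[l][q] for q in range(M)] for l in range(L)]
    # (d) L-point DFT of each of the M columns; cols[q][p] holds X(p, q)
    cols = [factored_dft([G[l][q] for l in range(L)]) for q in range(M)]
    # (e) row-wise readout: X(k) with k = M*p + q
    return [cols[q][p] for p in range(L) for q in range(M)]
```

For an input of 15 points the sketch splits into M = 3 and L = 5, and the result of `factored_dft` should agree with `direct_dft` up to rounding error. Because the row and column transforms call `factored_dft` again, a length such as 12 is decomposed further without extra code.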
FIG. 1 shows this general DFT scheme according to the prior art in a schematic diagram for a 15-point DFT. The input signal $\{x(i),\ 0 \le i \le 14\}$ has 15 input data points, where 15 is the product of 5 and 3. In step (a) the input signal elements x(i) are stored column-wise in a 5×3 matrix $x(l,m)$, $0 \le l \le 5-1$, $0 \le m \le 3-1$. In step (b) a 3-point DFT is performed on each of the 5 rows. In step (c) the resulting array is scalar multiplied by the phase factors $W_{15}^{lq}$. In step (d) a 5-point DFT is performed on each of the 3 columns, and finally in step (e) the resulting array X(p,q) is read out row-wise as the output signal $\{X(k),\ 0 \le k \le 14\}$.
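The data ordering of the 15-point example can be made concrete by listing which input index lands in each matrix cell and which output index is read from each cell. The short sketch below only illustrates this index bookkeeping; the variable names `layout` and `readout` are hypothetical.

```python
L, M = 5, 3  # N = L * M = 15

# Step (a): column-wise storage places x(i) in row l = i % L, column m = i // L,
# so cell (l, m) holds the input index m*L + l.
layout = [[m * L + l for m in range(M)] for l in range(L)]

# Step (e): row-wise readout of X(p, q) yields the output index k = M*p + q.
readout = [[M * p + q for q in range(M)] for p in range(L)]

for row in layout:
    print(row)
# [0, 5, 10]
# [1, 6, 11]
# [2, 7, 12]
# [3, 8, 13]
# [4, 9, 14]
```

Each row of `layout` is the input of one 3-point DFT in step (b), and each column of `readout` lists the output indices produced by one 5-point DFT in step (d).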
Consequently, in the prior art, efficient FFT operation is based on decimation into matrices that are as small as possible. This leads to butterfly computation, where a signal consists of $N = 2^n$ data points and the computation is broken down into $n = \log_2 N$ stages of decimation. Each stage involves N/2 basic multiplications of a two-element vector with a 2×2 matrix, the so-called butterflies. This greatly reduces the number of floating-point multiplications and thus makes the most efficient use of single-processor computers. The FFT operation requires a butterfly data path, i.e. between two decimation stages the elements of the output vectors of the last stage are regrouped to form the input vectors of the next stage. This is inconvenient to implement in a signal processor with a parallel processor architecture.
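The butterfly structure can be sketched with a textbook iterative radix-2 decimation-in-time FFT. The sketch below is a generic illustration under the assumed convention $W_N = e^{-j 2\pi / N}$, not the implementation of any specific prior-art device; the names `bit_reverse` and `radix2_fft` are hypothetical. It counts the butterflies so that the figure of N/2 butterflies in each of the $\log_2 N$ stages can be checked.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def radix2_fft(x):
    """Return (X, butterfly_count) for an input whose length is a power of two."""
    N = len(x)
    n = N.bit_length() - 1  # number of stages, log2(N)
    if N < 1 or (1 << n) != N:
        raise ValueError("length must be a power of two")
    # Initial regrouping of the input into bit-reversed order.
    a = [complex(x[bit_reverse(i, n)]) for i in range(N)]
    butterflies = 0
    span = 1  # distance between the two elements of one butterfly
    while span < N:  # one pass of this loop is one stage
        for start in range(0, N, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                u = a[start + j]
                v = w * a[start + j + span]
                # The butterfly: a two-element vector times a 2x2 matrix.
                a[start + j] = u + v
                a[start + j + span] = u - v
                butterflies += 1
        span *= 2
    return a, butterflies
```

For N = 8 the sketch performs 3 stages of 4 butterflies each. The regrouping between stages appears here as the changing value of `span`, which is simple to index on a single processor but corresponds to a different wiring pattern in every stage of a parallel datapath.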
There is a need to perform fast DFT on a signal processor.
There is a need to perform fast DFT in cases where the number of data points of the input signal is not a power of two.
There is a need to perform fast DFT on a signal processor with a parallel processor architecture.