In the electronic arts it is often desired to perform a Discrete Fourier Transformation (DFT) of an input signal. Such a DFT operation usually requires multiplying large matrices. Based on the properties of the DFT, there are many Fast Fourier Transformation (FFT) algorithms, as described in "Digital Signal Processing" by John G. Proakis and Dimitris G. Manolakis, 3rd edition, Prentice-Hall, 1996. These algorithms exploit symmetry properties of the transformation coefficients, which depend on the specific number of data points, to reduce the size of the matrices to be multiplied at the expense of a larger number of matrix multiplications.
A common FFT algorithm, called the radix-2 FFT, is illustrated as follows. For the N-point DFT of an input signal $\{x(i),\ 0\le i\le N-1\}$ with an even number N of input data points, the DFT equation is
$$X(k)=\sum_{i=0}^{N-1} x(i)\,e^{-j2\pi ik/N},\qquad 0\le k\le N-1. \tag{1}$$
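As a reference point, the DFT equation can be evaluated directly. The short Python sketch below does this with the standard-library `cmath` module; the function name `dft` is our own. Each output sample needs N complex multiply-adds, so the whole transform costs N² of them, which is the cost the FFT splitting is meant to reduce.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X(k) = sum_i x(i) * exp(-j*2*pi*i*k/N), O(N^2) work."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]


# A constant input puts all its energy into the k = 0 bin.
X = dft([1, 1, 1, 1])
print([round(abs(v), 6) for v in X])  # [4.0, 0.0, 0.0, 0.0]
```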
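With the simple substitution
$$W_N = e^{-j2\pi/N},\qquad e^{-j2\pi ik/N} = W_N^{ik},$$
this may be written as
$$X(k)=\sum_{i=0}^{N-1} x(i)\,W_N^{ik},\qquad 0\le k\le N-1. \tag{2}$$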
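Assuming the usual notation $W_N = e^{-j2\pi/N}$, the coefficients satisfy $W_N^N = 1$, so $W_N^{ik}$ depends only on $ik \bmod N$. The sketch below (function names `twiddle_table` and `dft_eq2` are hypothetical) uses this to evaluate the sum over $x(i)\,W_N^{ik}$ from a table of only N distinct coefficients instead of a full N×N matrix, which is the kind of data a signal processor might hold in cache.

```python
import cmath


def twiddle_table(n):
    """The N distinct coefficients W_N^m = exp(-j*2*pi*m/N), m = 0..N-1."""
    return [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]


def dft_eq2(x):
    """Evaluate X(k) = sum_i x(i) W_N^(ik) using a coefficient lookup table.

    Because W_N^N = 1, W_N^(ik) == W_N^((ik) mod N), so the N x N
    coefficient matrix is built from only N distinct values.
    """
    n = len(x)
    table = twiddle_table(n)
    return [sum(x[i] * table[(i * k) % n] for i in range(n)) for k in range(n)]


X = dft_eq2([1.0, 2.0, 3.0, 4.0])
# X is approximately [10, -2+2j, -2, -2-2j]
```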
Now $X(k)$ is split into its even- and odd-numbered samples $X_e(k)$ and $X_o(k)$:
$$X_e(k)=X(2k),\qquad X_o(k)=X(2k+1),\qquad 0\le k\le N/2-1.$$
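The even/odd split is only a re-indexing of the output samples. In Python it amounts to two strided slices, as in this minimal sketch; the names `dft` and `split_even_odd` are illustrative, and the direct DFT is included only to produce a reference $X(k)$.

```python
import cmath


def dft(x):
    """Direct DFT, used here only to produce a reference X(k)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * ((i * k) % n) / n) for i in range(n))
            for k in range(n)]


def split_even_odd(X):
    """Return (X_e, X_o) with X_e(k) = X(2k) and X_o(k) = X(2k+1)."""
    return X[0::2], X[1::2]


X = dft([1.0, 2.0, 3.0, 4.0])
X_e, X_o = split_even_odd(X)
# X_e is approximately [10, -2]        (k = 0, 2 of the full transform)
# X_o is approximately [-2+2j, -2-2j]  (k = 1, 3 of the full transform)
```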
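Using the symmetry of the transformation coefficients
$$W_N^{(i+N/2)k}=(-1)^k\,W_N^{ik}$$
and the sums $g_e(i)=x(i)+x(i+N/2)$ and $g_o(i)=x(i)-x(i+N/2)$ for $0\le i\le N/2-1$, the N-point DFT of Eq. (1) can be expressed as
$$X(k)=\begin{cases}\sum_{i=0}^{N/2-1} g_e(i)\,W_N^{ik}, & k\ \text{even},\\[4pt] \sum_{i=0}^{N/2-1} g_o(i)\,W_N^{ik}, & k\ \text{odd},\end{cases}$$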
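and further written as two N/2-point transformations for the even and odd values of k:
$$X_e(k)=\sum_{i=0}^{N/2-1} g_e(i)\,W_N^{2ik},\qquad 0\le k\le N/2-1, \tag{3}$$
$$X_o(k)=\sum_{i=0}^{N/2-1} g_o(i)\,W_N^{i(2k+1)},\qquad 0\le k\le N/2-1. \tag{4}$$
Because $W_N^{2ik}=W_{N/2}^{ik}$, Eq. (3) is an N/2-point DFT of $g_e$, and Eq. (4) is an N/2-point DFT of $g_o(i)\,W_N^{i}$.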
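The following Python sketch checks this splitting numerically. It forms $g_e$ and $g_o$ from the two halves of the input, evaluates $X_e(k)=\sum g_e(i)\,W_N^{2ik}$ and $X_o(k)=\sum g_o(i)\,W_N^{i(2k+1)}$, and compares them with the even and odd samples of a direct DFT. It also verifies the coefficient symmetry $W_N^{(i+N/2)k}=(-1)^k W_N^{ik}$ that makes the sums work. The helper names (`w`, `dft`, `radix2_split`) and the test vector are our own.

```python
import cmath


def w(n, e):
    """Transformation coefficient W_N^e = exp(-j*2*pi*e/N), with e reduced mod N."""
    return cmath.exp(-2j * cmath.pi * (e % n) / n)


def dft(x):
    """Direct N-point DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def radix2_split(x):
    """One radix-2 split: return (X_e, X_o) computed from g_e and g_o."""
    n = len(x)
    if n % 2:
        raise ValueError("N must be even")
    half = n // 2
    g_e = [x[i] + x[i + half] for i in range(half)]  # g_e(i) = x(i) + x(i+N/2)
    g_o = [x[i] - x[i + half] for i in range(half)]  # g_o(i) = x(i) - x(i+N/2)
    X_e = [sum(g_e[i] * w(n, 2 * i * k) for i in range(half)) for k in range(half)]
    X_o = [sum(g_o[i] * w(n, i * (2 * k + 1)) for i in range(half)) for k in range(half)]
    return X_e, X_o


# Coefficient symmetry behind g_e and g_o: W_N^((i+N/2)k) = (-1)^k * W_N^(ik).
N = 8
assert all(abs(w(N, (i + N // 2) * k) - (-1) ** k * w(N, i * k)) < 1e-12
           for i in range(N // 2) for k in range(N))

x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.0, -2.5, 4.0]
X = dft(x)
X_e, X_o = radix2_split(x)
assert all(abs(a - b) < 1e-9 for a, b in zip(X_e, X[0::2]))  # X_e(k) = X(2k)
assert all(abs(a - b) < 1e-9 for a, b in zip(X_o, X[1::2]))  # X_o(k) = X(2k+1)
```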
Eq. (2) can be viewed as the multiplication of the input row vector x by the coefficient matrix W, whose elements are the transformation coefficients, to obtain the row vector X. Similarly, Eqs. (3) and (4) can be viewed as the multiplication of the row vectors $g_e$ and $g_o$ by the even and odd columns, respectively, of the first N/2 rows of the coefficient matrix W to obtain the row vectors $X_e$ and $X_o$. Thus, the N-point transformation is split into two similar N/2-point transformations, which are easier to compute. The input vector x (the input data set) has real or complex elements (data points), whereas the transform vector X (the output data set) has complex elements.
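The matrix view can be made concrete with a small sketch. It builds the N×N matrix W with elements $W_N^{ik}$ (assuming $W_N=e^{-j2\pi/N}$), multiplies x by all of it, and multiplies $g_e$ and $g_o$ by the even and odd columns of its first N/2 rows. It also counts complex multiplications, showing that the two half-size products need N²/2 instead of N². The helper `vec_mat` is a hypothetical name chosen for illustration.

```python
import cmath


def coefficient_matrix(n):
    """W[i][k] = W_N^(i*k) with W_N = exp(-j*2*pi/N)."""
    return [[cmath.exp(-2j * cmath.pi * ((i * k) % n) / n) for k in range(n)]
            for i in range(n)]


def vec_mat(v, rows, cols, W):
    """Row vector v times the submatrix of W selected by rows and cols.

    Returns the product and the number of complex multiplications used.
    """
    out = [sum(v[r] * W[rows[r]][c] for r in range(len(rows))) for c in cols]
    return out, len(rows) * len(cols)


N = 8
x = [float(v) for v in range(1, N + 1)]
W = coefficient_matrix(N)
half = N // 2
g_e = [x[i] + x[i + half] for i in range(half)]
g_o = [x[i] - x[i + half] for i in range(half)]

X, full_cost = vec_mat(x, list(range(N)), list(range(N)), W)            # Eq. (2)
X_e, cost_e = vec_mat(g_e, list(range(half)), list(range(0, N, 2)), W)  # Eq. (3)
X_o, cost_o = vec_mat(g_o, list(range(half)), list(range(1, N, 2)), W)  # Eq. (4)

assert all(abs(a - b) < 1e-9 for a, b in zip(X_e, X[0::2]))
assert all(abs(a - b) < 1e-9 for a, b in zip(X_o, X[1::2]))
print(full_cost, cost_e + cost_o)  # 64 32
```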
According to the state of the art, these steps are performed on a computer with a single central processing unit (CPU), which computes the sums of products one after another. It fetches the data needed for the current product by direct memory access (DMA) and then computes the product by ordinary floating-point multiplication. However, a signal processor can be specialized to compute such vector products, using its cache memory to store the elements of the coefficient matrix. Such a signal processor receives the input vector x and outputs the elements of the transform vector X, but not necessarily in the correct order. The correct order is established by storing the elements of the transform vector in a memory and reading them out in the correct order.
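A minimal sketch of the reordering step follows. It assumes, purely for illustration, that the processor emits all even-indexed samples X(2k) first and then all odd-indexed samples X(2k+1). Each sample is written to its natural address in memory, and the memory is then read out sequentially. The function `reorder_output` and this output order are hypothetical.

```python
def reorder_output(stream, n):
    """Store each output sample at its natural index, then read memory in order.

    Hypothetical output order: X(0), X(2), ..., X(N-2), then X(1), X(3), ..., X(N-1),
    i.e. all X_e(k) = X(2k) followed by all X_o(k) = X(2k+1).
    """
    half = n // 2
    memory = [None] * n
    for pos, value in enumerate(stream):
        if pos < half:
            memory[2 * pos] = value               # X_e(k) goes to address 2k
        else:
            memory[2 * (pos - half) + 1] = value  # X_o(k) goes to address 2k+1
    return memory  # sequential read-out gives X(0), X(1), ..., X(N-1)


print(reorder_output(["X0", "X2", "X4", "X6", "X1", "X3", "X5", "X7"], 8))
# ['X0', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7']
```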
Signal processors according to the prior art may use the above splitting scheme: they receive the input vector x and store its elements in a memory, compute the vectors $g_e$ and $g_o$, compute the result vectors $X_e$ and $X_o$, and finally output the transform vector X. In these signal processors, the processing unit that performs the multiplications and summations is also occupied with the additions that generate the vectors $g_e$ and $g_o$. This results in undesirably long execution times.
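To illustrate where the time goes, the sketch below models the prior-art flow on one processing unit that performs every addition, subtraction, and multiplication itself. `SingleUnit` and `prior_art_fft_stage` are hypothetical names. Counting one unit of work per operation is a simplifying assumption, since real hardware may fuse or pipeline operations. For N = 8 the unit spends 8 operations forming $g_e$ and $g_o$ on top of the 64 operations for Eqs. (3) and (4), all serialized on the same unit.

```python
import cmath


class SingleUnit:
    """Hypothetical model of one processing unit; counts every arithmetic op."""

    def __init__(self):
        self.ops = 0

    def add(self, a, b):
        self.ops += 1
        return a + b

    def sub(self, a, b):
        self.ops += 1
        return a - b

    def mul(self, a, b):
        self.ops += 1
        return a * b


def prior_art_fft_stage(x, unit):
    """Store x, form g_e/g_o, evaluate Eqs. (3)/(4), output X in natural order."""
    n = len(x)
    half = n // 2
    memory = list(x)  # store the input vector x
    # The same unit that later multiplies and accumulates also does these additions.
    g_e = [unit.add(memory[i], memory[i + half]) for i in range(half)]
    g_o = [unit.sub(memory[i], memory[i + half]) for i in range(half)]

    def w(e):
        return cmath.exp(-2j * cmath.pi * (e % n) / n)  # W_N^e

    def dot(g, col):
        acc = 0
        for i in range(half):
            acc = unit.add(acc, unit.mul(g[i], w(i * col)))
        return acc

    X_e = [dot(g_e, 2 * k) for k in range(half)]      # Eq. (3)
    X_o = [dot(g_o, 2 * k + 1) for k in range(half)]  # Eq. (4)
    X = [None] * n
    X[0::2], X[1::2] = X_e, X_o  # reorder into X(0), X(1), ..., X(N-1)
    return X


unit = SingleUnit()
X = prior_art_fft_stage([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], unit)
print(unit.ops)  # 72
```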
There is therefore a need to perform the FFT more efficiently.