This derivation of the general Cooley-Tukey algorithm is similar to that of E. Oran Brigham, The Fast Fourier Transform, Prentice-Hall, 1974, pp. 188-190, and provides the foundation for this discussion. The simple equation for the Discrete Fourier Transform (DFT) is as follows: ##EQU1## N is the number of points to be discretely transformed. Assume $N = R_0 R_1 R_2 \cdots R_{m-1}$, where $R_0, R_1, R_2, \ldots, R_{m-1}$ are integers, not necessarily different. The indices n and k can then be expressed in a variable radix representation: ##EQU2## Eq. (1.1) can now be rewritten as: ##EQU3## where ##EQU4## indicates a summation over $k_i = 0, 1, 2, \ldots, R_{m-i-1} - 1$ with $0 \le i \le m-1$.
Note that:
$$W^{nk} = W^{n[k_{m-1}(R_1 R_2 \cdots R_{m-1}) + \cdots + k_0]} \tag{1.4}$$
and the first term of the summation expands to:
$$W^{n k_{m-1}(R_1 R_2 \cdots R_{m-1})} = W^{[n_{m-1}(R_0 R_1 \cdots R_{m-2}) + \cdots + n_0]\,k_{m-1}(R_1 R_2 \cdots R_{m-1})}$$
$$= \left[W^{R_0 R_1 \cdots R_{m-1}}\right]^{[n_{m-1}(R_1 R_2 \cdots R_{m-2}) + \cdots + n_1]\,k_{m-1}}\, W^{n_0 k_{m-1}(R_1 R_2 \cdots R_{m-1})} \tag{1.5}$$
Because $W^{R_0 R_1 \cdots R_{m-1}} = W^N = 1$, Eq. (1.5) can be written as:
$$W^{n k_{m-1}(R_1 R_2 \cdots R_{m-1})} = W^{n_0 k_{m-1}(R_1 R_2 \cdots R_{m-1})} \tag{1.6}$$
Eq. (1.4) becomes:
$$W^{nk} = W^{n_0 k_{m-1}(R_1 R_2 \cdots R_{m-1})}\, W^{n[k_{m-2}(R_2 \cdots R_{m-1}) + \cdots + k_0]} \tag{1.7}$$
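The identity of Eq. (1.6) can be checked numerically. The following is a minimal sketch, assuming the common convention $W = e^{-j2\pi/N}$ (the sign is an assumption; the identity holds for either sign). The function name `twiddle_identity_holds` is hypothetical and serves only for illustration.

```python
import cmath
from math import prod


def twiddle_identity_holds(radices, tol=1e-9):
    """Check Eq. (1.6): W^(n*k_top*weight) == W^(n0*k_top*weight).

    radices is [R_0, R_1, ..., R_{m-1}]; weight is R_1 R_2 ... R_{m-1},
    the place value of the most significant k digit, k_{m-1}.
    Assumes W = exp(-2j*pi/N), which is a convention choice.
    """
    N = prod(radices)
    r0 = radices[0]
    weight = prod(radices[1:])

    def w(exponent):
        # W raised to an integer power, without reducing the exponent mod N,
        # so that the check does not presuppose W^N = 1.
        return cmath.exp(-2j * cmath.pi * exponent / N)

    for n in range(N):
        n0 = n % r0  # least significant digit of n in the variable radix form
        for k_top in range(r0):  # k_{m-1} runs over 0 .. R_0 - 1
            lhs = w(n * k_top * weight)
            rhs = w(n0 * k_top * weight)
            if abs(lhs - rhs) > tol:
                return False
    return True
```

Calling `twiddle_identity_holds([2, 3, 5])` exercises the N = 30 case; only the digit $n_0$ of n survives in the twiddle factor attached to $k_{m-1}$, which is what lets the innermost sum depend on $n_0$ alone.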
Eq. (1.3) can now be written as: ##EQU5## Note that the inner sum is over $k_{m-1}$ and is only a function of the variables $n_0$ and $k_{m-2}, \ldots, k_0$. Thus a new array can be defined as: ##EQU6## Eq. (1.8) can now be written as: ##EQU7## By arguments analogous to those leading to Eq. (1.6), we obtain:
$$W^{n k_{m-2}(R_2 R_3 \cdots R_{m-1})} = W^{(n_1 R_0 + n_0)\,k_{m-2}(R_2 R_3 \cdots R_{m-1})} \tag{1.11}$$
The identity of Eq. (1.11) allows the inner sum of Eq. (1.10) to be written as: ##EQU8## Eq. (1.10) can be rewritten in the form: ##EQU9## When Eq. (1.13) is repeatedly reduced in this manner, a set of recursive equations is obtained of the form: ##EQU10## Eq. (1.14) is valid provided $(R_i \cdots R_{m-1}) = 1$ for $i > m-1$ and $k_{-1} = 0$. The final results are:
$$X(n_{m-1}, \ldots, n_0) = x_m(n_0, \ldots, n_{m-1}) \tag{1.15}$$
Note that Eq. (1.15) involves digit-reversing to yield a meaningful index.
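The digit reversal implied by Eq. (1.15) can be sketched as an address mapping. This sketch assumes that the final array is stored with its first argument as the most significant digit, so that $n_0$ (radix $R_0$) occupies the position originally held by $k_{m-1}$; that storage order is an assumption, and the function name `digit_reversed_address` is hypothetical.

```python
def digit_reversed_address(n, radices):
    """Map natural output index n to its address in the final stage array.

    radices is [R_0, R_1, ..., R_{m-1}]. The index n is read as
    n = n_{m-1}(R_0 ... R_{m-2}) + ... + n_1 R_0 + n_0, and the returned
    address is n_0 (R_1 ... R_{m-1}) + n_1 (R_2 ... R_{m-1}) + ... + n_{m-1}.
    """
    addr = 0
    for r in radices:
        # Peel off the least significant remaining digit (n_0 first) ...
        n, digit = divmod(n, r)
        # ... and append it to addr, so n_0 ends up most significant.
        addr = addr * r + digit
    return addr
```

With all radices equal to 2 this reduces to ordinary bit reversal; with `radices = [2, 3]` the six output indices 0 through 5 map to addresses 0, 3, 1, 4, 2, 5.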
Computed in this way, the DFT is termed the Fast Fourier Transform (FFT), because of the reduced calculation complexity inherent in this recursive approach. The arrays $x_i$ can be considered as the outputs of each stage of the FFT, with the arrays $x_{i-1}$ being the stage inputs.
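The staged structure can be illustrated with a generic mixed-radix decomposition. The following is a minimal sketch of a textbook recursive decimation-in-time form, not necessarily the exact recursion of Eq. (1.14); it assumes $W = e^{-j2\pi/N}$, and the names `dft` and `mixed_radix_fft` are hypothetical. Its output is in natural order, so no digit reversal is needed here.

```python
import cmath
from math import prod


def dft(x):
    """Direct DFT, O(N^2), used only as a reference."""
    N = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]


def mixed_radix_fft(x, radices):
    """Recursive mixed-radix FFT for len(x) == product of radices.

    Each recursion level splits the input into r interleaved sub-sequences,
    transforms each with the remaining radices, and recombines them with
    twiddle factors. Every level costs about N*r multiplications instead of
    the N^2 of the direct DFT.
    """
    N = len(x)
    if prod(radices) != N:
        raise ValueError("len(x) must equal the product of the radices")
    if not radices:
        return list(x)  # a single point is its own transform
    r, rest = radices[0], radices[1:]
    M = N // r
    # Sub-sequence q holds x[q], x[q + r], x[q + 2r], ...
    subs = [mixed_radix_fft(x[q::r], rest) for q in range(r)]
    out = []
    for n in range(N):
        # Sub-transforms are periodic in n with period M.
        acc = sum(
            cmath.exp(-2j * cmath.pi * q * n / N) * subs[q][n % M]
            for q in range(r)
        )
        out.append(acc)
    return out
```

For a 12-point input, `mixed_radix_fft(x, [3, 2, 2])` and `mixed_radix_fft(x, [2, 2, 3])` should both agree with `dft(x)` to within rounding error; the choice and order of radices changes the stage structure but not the result.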
Consider that the FFT is just a black box, with the output being some function of the input, and with some timing delay, $\Delta$ time units, from input to output. This black box must process an array of inputs, and provide an array of outputs. Each output array then immediately becomes the next input array for the black box. The arrays are input one element at a time, and are output one element at a time. After some number of passes, the box is told to stop, and to provide its final output array. The black box is required to be busy at all times, so that its processing power is fully exploited. This black box must accept input data `simultaneously` with providing output data. That is, for every time unit, the box must both accept an input element and provide an output element, with each output element being at least partly based on an input element which was input $\Delta$ time units ago. What is needed is a method and corresponding hardware for accomplishing this, yet requiring only N words of memory for the box's use, where N is the number of elements of each array. What is also needed is a method and corresponding hardware for accomplishing this with minimum memory so that the box and its memory can be combined in a single device, even for larger FFTs, to minimize outside interfaces, and to allow the highest possible speed.
Many FFT `black boxes` have been built in the past, yet all have skirted this problem either by providing $2N$ words of memory, or by letting the black box run at 50% or less efficiency.
What is needed is a method and corresponding hardware that allows reading from and writing to the same address during the aforementioned time unit. That is, the memory access involves a single read/write access during each time unit. This is more efficient than a read access followed by a write access, because only one address is required and no extra time is necessary to allow a second address to stabilize. What is further needed is a method and corresponding hardware for doing this with just N words of memory and a single address generator, while allowing for maximum box efficiency. What is additionally needed is a capability for a programmable size FFT coupled with the minimum amount of memory for carrying out the FFT.
Note: The term `butterfly`, as used herein, sometimes means `FFT engine`, and sometimes means `base DFT element`. The meaning is apparent when the term is taken in context.