A discrete Fourier transform (DFT) is an algorithm that performs a Fourier transform on discrete time samples of a time varying function. The fast Fourier transform (FFT) is a generic name given to a family of highly specialized computation algorithms which greatly reduce the time required to compute a DFT.
The DFT algorithm is accomplished generally by multiplying each discrete input sample from a time varying function by a number of filter coefficients. By using a FFT algorithm, certain mathematical symmetries in the DFT algorithm are taken advantage of to maximize the efficiency of the calculation. As a result, an FFT computation can be broken down into a number of successively smaller FFTs. For example, a 64-point FFT can be implemented with a first FFT pass consisting of four 16 point FFTs followed by a second FFT pass of sixteen 4-point FFTs. Each FFT pass in turn represents a number of complex computations generally referred to as "butterflies" or "butterfly computations".
Since the FFT is merely a sequence of complex multiplications and complex additions, it can be accomplished with many different algorithms. Accordingly, the FFT has been implemented with a variety of hardware and software mechanisms. Software intensive implementations have only one computational element to which all input samples and intermediate sums must be sequentially routed. Hardware intensive implementations usually have at least one computational element dedicated to each FFT pass to allow for some level of parallel computation. Some implementations may even have a computational element for each butterfly computation that is required within an individual FFT pass. The resulting implementation depends primarily on the environment in which the FFT will operate. Military applications generally require a compact high speed FFT which operates with low power and low latency.
Satisfaction of these requirements has classically been attempted with several techniques. One method of calculating FFTs at high speed rates is to build the FFT in a pipeline architecture. A pipeline architecture provides one computational element for each pass of the FFT. This will increase the speed of the computation by allowing a small degree of parallel processing. Parallel processing occurs when more than one set of data samples are operated on at the same time. In a pipeline structure, as the first set of samples moves on to the second FFT pass, the next set of samples is being processed in the first FFT pass. However, within each pass, each group of samples must be processed sequentially by the single computational element present in that FFT pass. The pipeline architecture has an advantage over the single computational element design in that it does not require the use of memory elements to store intermediate results. Also, latency is reduced since more than one computational element is used.
Even though latency is improved with a pipeline architecture, emitter coupled logic (ECL) or gallium arsenide (GaAs) circuits must be used to incorporate high speed applications. The resulting configuration is typically not adequate for military applications. First, since available ECL and GaAs devices are low density components, the volume required by such an implementation is often excessive. Second, the design and use of custom ECL and GaAs devices are often prohibitively expensive due to the custom high speed test fixtures that must be developed to test the FFT. Third, ECL and GaAs circuits consume a relatively large amount of power.
Another method of performing FFT calculations is through "super parallelism". In a super parallel architecture, a computational element is available for each butterfly computation that is required by the algorithm. This architecture allows all of the computations within a FFT pass to be done simultaneously rather than sequentially as in a pipeline architecture. It also allows more than one set of data samples to be operated on at the same time as in the pipeline example. This method significantly reduces the computation time (latency) of the FFT. In fact, the speed may be increased so much that a slower technology such as complementary metal-oxide-silicon (CMOS) can be used to implement the algorithm. CMOS has two additional advantages: it is more compact, and it consumes less power than either ECL or GaAs.
The disadvantage of a super parallel FFT is the vast number of interconnections needed to link the computation elements together. This problem is magnified when the FFT cannot be implemented on a single module. In a multi-module design, wires and connectors must be used to link the modules together. When a large number of inter-connections must be made, the reliability of the design is severely reduced. This problem is subdued in single module designs where the butterfly connections are made within a substrate. Even though CMOS is a high density technology, the super parallel architecture makes single module design prohibitive in many military applications.
Each of the prior art implementations described attempt to increase the computational speed of the FFT. With each, latency is reduced at the cost of certain tradeoffs which make high speed FFT computation in military systems nearly impossible.