The computation of a Fast Fourier Transform (FFT) normally requires a significant amount of processing capability. Central to the computation of an FFT algorithm is the requirement for vector rotation (multiplication by a unit vector). This vector rotation, together with the accompanying addition and subtraction, is typically termed the "butterfly". A Radix-2 FFT butterfly requires a complex multiplication plus a complex addition and a complex subtraction for each pair of values. The complex multiplication in turn requires four real multiplications and two real additions. All these operations must be carried out at a relatively high rate of speed while utilizing the computational circuitry at its maximum efficiency.
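The operation count described above can be illustrated with a minimal sketch. It assumes the common decimation-in-time form of the Radix-2 butterfly, which the text does not specify, and the function names `complex_multiply` and `radix2_butterfly` are hypothetical, chosen only for this illustration.

```python
def complex_multiply(a_re, a_im, w_re, w_im):
    """Multiply (a_re + j*a_im) by (w_re + j*w_im).

    Uses four real multiplications and two real additions
    (one of the additions is a subtraction).
    """
    p_re = a_re * w_re - a_im * w_im  # two multiplications, one subtraction
    p_im = a_re * w_im + a_im * w_re  # two multiplications, one addition
    return p_re, p_im


def radix2_butterfly(x, y, w):
    """Radix-2 butterfly, decimation-in-time form (an assumption here).

    x, y: the pair of complex input values.
    w:    the unit-magnitude rotation ("twiddle") factor.
    Returns (x + w*y, x - w*y): one complex multiplication,
    one complex addition, and one complex subtraction.
    """
    t_re, t_im = complex_multiply(y.real, y.imag, w.real, w.imag)
    t = complex(t_re, t_im)
    return x + t, x - t


# With w = 1 the butterfly reduces to a two-point DFT of (x, y).
upper, lower = radix2_butterfly(1 + 2j, 3 - 1j, 1 + 0j)
```

In a full N-point transform the rotation factor would typically be a unit vector such as exp(-2*pi*j*k/N); the sketch leaves its generation out, since the text above concerns only the arithmetic inside one butterfly.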
A wide range of architectures has been provided to implement Discrete Fourier Transforms (DFTs). Most of these architectures require excessive computation (e.g., the chirp-z transform), excessive control circuitry (e.g., Winograd and mixed-prime-Radix algorithms), or excessive gates or I/O pins (e.g., direct parallel Cooley-Tukey). One efficient architecture that has been employed utilizes a serial pipeline architecture which calculates the butterflies in eleven arithmetic units concurrently. This type of computation reduces the throughput requirement for each of the arithmetic units enough to permit the use of serial arithmetic. However, the total gates-Hz product is excessive, because the pipeline architecture requires each arithmetic unit to process at full speed half of the time and to remain idle for the other half, so that data is not processed continually. The arithmetic unit can be implemented as a systolic array of adder cells with sums and carries being pipelined therethrough so that a multiplication cycle time equals the add time for a single bit cell of the array. This array approach is discussed in R. MacTaggart and M. A. Jack, "A Single Chip Array Radix-2 FFT Butterfly Architecture Using Parallel Data Distributed Arithmetic," IEEE J. Solid State Circuits SC-19, 368 (1984).
One disadvantage of the pipeline structure is the amount of idle time in the pipeline when performing some of the operations. In addition, the circuitry utilized to realize the pipeline is normally faster than the circuitry utilized to realize the various peripheral adders that are part of the circuitry. Therefore, some deficiencies still exist in the above pipeline structure.