1. Field of the Invention.
The invention relates to computing apparatus and more particularly to a parallel-iterative processor architecture for calculating the fast Fourier transform (FFT), especially for performing transforms of large size arrays at high speeds.
2. Prior Art.
The FFT is an algorithm for computer calculation of complex Fourier series. A great number of physics and engineering problems that call for waveform analysis may be solved by directly observing the Fourier series of a portion of a waveform or by performing additional operations on this series. The FFT was developed to quickly compute the so-called discrete Fourier transform, i.e., the Fourier series corresponding to a finite number of samples of a waveform. The book, "The Fast Fourier Transform", by E. Oran Brigham includes a good basic treatment of the FFT, including its history.
Although the FFT was initially designed as a computation tool for use in large or medium size computers, it was soon recognized that high speed requirements called for small special purpose computers whose chief purpose was to calculate the FFT. The evolution of such computers has shown that these machines may be classified principally as follows: (1) pipeline processors, exemplified by the article, "A Pipeline Fast Fourier Transform", by Groginsky and Works, IEEE Transactions on Computers, Vol. C-19, No. 11, November 1970, pp. 1015-1019; and (2) parallel-iterative processors, exemplified by the articles, "A Fast Fourier Transform Algorithm for a Global, Highly Parallel Processor", by G. Bergland and D. Wilson, IEEE Transactions on Audio and Electroacoustics, Vol. AU-17, No. 2, June 1969, pp. 125-127, and "The Design of a Class of Fast Fourier Transform Computers", by M. Corinthios, IEEE Transactions on Computers, Vol. C-20, No. 6, June 1971, pp. 617-623.
Pipeline processing is characterized by cascaded stages having a separate arithmetic unit and memory for each stage of the processor. Such an architecture has a speed advantage over a processor which shares a single arithmetic unit and memory. For a radix-2 FFT of N points, there are log2 N stages, or iterations, and hence a factor of log2 N represents the speed improvement possible by pipelining.
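The stage count and the resulting speed ratio can be illustrated with a short sketch. It is an idealized model rather than a description of any particular machine: it assumes that every butterfly takes one unit of time, that the pipeline is already full (fill latency is ignored), and that N is a power of two. The function names `stage_count` and `butterfly_times` are hypothetical.

```python
def stage_count(n):
    """Number of radix-2 FFT stages (iterations) for an n-point transform."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    return n.bit_length() - 1  # equals log2(n) for a power of two


def butterfly_times(n):
    """Butterfly-time units per transform under the idealized model.

    Assumption: one butterfly costs one time unit on any arithmetic unit.
    """
    stages = stage_count(n)
    per_stage = n // 2  # radix-2 butterflies in each stage
    return {
        # A single shared arithmetic unit performs every butterfly in turn.
        "single_unit": per_stage * stages,
        # With one arithmetic unit per stage, a full pipeline completes
        # one transform every n/2 butterfly times.
        "pipeline": per_stage,
    }


times = butterfly_times(1024)
speedup = times["single_unit"] // times["pipeline"]
print(stage_count(1024), speedup)
```

For N = 1024 the sketch reports 10 stages and a ratio of 10, i.e., log2 N, in agreement with the statement above.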
Despite the advantages of pipelining approaches, there are a number of difficulties in constructing extremely high speed, large array size transform processors. These include: (1) the degree of parallelism cannot be increased above log2 N without either going to higher decomposition (radix) systems or using multiple parallel interconnected pipelines; (2) in order to keep the total memory requirement of the processor on the order of N words of memory, where N is the number of sample data points, the delays through the various stages of the pipeline are such that at various times data from a given input block of time samples is being processed by several processing units; and (3) in order to keep the total memory requirement of the processor small and yet permit real time operation, the processor is usually designed so that the first processing unit contains 2N words of storage, the next N words, the next N/2 words, etc. The implications of (1) are clear. The implications of (2) and (3) are that it is difficult to construct an efficient pipelined processor, in terms of memory requirement, with identical processing units from one stage to the next. Generally, the control required, some arithmetic functions, and memory size will vary from one processing unit to the next. For large size pipelines, this creates problems in processor development, test and fault diagnosis, sparing and, hence, cost and reliability. In addition, item (2) increases the internal word size requirement, since conditional block scaling is prevented.
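The memory taper described in item (3) can be tabulated with a minimal sketch. The text gives only the first three sizes (2N, N, N/2) followed by "etc."; the sketch assumes that the halving simply continues through all log2 N stages, which is an extrapolation and not something the cited designs are known to follow exactly. The function name `tapered_stage_memory` is hypothetical.

```python
def tapered_stage_memory(n):
    """Words of storage per pipeline stage, assuming 2N, N, N/2, ... halving.

    Assumption: the halving pattern continues through all log2(n) stages.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1  # log2(n) radix-2 stages
    return [(2 * n) >> k for k in range(stages)]


sizes = tapered_stage_memory(1024)
print(sizes)       # no two stages have the same memory size
print(sum(sizes))  # total stays on the order of N words
```

Under this assumption the total for N = 1024 is 4092 words, i.e., 4N - 4, which remains on the order of N, while every stage requires a different memory size. This is the non-uniformity between processing units noted above.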
In the full parallel-iterative processor an entire stage of the FFT algorithm is performed in parallel, i.e., N/2 (radix-2) butterflies are performed in parallel, where a butterfly is the basic standard arithmetic operation in FFT computation. When the stage is complete, the resulting output data becomes the input for the next stage and the same processing unit performs the next iteration. Clearly, then, the increase in speed possible is a factor of N/2 over that possible with a processor employing no such parallelism and only one arithmetic unit with comparable speed circuitry. The difficulty with the parallel-iterative processor is the extreme complexity resulting from a degree of parallelism of N/2. A partial parallel-iterative structure reduces this complexity and avoids the problems associated with pipelined processors.
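The stage-by-stage structure can be sketched in software. The sketch below is a conventional radix-2 decimation-in-time FFT with bit-reversed input ordering; this particular formulation is an assumption chosen for illustration and is not taken from the cited articles. Within each stage the N/2 butterflies read only the previous stage's output, so they are mutually independent and could in principle be performed in parallel; the software loop simply performs them one after another. The names `fft_by_stages`, `bit_reverse_permute` and `naive_dft` are hypothetical.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, "0{}b".format(bits))[::-1], 2)
        out[r] = x[i]
    return out


def fft_by_stages(x):
    """Radix-2 FFT computed one stage at a time.

    Returns the transform and the number of butterflies done in each stage.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    data = bit_reverse_permute([complex(v) for v in x])
    counts = []
    for s in range(n.bit_length() - 1):
        half = 1 << s          # distance between the two butterfly inputs
        span = half * 2        # size of each sub-transform in this stage
        nxt = data[:]          # stage output, becomes the next stage input
        count = 0
        for start in range(0, n, span):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / span)  # twiddle factor
                a = data[start + k]
                b = data[start + k + half] * w
                nxt[start + k] = a + b          # butterfly, upper output
                nxt[start + k + half] = a - b   # butterfly, lower output
                count += 1
        data = nxt
        counts.append(count)
    return data, counts


def naive_dft(x):
    """Direct O(N^2) discrete Fourier transform, used only as a check."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n)
                for k in range(n)) for m in range(n)]


result, counts = fft_by_stages([1, 2, 3, 4, 0, -1, -2, -3])
print(counts)
```

For the 8-point input shown, each of the log2 N = 3 stages contains N/2 = 4 butterflies. A full parallel-iterative processor would execute each such group of N/2 butterflies at once and reuse the same hardware for the next stage.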