This invention relates generally to digital signal processing techniques and, more particularly, to improvements in hardware for implementing the algorithm known as the fast Fourier transform. Fourier transformation is a well-known technique for analysis of time-varying signals. In simple terms, Fourier transformation converts a signal from a time-varying one, said to be "in the time domain," to a frequency-varying one, said to be "in the frequency domain." Fourier transforms are used extensively in spectrum analysis and related applications.
When a signal is expressed in discrete form, i.e. by a series of successive signal samples taken at regular time periods, the corresponding Fourier transformation is referred to as the discrete Fourier transform, or DFT.
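The basic DFT equation is:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where W is defined as:

$$W_N = e^{-j 2\pi / N} \tag{2}$$

and where:
x(n) = a set of N signal samples, and
X(k) = a corresponding set of output signals comprising the DFT.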
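By way of illustration only, the N-length DFT defined above may be evaluated directly as in the following minimal sketch. The function name `dft` is hypothetical, and the sign convention W = exp(-j2π/N) is assumed here as the customary one.

```python
import cmath

def dft(x):
    """Direct N-length DFT: X(k) = sum over n of x(n) * W**(n*k).

    Assumes the customary convention W = exp(-2j*pi/N).
    Requires on the order of N*N complex multiplications.
    """
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]

if __name__ == "__main__":
    # A constant signal has all of its energy in the k = 0 output.
    print(dft([1, 1, 1, 1]))
```

The direct evaluation performs roughly N multiplications for each of the N outputs, which is the computational cost that FFT algorithms are intended to reduce.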
This is referred to as an N-length DFT. The fast Fourier transform, or FFT, is a set of mathematical algorithms that drastically reduce the number of arithmetic computations for processing DFTs. Using a "squared-radix" FFT algorithm, N-length DFTs can be combined in two stages to generate an FFT of length N^2.
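From equation (1), the N^2-length DFT is defined as:

$$X(k) = \sum_{n=0}^{N^2-1} x(n)\, W_{N^2}^{nk}, \qquad k = 0, 1, \ldots, N^2-1 \tag{3}$$

where W_{N^2} is the W of equation (2) with N replaced by N^2. The radix-squared DFT is derived by redefining the indices n and k as:

$$n = N n_1 + n_2 \quad \text{for } n_1, n_2 = 0, 1, 2, \ldots, N-1 \tag{4}$$

$$k = k_1 + N k_2 \quad \text{for } k_1, k_2 = 0, 1, 2, \ldots, N-1 \tag{5}$$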
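Substituting equations (4) and (5) into equation (3), and noting that

$$W_{N^2}^{N^2 n_1 k_2} = 1 \quad \text{and} \quad W_{N^2}^{N m} = W_N^{m}$$

yields:

$$X(k_1 + N k_2) = \sum_{n_2=0}^{N-1} \left[ \sum_{n_1=0}^{N-1} x(N n_1 + n_2)\, W_N^{n_1 k_1} \right] W_{N^2}^{n_2 k_1}\, W_N^{n_2 k_2}$$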
The term in brackets is an N-length DFT on the data addressed by n_1. The outer summation, over n_2, is the N-sample DFT of the data in brackets after it has been multiplied by the twiddle factors given by

$$W_{N^2}^{n_2 k_1}$$

A block diagram of this process is shown in FIG. 1.
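The two-stage process described above may be sketched as follows, purely as a software model for checking the index arithmetic and not as a description of any hardware. The names `dft` and `radix_squared_dft` are hypothetical, and the convention W = exp(-j2π/N) is assumed.

```python
import cmath

def dft(x):
    """Direct DFT of len(x) points, assuming W = exp(-2j*pi/len(x))."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def radix_squared_dft(x, N):
    """N*N-length DFT built from two stages of N-length DFTs.

    Input index:  n = N*n1 + n2
    Output index: k = k1 + N*k2
    """
    assert len(x) == N * N
    # Stage 1: for each n2, an N-length DFT over n1 (result indexed by k1).
    stage1 = [dft([x[N * n1 + n2] for n1 in range(N)]) for n2 in range(N)]
    # Twiddle factors: multiply element (n2, k1) by W_{N^2} ** (n2 * k1).
    for n2 in range(N):
        for k1 in range(N):
            stage1[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / (N * N))
    # Stage 2: for each k1, an N-length DFT over n2 (result indexed by k2).
    X = [0j] * (N * N)
    for k1 in range(N):
        stage2 = dft([stage1[n2][k1] for n2 in range(N)])
        for k2 in range(N):
            X[k1 + N * k2] = stage2[k2]
    return X

if __name__ == "__main__":
    samples = [complex(i, -i) for i in range(16)]
    direct = dft(samples)
    staged = radix_squared_dft(samples, 4)
    print(max(abs(a - b) for a, b in zip(direct, staged)))  # rounding error only
```

In this sketch the data permutation between the two stages is expressed by the list indexing; in a hardware processor it would correspond to the memories that reorder data between DFT kernels.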
It will be appreciated that the radix-squared FFT can be further extrapolated to an N^4-length FFT, and beyond if desired, provided that the basic "building blocks" are available. In particular, the basic N-length DFT kernel should provide high speed and simplicity of design.

A fundamental issue affecting the DFT architecture is the representation of the data. Two methods for representing numerical data have been used extensively in existing systems: bit-parallel and bit-serial arithmetic. With bit-parallel arithmetic, all of the bits representing a number are presented to the computational device in one cycle; likewise, the device outputs all of the bits of the result in one cycle. Most commercially available multipliers and other arithmetic components are implemented in bit-parallel arithmetic. With bit-serial arithmetic, the bits representing the numbers are presented at the rate of one bit per cycle to the arithmetic components, and the results are also generated at the rate of one bit per cycle.
In the design of a DFT processor, the bit-parallel representation of the data is most convenient for the data permutations that must be effected outside of the DFT kernels. When the data is permuted, the bits within a single data word retain the same relationship to one another. Thus all of the bits of the data word are handled identically. For a system that uses large-scale memories to permute the data, all of the bits can be accessed with the same address. This dramatically simplifies the address generation and other control circuitry for the memories. Another advantage of bit-parallel arithmetic is that this is the format most commonly implemented in commercially available components. Thus, a DFT processor built using bit-parallel arithmetic can be more easily used in combination with components from other sources.
The bit-parallel representation has several disadvantages within the DFT kernel. In typical arithmetic operations, such as addition and multiplication, the results computed for the less significant bits affect the results for the more significant bits. Consequently, if a bit-parallel operator is to complete an operation in a single cycle, the cycle must be long enough to allow intermediate results to propagate from the least significant bit to the most significant bit. The speed of a bit-parallel system can therefore be seriously impaired for systems with large data words, i.e. for systems of high precision.
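The propagation of intermediate results from the least significant bit toward the most significant bit may be illustrated with a simple ripple-carry addition, sketched below. The function `ripple_add` is a hypothetical software model, not a circuit from this disclosure; it reports the longest run of consecutive bit positions through which a carry travels within a single bit-parallel operation.

```python
def ripple_add(a, b, bits):
    """Add two unsigned `bits`-wide integers position by position, LSB first.

    Returns (sum modulo 2**bits, carry out, longest carry chain), where the
    longest carry chain is the largest number of consecutive bit positions
    that receive a carry from the position below.
    """
    carry = 0
    total = 0
    run = longest = 0
    for i in range(bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        run = run + 1 if carry else 0        # this position received a carry
        longest = max(longest, run)
        total |= (ai ^ bi ^ carry) << i      # sum bit for position i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry, longest

if __name__ == "__main__":
    print(ripple_add(0xFFFF, 1, 16))       # carry ripples through every position
    print(ripple_add(0x5555, 0x2222, 16))  # no carries at all
```

In the worst case the chain spans nearly the full word, so the single cycle of a bit-parallel adder must be sized for a delay that grows with the word length.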
Another disadvantage of the bit-parallel approach is that it does not facilitate the use of word-parallel computations. Word-parallel computation refers to an architecture in which several computations are performed on several combinations of data words each cycle. This approach allows the construction of a much faster processor than would otherwise be possible. Furthermore, because different operations are performed by different components, it is possible to design each of the components to be specialized for a specific task. This allows the components to be simpler. For example, a component that multiplies by a predetermined constant is simpler than a multiplier that must be able to multiply by an arbitrary number. With bit-parallel arithmetic, it is difficult to utilize word-parallel computation. Each data path in a bit-parallel architecture requires a number of wires equal to the number of bits in the word. Thus, for D data paths with words of B bits, there are D×B connections to the input of each DFT kernel and D×B connections to the output. It is readily seen that, for practical values of B (greater than 12), D must be quite small, or the connection problem becomes intractable, either with the DFT implemented as a VLSI (very large-scale integrated) circuit component, requiring 2DB pins, or as a circuit card with the same number of external connections to other cards.
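The growth of the connection count may be seen from a short calculation, given here only as an illustrative sketch. The function name `kernel_connections` and the particular values of D and B are assumptions chosen for the example.

```python
def kernel_connections(data_paths, bits_per_path):
    """Input plus output connections for one DFT kernel: 2 * D * B."""
    return 2 * data_paths * bits_per_path

if __name__ == "__main__":
    for d in (4, 8, 16):
        print(d, "paths of 16 bits:", kernel_connections(d, 16), "connections")
```

For example, 16 data paths carrying 16-bit words would require 512 connections per kernel, whereas the same number of paths carrying a single wire each would require 32.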
Bit-serial arithmetic is much better suited for the computation within the DFT kernel. With B=1, word parallelism is easily exploited. Furthermore, bit-serial structures have been developed so that intermediate results need only be propagated across one bit of data each cycle. Thus, bit-serial processors may operate with very high cycle rates.
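A bit-serial adder may be modeled, as a rough sketch only, by a one-bit adder with a single stored carry. The class `BitSerialAdder` and the function `serial_add` are hypothetical names introduced for this illustration.

```python
class BitSerialAdder:
    """One-bit adder with a stored carry, clocked once per bit (LSB first)."""

    def __init__(self):
        self.carry = 0  # the only state held between cycles

    def step(self, a_bit, b_bit):
        """Consume one bit of each operand and return one bit of the sum."""
        s = a_bit ^ b_bit ^ self.carry
        self.carry = (a_bit & b_bit) | (self.carry & (a_bit ^ b_bit))
        return s

def serial_add(a, b, bits):
    """Add two unsigned integers in `bits` cycles, one bit per cycle."""
    adder = BitSerialAdder()
    out = 0
    for cycle in range(bits):
        out |= adder.step((a >> cycle) & 1, (b >> cycle) & 1) << cycle
    return out

if __name__ == "__main__":
    print(serial_add(1234, 4321, 16))
```

In each cycle the carry moves across only one bit position, so the cycle time does not depend on the word length; the cost is that the word takes as many cycles as it has bits, and the carry must be held in a storage element between cycles.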
Bit-serial arithmetic presents several problems in the development of a DFT processor. Word parallelism must be exploited to obtain high data rates through the processor. Therefore, the DFT kernel outputs several data-streams simultaneously, each of which must be handled differently. This makes the control of the memories that implement the permutations between kernels much more complicated. Furthermore, each intermediate result, such as a carry, must be stored each cycle. This results in a large portion of the kernel circuitry being used for the storage of intermediate results.
One proposed compromise between the bit-serial and the bit-parallel formats is to employ in the DFT an architecture known as bit-skew arithmetic. This approach retains a similarity with bit-parallel, in that there are as many input lines as there are bits in a data word, but the timing of the inputs is skewed by one bit per stage. At the least significant stage, the first bit of the first word is input. Then, in the next clock cycle, the first bit of the second word is input at the least significant stage, while the second bit of the first data word is being input at the next least significant stage. This process has the advantage of preserving some of the parallelism of a bit-parallel system, while affording some of the advantages of a bit-serial system. One disadvantage is that carry signals have to be propagated from stage to stage, as in a bit-parallel system.
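The skewed input timing may be tabulated as in the sketch below, which assumes the usual arrangement in which each bit of a word enters its stage one clock cycle after the next less significant bit of the same word. The function name `bit_skew_schedule` is hypothetical, and words and bits are numbered from zero.

```python
def bit_skew_schedule(num_words, bits):
    """Return, for each clock cycle, the (word, bit) pair entering each stage.

    Stage 0 is the least significant stage. An entry of None means the
    stage receives no input in that cycle.
    """
    schedule = []
    for cycle in range(num_words + bits - 1):
        row = []
        for stage in range(bits):
            word = cycle - stage  # stage i lags stage 0 by i cycles
            row.append((word, stage) if 0 <= word < num_words else None)
        schedule.append(row)
    return schedule

if __name__ == "__main__":
    for cycle, row in enumerate(bit_skew_schedule(3, 4)):
        print(cycle, row)
```

In the second cycle of this schedule, stage 0 receives bit 0 of the second word while stage 1 receives bit 1 of the first word, so the carry generated in one stage during one cycle is available to the next stage in the following cycle.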
It is apparent that there is still a need for improvement in the area of DFT architectures. Ideally, what is needed is a system that preserves the advantages of bit-parallel arithmetic outside of the DFT kernel, but has substantially the same advantages as the bit-serial approach within the kernel. The ideal approach should also preserve the ordering of the data inputs between input and output of the DFT, to facilitate cascading of stages without the use of double buffering or complex memory addressing schemes. The present invention fulfills all of these requirements and provides additional advantages to be described.