1. Field of the Invention
The present invention generally relates to digital signal processors, and more particularly to a method and processing circuit for computing a fast Fourier transform (FFT) in a digital signal processor.
2. Discussion of the Related Art
As is known, a nonperiodic signal f(t) can be represented in terms of exponential functions over the entire real line from xe2x88x92∞ to +∞. To be valid for all time, the signal must be represented by a continuous sum of exponential functions with frequencies lying in the interval (xe2x88x92∞, ∞). The Fourier transform is a method of expressing a given signal in terms of such a continuous set of exponential components of frequency. The increasing use of digital methods for computational aids and for signal processing applications has resulted in an emphasis on discrete Fourier transforms (DFTs) that model Fourier transforms. A DFT is defined as a sequence of N complex valued samples in the frequency domain.
Computation of the DFT requires N2 multiplications and the resulting computation time becomes excessive when N becomes large. Such a computation can be represented by Ri=Ci Op Di, where Op is an arithmetic operation, C={1, 2, 3, xe2x88x923, xe2x88x922, xe2x88x921} (a symmetrical coefficient set), D={. . . Di . . . }, and R={. . . Ri . . . }. A traditional approach to the problem is to pre-store all coefficient values, Ci, in memory, for each i read Ci and Di, then solve for Ri as follows:
i=1, j=1, N=6;
loopN{read C[j], read D[i], R[i]=C[j]OpD[i], i++, j++}.
An improved approach to the problem would pre-store only coefficients Cxe2x80x2={1, 2, 3} and solve for R in two stages as follows:
i=1, j=1, N=6;
loopN/2{read Cxe2x80x2[j], read D[i], R[i]=Cxe2x80x2[j]OpD[i], i++, j++};
i=1, j=3;
loopN/2{read Cxe2x80x2[j], read D[i], R[i]=xe2x88x92Cxe2x80x2[j]OpD[i], i++, jxe2x88x92xe2x88x92}.
It is readily apparent that the improved computation saves coefficient space at the cost of program space and processing time.
The key to more efficient computational methods is to make use of as much symmetry of the complex exponential as possible before the multiplications are performed.
The symmetry of the complex exponential makes FFTs possible. A FFT is an algorithm which enables the user to compute a DFT with a minimum of computational time. The algorithms are fast because they reuse the same roots of unity many times and thus minimize the number of multiplications necessary to compute the DFT. This reuse of the roots of unity reduces the complexity of the operation to N log2 N. The net savings in computational time becomes appreciable for large N. For example, the computational time required for a one thousand twenty four point DFT with direct evaluation is one hundred times greater than that using the Cooley-Tukey FFT formulation. However, FFT algorithms require considerable memory, and memory access delay time becomes a limiting factor when high performance is desired.
In this regard, the FFT processor may generally be characterized as a digital processor which repetitively performs the basic computations:
AW+B; AWxe2x88x92B,
where A and B are complex digital words, each initially associated with a different one of N digital samples, the frequency spectrum of which is to be analyzed, and W is a complex digital word which serves as a weighting coefficient (also known as a twiddle factor). The above computations can be performed by processing such digital words in parallel, as mentioned above, using a complex multiplier to perform the AW portion of the calculation, a storage means for storing such portion of the calculation, and a complex parallel adder and subtractor for adding and subtracting the stored portion of the calculation to and from, respectively, the B portion of the calculation.
Unfortunately, in FFT processing, global memories are relatively slow and heavily loaded due to their shared nature. Some of the loading is the result of the memory required to perform the FFT evaluation. Direct evaluation of a five hundred twelve point FFT requires the storage of complex data values and complex coefficients or twiddle factors for each discrete frequency evaluated.
Accordingly, there is a desire to provide improved efficiency in storing and retrieving coefficients for computing FFTs that reduces the memory storage capacity requirements and coefficient memory access delays in the prior art.
Certain objects, advantages and novel features of the invention will be set forth in part in the description that follows and in part will become apparent to those skilled in the art upon examination of the following or may be learned with the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the advantages and novel features, the present invention is generally directed to a processing circuit and method for computing a FFT. The present invention reflects the recognition that memory space is at a premium and excessive reads from memory consume excessive amounts of time in waiting for the memory storage device to correctly index the coefficient memory locations required in computing FFTs. Accordingly, the circuit of the present invention is specifically designed to reduce the memory requirements for the storage of symmetrical coefficients. The suggested circuit and method enables the following computation upon pre-storing only Cxe2x80x2={1, 2, 3}:
i=1, j=1, N=6;
loopN{read Cxe2x80x2[j], read D[i], R[i]=Cxe2x80x2[j]OpD[i], i++, j++}.
This implementation has the advantage of reducing the required coefficient storage space, without the penalty in program space and processing time of the prior art implementations.
In accordance with the invention, the processing circuit includes a memory device, a multiplier, a detector, a state machine, and a circuit for performing the 2""s complement of a coefficient. The memory storage device stores data values and coefficient (or twiddle) values. The detector integrates a data pointer with the state machine. The detector is designed to identify the symmetry lines (by memory address). The state machine, when notified by the detector that a line of symmetry has been encountered, appropriately adjusts either the coefficients, the imaginary sign, or the real sign for input to a multiplier. In this way, for first order symmetry, coefficients representing only one eighth of the unit circle are required to be stored in order to enable computation of the entire FFT and the inverse FFT. For DFT coefficients with higher orders of symmetry, less of the coefficients on the unit circle need to be stored.
In accordance with a preferred embodiment of the invention, the circuit includes a detector in the form of a data pointer and a state machine driven by data address comparison. The detector determines the portion of the unity circle that the data pointer is virtually pointing to and increments the state machine accordingly. The address pointer increment function is also triggered in the process. The state machine keeps track of which xe2x80x9cslicexe2x80x9d the address pointer is in by changing states when a slice boundary is encountered. The arithmetic operation is modified by the state machine from a combination of multiplexer select signal processors and add/subtract signals. In this regard, a first multiplexer is disposed to retrieve a value to output to the second input of the first adder, wherein the retrieved value may be retrieved from the data memory, or from the output of the first adder. In similar fashion, a second multiplexer is disposed to retrieve a value to output to the second input of the second adder, wherein the retrieved value may be retrieved from various locations to route to the second adder. Specifically, this second multiplexer may route data from the data memory, the output of the first adder, or from the output of the second adder. Additional multiplexers may be provided to retrieve both real and imaginary components of data values and coefficient values to controllably route either the real portion or the imaginary portion into various arithmetic units of the circuit. In this way, complex numbers may be controllably manipulated, multiplied, or added to effectively carry out the computation of the present invention.
Further, an address controller is provided which controls the addressing of both the data memory and coefficient memory for both read and write operations. Therefore, what is provided is a more efficient circuit for processing FFTs with symmetrical coefficients.