1. Field of the Invention
The present invention generally relates to digital signal processors, and more particularly to a processing circuit for computing a fast Fourier transform (FFT) in a digital signal processor.
2. Discussion of the Related Art
As is known, digital signal processors (DSPs) are used in a wide variety of practical applications. Although circuit architectures may vary from chip to chip, DSPs are generally characterized by a multiplier component. As is known, multipliers perform the multiplication operation at an extremely high rate of speed (often within a single clock cycle). In comparison, a typical microprocessor architecture, which contains shifters, adders, and accumulators, performs a number of shift, add, and accumulate operations to carry out a multiplication operation. This manner of performing a single multiplication operation requires a relatively large number of clock cycles. As a result, arithmetic computations requiring many multiplication operations are preferably performed with a DSP.
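By way of illustration only, the shift, add, and accumulate sequence described above may be sketched as follows. The function name shift_add_multiply and the step counter are hypothetical and are not taken from any particular processor; the sketch assumes non-negative integer operands and simply counts one step per multiplier bit examined.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds.

    Returns (product, steps), where steps counts the loop iterations,
    one per bit of the multiplier b that is examined.
    """
    product = 0
    steps = 0
    while b:
        if b & 1:          # low bit of the multiplier is set
            product += a   # accumulate the shifted multiplicand
        a <<= 1            # shift the multiplicand left by one bit
        b >>= 1            # shift the multiplier right by one bit
        steps += 1
    return product, steps
```

The number of steps grows with the width of the multiplier operand, which is consistent with the observation that this manner of multiplication takes many clock cycles compared with a dedicated multiplier.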
As merely one example, DSP chips are used in electronic communications, and virtually all modems include an on-board DSP chip. As is known by those skilled in the communications art, the coding, filtering, error-correction, and other processes associated with electronic communications all demand relatively extensive mathematical computations. In order to achieve the desired speed for communications (and the faster, the better), DSP chips are used to perform this processing.
FFTs are based on the discrete Fourier transform (DFT). The algorithms are fast because they reuse the same roots of unity many times and thus minimize the number of multiplications. This reuse of the roots of unity reduces the complexity of the operation to the order of N log N, where N is the number of samples. Typical FFT algorithms achieve the decrease in complexity over the discrete Fourier transform algorithm by using these roots of unity and storing the intermediate values in global memory. The stored values are retrieved rather than explicitly using a multiplication to calculate them.
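The reuse of the roots of unity may be illustrated with a minimal sketch that compares a direct DFT with a textbook recursive radix-2 FFT. This is not the circuit of the invention; the function names dft and fft, the precomputed twiddles table, and the restriction to power-of-two lengths are assumptions made for the example.

```python
import cmath

def dft(x):
    """Direct DFT: on the order of N*N complex multiplications."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft(x, twiddles=None):
    """Recursive radix-2 decimation-in-time FFT (len(x) a power of two).

    The roots of unity are computed once at the top level and reused by
    every sub-transform instead of being recalculated.
    """
    n = len(x)
    if twiddles is None:
        twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    if n == 1:
        return list(x)
    # A half-size transform needs every second root of the full-size table.
    even = fft(x[0::2], twiddles[0::2])
    odd = fft(x[1::2], twiddles[0::2])
    out = [0j] * n
    for k in range(n // 2):
        t = twiddles[k] * odd[k]        # one multiplication per output pair
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each of the log2(N) levels of recursion performs N/2 multiplications, which is where the N log N behavior comes from.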
In this regard, the FFT processor may generally be characterized as a digital processor which repetitively performs the basic computations:
AW+B; AW−B,
where A and B are complex digital words, each initially associated with a different one of N digital samples, generally of the radar video signal the frequency spectrum of which is to be analyzed, and W is a complex digital word which serves as a weighting coefficient (also known as a twiddle factor). The above computations would be performed by processing such digital words in parallel form, as mentioned above, using a complex multiplier to perform the AW portion of the calculation, a storage means for storing such portion of the calculation, and a complex parallel adder and subtractor for adding and subtracting the stored portion of the calculation to and from, respectively, the B portion of the calculation.
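The basic computation described above may be sketched in a few lines. The function name butterfly is hypothetical; the sketch only assumes that A, B, and W are complex numbers and that the product AW is computed once, held, and then used by both the adder and the subtractor.

```python
def butterfly(a, b, w):
    """Return the pair (A*W + B, A*W - B) for complex a, b and twiddle w."""
    aw = a * w             # complex multiplier output, stored for reuse
    return aw + b, aw - b  # adder result, subtractor result
```

For example, butterfly(1 + 2j, 3 + 0j, 1j) forms the product once and then adds B to it and subtracts B from it.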
Unfortunately, such algorithms often do not work well for low energy consumption implementations due to the global nature of the shared memory required for storage and lookup of the intermediate results. Current technology employs two approaches for architecting FFTs for high performance or low energy consumption. A complex switching network, called a butterfly network, is employed to forward results between parallel functional units in a pipelined manner. One obstacle to low energy consumption and higher performance relates to the memory architectures used to store and forward intermediate results. Global memories are notoriously slow and heavily loaded due to their shared nature. More significantly, however, the large number of intermediate reads and writes that are made to memory devices leads to increased power consumption.
Accordingly, there is a desire to provide an improved architecture for computing FFTs that overcomes these and other related shortcomings of the prior art.
Certain objects, advantages and novel features of the invention will be set forth in part in the description that follows and in part will become apparent to those skilled in the art upon examination of the following or may be learned with the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the advantages and novel features, the present invention is generally directed to a processing circuit for computing an FFT. The present invention reflects the recognition that excessive reads from and writes to memory consume excessive amounts of power. Accordingly, the circuit of the present invention is specifically designed to minimize the number of reads and writes to memory. In addition, the circuit is designed so that processing parallelism may be achieved in order to reduce the total number of clock cycles required to compute an FFT.
In accordance with one aspect of the invention, the processing circuit includes a data memory for storing data values, and a separate coefficient memory for storing coefficient (or twiddle) values. The circuit further includes a multiplier configured to multiply a value received from the coefficient memory by another value retrieved from some other location. The circuit further includes a first adder configured to add a value output from the multiplier with a value retrieved from another location. The circuit further includes a second adder configured to add a value retrieved from the data memory with a value retrieved from another location. Finally, the circuit includes a write-back data path disposed between the second adder and the data memory. The write-back data path is configured to write data output from the second adder to the data memory, to a location from which a data value was previously retrieved.
In accordance with a preferred embodiment of the invention, the circuit includes a plurality of multiplexers that are configured to controllably route data from one location to another. In this regard, a first multiplexer is disposed to retrieve a value to output to the second input of the first adder, wherein the retrieved value may be retrieved from the data memory, or from the output of the first adder. In similar fashion, a second multiplexer is disposed to retrieve a value to output to the second input of the second adder, wherein the retrieved value may be retrieved from various locations to route to the second adder. Specifically, this second multiplexer may route data from the data memory, the output of the first adder, or from the output of the second adder. Additional multiplexers may be provided to retrieve both real and imaginary components of data values and coefficient values to controllably route either the real portion or the imaginary portion into various arithmetic units of the circuit. In this way, complex numbers may be controllably manipulated, multiplied, or added to effectively carry out the computation of the present invention.
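The routing of real and imaginary components may be illustrated, purely as an assumption-laden sketch, by a complex multiplication built from four passes through a real-valued multiplier. The helper select_part stands in for a multiplexer and is hypothetical, as is the choice of four real multiplications; the text above does not specify how many passes the circuit uses.

```python
def select_part(value, part):
    """Hypothetical multiplexer: route the real ('re') or imaginary ('im') part."""
    return value.real if part == "re" else value.imag

def complex_multiply(a, w):
    """Multiply complex a by complex w using only real multiplies and adds."""
    rr = select_part(a, "re") * select_part(w, "re")
    ii = select_part(a, "im") * select_part(w, "im")
    ri = select_part(a, "re") * select_part(w, "im")
    ir = select_part(a, "im") * select_part(w, "re")
    # Real part: rr - ii.  Imaginary part: ri + ir.
    return complex(rr - ii, ri + ir)
```

Each product is formed from one selected component of the data value and one selected component of the coefficient value, and the adders then combine the partial products into the real and imaginary results.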
Once a given data value has been computed for a given stage in the FFT computation, the computed data value is written back to the data memory of the processing circuit. In this regard, a write-back data path is provided to route the data from the output of the second adder back to the data memory. Preferably, at least one FIFO is disposed within this write-back data path to facilitate parallelism in this process. Further, an address controller is provided which controls the addressing of both the data memory and coefficient memory for both read and write operations. Thus, data written back to the data memory by way of the write-back data path is synchronized by the address controller so that it is written at times when data values are not being read out from the data memory. Therefore, what is provided is a highly efficient circuit for processing FFT computations.
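The write-back behavior described above may be modeled in software as a minimal sketch. It is a textbook in-place radix-2 FFT, not the claimed circuit: the names fft_in_place, data_mem, coeff_mem, and fifo are hypothetical, the length is assumed to be a power of two, and the second butterfly output uses the conventional B − AW sign. Results are queued in a FIFO and written to the addresses they were read from only after the reads for the next butterfly have completed.

```python
import cmath
from collections import deque

def fft_in_place(data_mem):
    """In-place radix-2 FFT with separate data and coefficient memories."""
    n = len(data_mem)
    coeff_mem = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    # Reorder the inputs into bit-reversed order so each stage works in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            data_mem[i], data_mem[j] = data_mem[j], data_mem[i]

    fifo = deque()  # models the FIFO in the write-back data path

    def drain():
        while fifo:
            addr, value = fifo.popleft()
            data_mem[addr] = value

    half = 1
    while half < n:
        step = n // (2 * half)          # stride through the coefficient memory
        for start in range(0, n, 2 * half):
            for k in range(half):
                top, bot = start + k, start + k + half
                b = data_mem[top]       # read phase
                a = data_mem[bot]
                w = coeff_mem[k * step]
                drain()                 # write phase for the previous butterfly
                aw = a * w
                fifo.append((top, b + aw))   # written back where b was read
                fifo.append((bot, b - aw))   # written back where a was read
        drain()                         # finish the stage before the next one
        half *= 2
    return data_mem
```

Because the butterflies within one stage touch disjoint addresses, delaying the writes through the FIFO does not disturb later reads in that stage, and no second buffer for intermediate results is needed.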