The present invention relates to Fast Fourier Transforms and, more particularly, to methods of data storage that are particularly useful in VLSI implementations of in-place FFT algorithms.
Since its rediscovery by Cooley and Tukey in 1965, the Fast Fourier Transform (FFT) has found wide application in fields such as digital signal processing. A general review of the state of the art in FFTs is found in P. Duhamel and M. Vetterli, “Fast Fourier Transforms: a tutorial review and a state of the art”, Signal Processing vol. 19, pp. 259-299 (1990), which is incorporated by reference for all purposes as if fully set forth herein.
One particular class of FFTs is the in-place Cooley-Tukey FFT. FIG. 1, which is adapted from FIG. 1 of Duhamel and Vetterli, illustrates this style of FFT. This particular example shows the implementation of a 15-point FFT as a succession of radix-5 and radix-3 Discrete Fourier Transforms (DFTs). An input sequence of 15 complex numbers x0 through x14 is stored in column order in an array 10 of 3 rows and 5 columns, as shown. The first step of the FFT is three radix-5 DFTs 12 of the three rows of array 10. DFTs 12 are performed in place: the numbers stored in each row of array 10 are read and transformed, and the transformed numbers replace the original numbers in array 10. The second step of the FFT is the multiplication, also in place, of each number now stored in array 10 by a corresponding “twiddle factor” from a 3-row, 5-column array 14 of twiddle factors w that are integral powers of exp(−2πj/15), where j is the square root of −1. (In general, the twiddle factors are integral powers of exp(−2πj/N), where N is the length of the FFT.) The third step of the FFT is five radix-3 DFTs 16 of the five columns now stored in array 10. This third step also is performed in place: the numbers stored in each column of array 10 are read and transformed, and the transformed numbers replace the numbers stored in array 10 prior to the radix-3 DFTs. The final output sequence of the transform, 15 complex numbers X0 through X14, is read from array 10 in row order, as shown.
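The three steps of FIG. 1 can be sketched in Python as follows. This is a minimal illustration, not a VLSI implementation: the short DFTs are evaluated directly from the DFT definition rather than by dedicated processors, and the index conventions (row n2, column n1 of array 10 holding x[n2 + 3·n1] on input; row k2, column k1 holding X[k1 + 5·k2] on output) are an assumption consistent with the column-order input and row-order output described above. The names `dft` and `fft_15` are hypothetical.

```python
import cmath
from typing import List

def dft(x: List[complex]) -> List[complex]:
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*j*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_15(x: List[complex]) -> List[complex]:
    """15-point FFT as radix-5 row DFTs, twiddle factors, radix-3 column DFTs."""
    R, C, N = 3, 5, 15  # array 10 has 3 rows and 5 columns
    # Store the input in column order: row n2, column n1 holds x[n2 + R*n1].
    a = [[x[n2 + R * n1] for n1 in range(C)] for n2 in range(R)]
    # Step 1: three radix-5 DFTs of the rows, written back in place.
    for n2 in range(R):
        a[n2] = dft(a[n2])
    # Step 2: multiply each entry by its twiddle factor exp(-2*pi*j/15)**(n2*k1).
    for n2 in range(R):
        for k1 in range(C):
            a[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / N)
    # Step 3: five radix-3 DFTs of the columns, written back in place.
    for k1 in range(C):
        column = dft([a[n2][k1] for n2 in range(R)])
        for k2 in range(R):
            a[k2][k1] = column[k2]
    # Read the output in row order: row k2, column k1 holds X[k1 + C*k2].
    return [a[k2][k1] for k2 in range(R) for k1 in range(C)]
```

Each row DFT in step 1 touches only its own row and each column DFT in step 3 only its own column, so `a` is overwritten in place much as array 10 is; for any 15-element input, `fft_15(x)` should agree with `dft(x)` to within rounding error.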
As noted by Duhamel and Vetterli, most of the effort invested in optimizing the implementation of FFTs has been directed towards reducing the number of arithmetic operations performed. The net speed of an FFT implementation also depends on the speed at which numbers are retrieved from memory and stored in memory. This is particularly true in the case of very large scale integration (VLSI) implementations, in which the DFTs are performed by dedicated hardware.
There is thus a widely recognized need for, and it would be highly advantageous to have, an efficient method for storing and retrieving the numbers used in an in-place FFT.
According to the present invention there is provided a method of performing an FFT of a sequence of N=B^n numbers, where B is a power of 2 and n is a positive integer, including the steps of: (a) recursively selecting a pattern of storage locations for the B^n numbers in M in-place memories, M being a power of 2 that is less than B, wherein, if n=1, each in-place memory has storage locations for a different B/M of the B numbers, and wherein, if n is greater than 1, the pattern for storing B^n numbers is a concatenation of B of the patterns for storing B^(n−1) numbers, there being B/M successive sets of the patterns for storing B^(n−1) numbers in the pattern for storing B^n numbers when n is greater than 1, the patterns for storing B^(n−1) numbers within each of the B/M successive sets being mutually identical and being different from the patterns for storing B^(n−1) numbers of any other set; (b) storing the numbers in the storage locations; (c) performing an in-place radix-B DFT on each of N/B groups of values stored in the storage locations; and (d) if n is greater than 1, performing a length N/B DFT on each of B groups of N/B values stored in the storage locations, each group including N/(MB) values stored in each of the in-place memories.
According to the present invention there is provided a device for performing an FFT of a sequence of N=B^n numbers, where B is a power of 2 and n is a positive integer, including: (a) M in-place memories, M being a power of 2 that is less than B; (b) a software module including a plurality of instructions for storing the N numbers in corresponding storage locations in the in-place memories according to one of a recursively derived plurality of patterns of the storage locations, the pattern for n=1 being such that a different B/M of the N numbers is stored in each of the in-place memories, and the pattern for n greater than 1 being a concatenation of B of the patterns for n−1, there being B/M successive sets of the patterns for n−1 in the pattern for n greater than 1, the patterns for n−1 within each of the B/M successive sets being mutually identical and being different from the patterns for n−1 of any other set; (c) a master processor for executing the instructions, thereby storing N values in N of the storage locations; and (d) at least one FFT processor for performing a radix-B FFT on the N values taken B at a time.
According to the present invention there is provided a method of providing a plurality of complex numbers of unit modulus to a computational process, including the steps of: (a) factoring each complex number into a most significant factor and a least significant factor; (b) storing the most significant factors in a most significant factor memory; and (c) storing at least a part of each least significant factor in a least significant factor memory.
According to the present invention there is provided a method of performing an FFT of a sequence of N=CB^n numbers, where B is a power of 2, C is a power of 2 that is less than B, and n is a positive integer, including the steps of: (a) recursively selecting a pattern of storage locations for B^(n+1) numbers in M in-place memories, M being a power of 2 that is less than B, starting from a base storage pattern for B numbers, wherein each in-place memory has storage locations for a different B/M of the B numbers, each subsequently selected pattern for storing B^m numbers, where m is an integer greater than 1, being a concatenation of B of the patterns for storing B^(m−1) numbers, there being B/M successive sets of the patterns for storing B^(m−1) numbers in the pattern for storing B^m numbers, the patterns for storing B^(m−1) numbers within each of the B/M successive sets being mutually identical and being different from the patterns for storing B^(m−1) numbers of any other set; (b) storing the N numbers in N of the storage locations for B^(n+1) numbers; (c) performing an in-place radix-C DFT on each of N/C groups of values stored in the N storage locations; and (d) performing a length B^n DFT on each of C groups of B^n values stored in the N storage locations, each group including B^n/M values stored in each of the in-place memories.
According to the present invention there is provided a device for performing an FFT of a sequence of N=CB^n numbers, where B is a power of 2, C is a power of 2 that is less than B, and n is a positive integer, including: (a) M in-place memories, M being a power of 2 that is less than B; (b) a software module including a plurality of instructions for storing the N numbers in corresponding storage locations in the in-place memories according to a recursively derived pattern of storage locations for storing B^(n+1) numbers, a base pattern of the recursive derivation being a pattern for storing B numbers, such that a different B/M of the B numbers is stored in each of the in-place memories, and each subsequently derived pattern for storing B^m numbers, where m is an integer greater than 1, being a concatenation of B of the patterns for storing B^(m−1) numbers, there being B/M successive sets of the patterns for storing B^(m−1) numbers in the pattern for storing B^m numbers, the patterns for storing B^(m−1) numbers within each of the B/M successive sets being mutually identical and being different from the patterns for storing B^(m−1) numbers of any other set; (c) a master processor for executing the instructions, thereby storing N values in N of the storage locations; and (d) at least one FFT processor for performing: (i) a radix-B FFT on the N values taken B at a time, and (ii) a radix-C FFT on the N values taken C at a time.
The primary embodiment of the present invention is directed at FFTs of input sequences of N=B^n numbers, where B is a power of 2. These numbers are stored in array 10 in a manner that allows efficient storage and retrieval using M random access memories that preferably are dual-ported memories but optionally are single-ported memories. A dual-ported memory is a random access device to which two values may be written simultaneously at two different storage locations or from which two values may be read simultaneously at two different storage locations. Note that, because the input and output numbers and the intermediate values are complex, each storage location includes enough room for two real values, i.e., the real and imaginary parts of the complex value stored therein. The addresses of the storage locations are provided on two data buses. These random access memories, which are used to store the input numbers, the output numbers and the intermediate values, are termed herein “in-place memories”.
At every stage in the FFT, equal numbers of complex values are stored in each of the M in-place memories, and individual complex values (single-ported case) or pairs of complex values (dual-ported case) always are retrieved simultaneously from all M in-place memories and always are stored simultaneously in all M in-place memories. The basic storage pattern, for the case n=1 (N=B^1=B), has N/M=B/M complex numbers stored in each of the M in-place memories. The storage patterns for n greater than 1 are built recursively from the n=1 storage pattern.
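The recursive structure of the storage patterns can be illustrated with a short Python sketch. The text fixes only the structure (an equal share of locations per memory, a concatenation of B sub-patterns, and B/M successive sets of mutually identical sub-patterns that differ from set to set); it does not say here how the sets differ. The rule used below, cyclically rotating every aligned block of B labels by the set index, is a hypothetical choice made only to produce a pattern with that structure. It is not presented as the preferred pattern and does not model the memory access schedule. A pattern is a list giving the in-place memory index of each storage location, and all function names are hypothetical.

```python
from typing import List

def is_power_of_two(v: int) -> bool:
    return v > 0 and v & (v - 1) == 0

def base_pattern(B: int, M: int) -> List[int]:
    """n = 1: memory k holds the k-th run of B/M consecutive locations (hypothetical)."""
    share = B // M
    return [loc // share for loc in range(B)]

def rotate_blocks(pattern: List[int], B: int, s: int) -> List[int]:
    """Rotate every aligned block of B labels right by s positions (hypothetical rule)."""
    out: List[int] = []
    for start in range(0, len(pattern), B):
        block = pattern[start:start + B]
        out.extend(block[-s:] + block[:-s] if s else block)
    return out

def storage_pattern(B: int, M: int, n: int) -> List[int]:
    """Memory index of each of the B**n storage locations."""
    assert is_power_of_two(B) and is_power_of_two(M) and 2 <= M < B and n >= 1
    if n == 1:
        return base_pattern(B, M)
    sub = storage_pattern(B, M, n - 1)
    pattern: List[int] = []
    for s in range(B // M):             # B/M successive sets ...
        variant = rotate_blocks(sub, B, s)
        pattern.extend(variant * M)     # ... each of M identical sub-patterns
    return pattern
```

For example, `storage_pattern(4, 2, 1)` returns `[0, 0, 1, 1]`, and `storage_pattern(4, 2, 2)` is that base pattern twice followed by its one-position rotation `[1, 0, 0, 1]` twice. Because rotation within a block only moves labels around, every block of B locations, every sub-pattern and the whole pattern keep an equal share per memory.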
In the basic case of n=1, one radix-B FFT (or DFT) suffices to effect the desired transform. When n is greater than 1, the overall FFT is implemented as a succession of radix-B DFTs, as described above. The row-wise DFTs of the first step are radix-B DFTs. The column-wise DFTs of the third step also are composed of radix-B DFTs. When n=2, the column-wise DFTs are radix-B DFTs. When n is greater than 2, the column-wise DFTs are built recursively from radix-B DFTs. The radix-B DFTs may be implemented in hardware, as dedicated processors, or may be implemented in software as radix-B FFTs.
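The recursion for N = B^n can be sketched in Python. This illustrates the decomposition only, under the same index conventions assumed for FIG. 1 (first-step groups of B inputs at stride N/B; output index k1 + B·k2). The short DFTs are evaluated directly from the DFT definition as a stand-in for dedicated radix-B processors, and the names `dft` and `radix_b_fft` are hypothetical.

```python
import cmath
from typing import List

def dft(x: List[complex]) -> List[complex]:
    """Direct DFT, standing in for a radix-B DFT processor."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def radix_b_fft(x: List[complex], B: int) -> List[complex]:
    """FFT of length N = B**n built only from radix-B DFTs."""
    N = len(x)
    if N == B:                      # n = 1: a single radix-B DFT suffices
        return dft(x)
    assert N > B and N % B == 0, "length must be a power of B"
    N2 = N // B
    # Step 1: radix-B DFTs of the N/B groups x[n2], x[n2 + N2], ... (stride N/B).
    rows = [dft([x[n2 + N2 * n1] for n1 in range(B)]) for n2 in range(N2)]
    # Step 2: twiddle factors exp(-2*pi*j/N)**(n2*k1).
    for n2 in range(N2):
        for k1 in range(B):
            rows[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / N)
    # Step 3: length-N/B DFTs of the B columns, built recursively from radix-B DFTs.
    cols = [radix_b_fft([rows[n2][k1] for n2 in range(N2)], B) for k1 in range(B)]
    # Output index k = k1 + B*k2.
    return [cols[k1][k2] for k2 in range(N2) for k1 in range(B)]
```

For example, `radix_b_fft(x, 4)` with `len(x) == 64` (n = 3) performs 16 radix-4 DFTs and then four length-16 transforms, each of which in turn uses eight radix-4 DFTs.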
More generally, the scope of the present invention includes FFTs of sequences of CB^n input numbers, where B is a power of 2 and C is a power of 2 that is less than B. A storage pattern for B^(n+1) input numbers is determined recursively as in the primary embodiment of the present invention, and only CB^n of the storage locations are actually used, preferably in a manner that balances the loads on the memories used to store the numbers in array 10. For example, according to one preferred embodiment of the present invention, the first half of the input numbers is stored in the last CB^n/2 locations of the first half of the B^(n+1) storage locations, and the second half of the input numbers is stored in the first CB^n/2 locations of the last half of the B^(n+1) storage locations. The row-wise DFTs of the first step are radix-C DFTs. The column-wise DFTs of the third step are composed of radix-B DFTs as in the primary embodiment of the present invention.
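The placement described for this preferred embodiment reduces to a simple index map. In the hypothetical helper below, input numbers are indexed 0 to CB^n − 1 and storage locations 0 to B^(n+1) − 1. Since the first half of the inputs ends at the midpoint of the locations and the second half starts there, the used locations form one contiguous block centered on the midpoint.

```python
from typing import List

def padded_locations(B: int, C: int, n: int) -> List[int]:
    """Storage location of each of the C*B**n inputs among B**(n+1) locations."""
    assert C < B and C & (C - 1) == 0 and B & (B - 1) == 0 and n >= 1
    total = B ** (n + 1)
    used = C * B ** n
    # The first used/2 inputs fill the last used/2 slots of the first half,
    # which start at total/2 - used/2.
    first = total // 2 - used // 2
    # The remaining inputs fill the first used/2 slots of the last half, which
    # start at total/2 = first + used/2, so one offset serves both halves.
    return [first + i for i in range(used)]
```

For B = 4, C = 2 and n = 2, the 32 inputs occupy locations 16 through 47 of 64: inputs 0 to 15 end at location 31, and inputs 16 to 31 start at location 32.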
The present invention also includes a method for storing the twiddle factors.
As noted above, each twiddle factor is an integral power of exp(−2πj/N). This integral exponent is partitioned into a least significant part and a most significant part, and the twiddle factor is the product of exp(−2πj/N) raised to the power of the most significant part of the exponent and exp(−2πj/N) raised to the power of the least significant part of the exponent. The former is called herein the “most significant factor” and the latter is called herein the “least significant factor”. Many twiddle factors share the same most significant factor or the same least significant factor. To minimize the storage devoted to the twiddle factors, the most significant factors and the least significant factors are stored separately, and are multiplied to recover the twiddle factors.
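A minimal Python sketch of this factoring follows. For illustration only, it assumes that N is a power of 2 and that the exponent k is split at a hypothetical bit position L into k = k_hi·2^L + k_lo. The most significant factors exp(−2πj/N)^(k_hi·2^L) are then held in a table of N/2^L entries and the least significant factors exp(−2πj/N)^(k_lo) in a table of 2^L entries, so all N twiddle factors are recovered from N/2^L + 2^L stored values at the cost of one complex multiplication each. The function names are hypothetical.

```python
import cmath
from typing import List, Tuple

def build_tables(N: int, L: int) -> Tuple[List[complex], List[complex]]:
    """Most and least significant factor tables for W = exp(-2*pi*j/N)."""
    step = 1 << L  # 2**L
    assert N % step == 0
    msf = [cmath.exp(-2j * cmath.pi * (hi * step) / N) for hi in range(N // step)]
    lsf = [cmath.exp(-2j * cmath.pi * lo / N) for lo in range(step)]
    return msf, lsf

def twiddle(k: int, N: int, L: int, msf: List[complex], lsf: List[complex]) -> complex:
    """Recover W**k as (most significant factor) * (least significant factor)."""
    k %= N
    hi, lo = k >> L, k & ((1 << L) - 1)  # k = hi * 2**L + lo
    return msf[hi] * lsf[lo]
```

With N = 1024 and L = 5, for instance, the two tables hold 32 entries each, 64 stored values in place of 1024. Splitting near the middle of the exponent's bits roughly minimizes the total table size.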
Those skilled in the art will recognize that this method for storing and using twiddle factors has applications beyond FFTs. This method is applicable to any computational process involving a plurality of complex numbers z of unit modulus, i.e., complex numbers z such that |z|=1, that can be optimally partitioned into least significant parts and most significant parts. Such applications occur in the fields of radar, communications and signal processing.
A device of the present invention includes a master processor for overall control of the device, two or more in-place memories, a read-only instruction store that includes a software module containing instructions for storing the input numbers and the intermediate values in accordance with the method of the present invention, and one or more dedicated FFT processors for performing the short DFTs of the first and third steps. Preferably, the device also includes two additional read-only memories for the most and least significant factors of the twiddle factors, and a complex multiplier for multiplying the most and least significant factors to produce the twiddle factors and for multiplying the intermediate values stored in the in-place memories by their respective twiddle factors in the second step.
It will be appreciated by those skilled in the art that the present invention may be used for inverse FFTs as well as for forward FFTs.