1. Field of the Invention
The present invention relates to a Fast Fourier Transform (“FFT”) processor for use with digital-to-analog conversion circuits and the like.
2. Description of the Related Art
In general, in the descriptions that follow, I will italicize the first occurrence of each special term of art that should be familiar to those skilled in the art of integrated circuits (“ICs”) and systems. In addition, when I first introduce a term that I believe to be new or that I will use in a context that I believe to be new, I will bold the term and provide the definition that I intend to apply to that term. In addition, throughout this description, I will sometimes use the terms assert and negate when referring to the rendering of a signal, signal flag, status bit, or similar apparatus into its logically true or logically false state, respectively, and the term toggle to indicate the logical inversion of a signal from one logical state to the other. Alternatively, I may refer to the mutually exclusive boolean states as logic_0 and logic_1. Of course, as is well known, consistent system operation can be obtained by reversing the logic sense of all such signals, such that signals described herein as logically true become logically false and vice versa. Furthermore, it is of no relevance in such systems which specific voltage levels are selected to represent each of the logic states.
Hereinafter, when I refer to a facility I mean a circuit or an associated set of circuits adapted to perform a particular function regardless of the physical layout of an embodiment thereof. Thus, the electronic elements comprising a given facility may be instantiated in the form of a hard macro adapted to be placed as a physically contiguous module, or in the form of a soft macro the elements of which may be distributed in any appropriate way that meets speed path requirements. In general, electronic systems comprise many different types of facilities, each adapted to perform specific functions in accordance with the intended capabilities of each system. Depending on the intended system application, the several facilities comprising the hardware platform may be integrated onto a single IC, or distributed across multiple ICs. Depending on cost and other known considerations, the electronic components, including the facility-instantiating IC(s), may be embodied in one or more single- or multi-chip packages. However, unless I expressly state to the contrary, I consider the form of instantiation of any facility that practices my invention as being purely a matter of design choice.
Shown in FIG. 1 is a typical general-purpose computer system 10. In recently developed battery-powered mobile systems, such as smartphones and the like, many of the discrete components typical of the desktop or laptop devices illustrated in FIG. 1 are integrated into a single integrated circuit chip.
Shown by way of example in FIG. 2 is one embodiment of a single-chip audio coder/decoder (“CODEC”) 12 comprising: a plurality of digital modules; and a plurality of analog modules. In this embodiment, CODEC 12 includes a Serial Data Interface facility adapted to send digital data to, and receive digital data from, the system 10; a Digital Phase-Locked Loop (“DPLL”) facility adapted to determine the timing and rate relationship between two asynchronous data streams; a Configuration Memory and Control facility adapted to control which facilities are used and how, in accordance with configuration and control information received from the system 10; a Digital Signal Processor (“DSP”) facility adapted to perform various data processing activities in accordance with a stored computer program; and a Data Memory facility adapted to store, as required, audio data flowing from the system 10 to the audio output devices. I may expand on the functionality of certain of these facilities as I now explain the method of operation of my invention and embodiments thereof.
A Fast Fourier Transform is an algorithm used in many DSP applications to transform time domain data to frequency domain data, and vice versa. For example, in a CODEC, an FFT may be used to implement adaptive frequency domain filtering, such as for echo cancellation or noise cancellation. An FFT is calculated by performing multiple iterations of butterfly operations, each of which combines two or more complex data samples to produce the same number of transformed complex data samples, using complex add, subtract, and multiply operations.
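To make the notion of iterated butterflies concrete, the following Python sketch computes an N-point FFT (N a power of two) as log2(N) passes of two-point butterflies. It is an illustrative software model only, not the processor described here: the names `dif_fft` and `bit_reverse` are hypothetical, each butterfly adds and subtracts before multiplying by the twiddle factor (the decimation-in-frequency ordering discussed below), no per-stage scaling is applied, and the outputs are reordered from bit-reversed order at the end.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (hypothetical helper)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT (illustrative sketch).

    Computes X[k] = sum_n x[n] * exp(-2j*pi*n*k/N) for len(x) = N, a power
    of two, using log2(N) passes of N/2 two-point butterflies.
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    span = n // 2
    while span >= 1:
        # One pass: each butterfly pairs two elements `span` apart.
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle factor
                top, bot = a[start + j], a[start + j + span]
                a[start + j] = top + bot               # add ...
                a[start + j + span] = (top - bot) * w  # ... then multiply (DIF)
        span //= 2
    # The DIF passes leave the outputs in bit-reversed order; undo that.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[bit_reverse(i, bits)] = a[i]
    return out
```

For example, `dif_fft([1, 0, 0, 0])` returns four values equal to 1, since the transform of a unit impulse is flat.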
Various apparatus and methods have been used to implement FFT butterfly operations. A radix-two FFT butterfly operation combines two complex data samples to produce two transformed complex data samples. Each complex data sample can be divided into a real part and an imaginary part, each of which can be represented by a single data word. The memory must therefore provide enough bandwidth to read four data words and write four data words for each butterfly operation. Each butterfly also requires four multiply or multiply/accumulate (“MAC”) operations and four other add or subtract operations. This combination of fours suggests that an efficient FFT implementation with a single MAC unit will complete a butterfly every four cycles, and will require at least one add/subtract unit (“ASU”) and memory allowing at least four read and four write cycles for each butterfly. In the prior art, this memory bandwidth has been achieved by various methods, such as a memory width of two words, a multi-port memory, a double-clocked memory, or two parallel data memories, for example one memory for the real data and one for the imaginary data.
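The operation count above can be checked with a small Python model of one radix-two DIF butterfly operating on separate real and imaginary data words. The data layout (a list of `[re, im]` word pairs), the names `OpCounts`, `dif_butterfly_real`, and `fft_mac_cycles`, and the assumption of a single MAC unit issuing one MAC operation per cycle are all hypothetical; the model omits any per-stage scaling and ignores pipeline overhead such as pointer and counter setup.

```python
from dataclasses import dataclass


@dataclass
class OpCounts:
    """Tallies of the operations one butterfly needs (hypothetical model)."""
    reads: int = 0
    writes: int = 0
    mac: int = 0
    asu: int = 0


def dif_butterfly_real(mem, i0, i1, w_re, w_im, counts):
    """Radix-2 DIF butterfly on separate real/imaginary data words.

    mem[i] is a two-word [re, im] list. Writes (Y0 + Y1) back to mem[i0] and
    (Y0 - Y1) * W back to mem[i1]; no 1/2 scaling is applied.
    """
    # Four data-word reads.
    y0_re, y0_im = mem[i0][0], mem[i0][1]
    y1_re, y1_im = mem[i1][0], mem[i1][1]
    counts.reads += 4

    # Four add/subtract (ASU) operations.
    s_re, s_im = y0_re + y1_re, y0_im + y1_im
    d_re, d_im = y0_re - y1_re, y0_im - y1_im
    counts.asu += 4

    # Four MAC operations: the complex multiply (d * W), with its own
    # add/subtract folded into the accumulate step of each pair.
    p_re = d_re * w_re          # MAC 1: multiply
    p_re -= d_im * w_im         # MAC 2: multiply-subtract
    p_im = d_re * w_im          # MAC 3: multiply
    p_im += d_im * w_re         # MAC 4: multiply-accumulate
    counts.mac += 4

    # Four data-word writes.
    mem[i0][0], mem[i0][1] = s_re, s_im
    mem[i1][0], mem[i1][1] = p_re, p_im
    counts.writes += 4


def fft_mac_cycles(n):
    """Cycles for an n-point radix-2 FFT at one butterfly per four MAC
    cycles, ignoring setup overhead (n must be a power of two)."""
    stages = n.bit_length() - 1
    return (n // 2) * stages * 4
```

Under those assumptions, a 1024-point radix-two FFT needs 512 * 10 = 5120 butterflies, or 20480 MAC cycles.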
Two known butterfly operations are the radix-two decimation in time (“DIT”) and radix-two decimation in frequency (“DIF”) butterflies. The radix-two DIT butterfly is:
$$Y_0 = X_0 + (W \cdot X_1) \tag{Eq. 1}$$
$$Y_1 = X_0 - (W \cdot X_1) \tag{Eq. 2}$$
And the radix-two DIF butterfly is:
$$X_0 = \tfrac{1}{2}\,(Y_0 + Y_1) \tag{Eq. 3}$$
$$X_1 = \tfrac{1}{2}\,W\,(Y_0 - Y_1) \tag{Eq. 4}$$
In each of the above equations, the W, X, and Y values are complex numbers. As is known, each complex addition or subtraction requires two real additions or subtractions, and each complex multiplication requires four real multiplications and two real additions or subtractions. In one embodiment, the additions and subtractions that are part of the complex multiplication can be merged with the multiplications in a MAC operation. As is known, DIT performs the complex multiplication before the complex additions and subtractions, whereas DIF performs the complex additions and subtractions first.
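As an illustration of the two orderings, the following Python sketch implements Eq. 1 and Eq. 2 (multiply first) and Eq. 3 and Eq. 4 (add and subtract first), with the complex multiplication written as two multiply/accumulate chains so that its real additions and subtractions are merged into MAC operations. The function names are hypothetical. The usage at the end relies on a simple identity rather than anything stated above: when |W| = 1, applying the DIF butterfly with the conjugate twiddle factor undoes a DIT butterfly, which is one way to see the role of the 1/2 factor.

```python
import cmath


def mac_complex_multiply(a, b):
    """Complex multiply as two multiply/accumulate chains: four real
    multiplications, with the two real add/subtracts merged into MACs."""
    re = a.real * b.real
    re -= a.imag * b.imag   # multiply-subtract
    im = a.real * b.imag
    im += a.imag * b.real   # multiply-accumulate
    return complex(re, im)


def dit_butterfly(x0, x1, w):
    """Eq. 1 and Eq. 2: complex multiply first, then add and subtract."""
    t = mac_complex_multiply(w, x1)
    return x0 + t, x0 - t


def dif_butterfly(y0, y1, w):
    """Eq. 3 and Eq. 4: add and subtract first, then complex multiply."""
    return 0.5 * (y0 + y1), 0.5 * mac_complex_multiply(w, y0 - y1)


if __name__ == "__main__":
    w = cmath.exp(-2j * cmath.pi / 8)   # a unit-magnitude twiddle factor
    y0, y1 = dit_butterfly(1 + 2j, 3 - 1j, w)
    # With |W| = 1, the DIF butterfly using conj(W) recovers the DIT inputs
    # (up to floating-point rounding).
    x0, x1 = dif_butterfly(y0, y1, w.conjugate())
    print(x0, x1)
```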
Some prior art has supported a pipelined butterfly implementation, completing one butterfly every four pipeline cycles, except for some overhead to initialize pointers and counters or to change twiddle factors. Other prior art has used separate logic or memory to supply the twiddle factors without requiring data memory accesses for them. While much of the prior art focuses on DIT butterfly implementations, the DIF butterfly lends itself better to an implementation in which the datapath can also efficiently implement other DSP algorithms that involve an addition before a multiplication, such as linear phase FIR filters.
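The preceding paragraph names linear phase FIR filters as an algorithm with an addition before a multiplication. The following Python sketch, which is illustrative and not taken from this disclosure, shows why: for a symmetric coefficient set h[k] = h[L-1-k], the two input samples that share a coefficient can be added first and then multiplied once, roughly halving the number of multiplications compared with the direct form. The names `fir_direct` and `fir_linear_phase` are hypothetical, and both assume zero initial filter state.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_linear_phase(x, h):
    """Symmetric (linear-phase) FIR: add the two samples that share a
    coefficient, then multiply once, roughly halving the multiplies."""
    taps = len(h)
    if any(h[k] != h[taps - 1 - k] for k in range(taps)):
        raise ValueError("coefficients must be symmetric")

    def sample(i):
        # Samples before the start of the input are taken as zero.
        return x[i] if 0 <= i < len(x) else 0.0

    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(taps // 2):
            pre_sum = sample(n - k) + sample(n - (taps - 1 - k))  # add first
            acc += h[k] * pre_sum                                  # then MAC
        if taps % 2:
            acc += h[taps // 2] * sample(n - taps // 2)  # unpaired centre tap
        y.append(acc)
    return y
```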
What is needed is a DIF butterfly that is more efficient and effective than the known art.