1. Field of the Invention
The present invention is directed to a software-implemented Fast Fourier Transform (“FFT”) and, more particularly, to single instruction multiple data (“SIMD”) techniques for performing FFT butterfly operations.
2. Related Art
The Fast Fourier Transform (FFT) is a well-known algorithm, commonly used to translate between two complementary representations of sets of discrete data. FFT is described by Proakis, J. G. & Manolakis, D. G. in Digital Signal Processing, New York, Maxwell Macmillan, 1992, Chapter 9, ISBN 0-02-946378, incorporated herein by reference. The FFT is commonly used in communications systems to convert between time and frequency domains, in both directions. For example, it is widely applied in the implementation of discrete multi-tone (DMT) modulation and de-modulation. An inverse FFT is used at a transmitter to convert the data values to be modulated (represented as complex amplitudes of distinct component frequencies) into a sequence of points in the time domain which will form the basis of an analog signal subsequently transmitted. At the receiver, the reverse process uses a forward FFT to recreate the frequency-domain version of the signal, which is then decoded to derive the communicated data values. The term FFT is used generically herein to refer to both forward and inverse versions of the FFT.
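By way of illustration only, the transmitter/receiver round trip described above can be sketched as follows. This is a minimal sketch, not an implementation from this disclosure: it substitutes a naive O(N^2) discrete Fourier transform for the FFT so that the example stays short, and the function name dft, the four-tone array, and its amplitude values are hypothetical. Practical details of a DMT system are omitted.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, used here as a stand-in for an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/N normalisation.
        out.append(acc / n if inverse else acc)
    return out

# Hypothetical complex amplitudes, one per component frequency (illustrative values).
tones = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]

time_samples = dft(tones, inverse=True)  # transmitter: frequency domain -> time domain
recovered = dft(time_samples)            # receiver: time domain -> frequency domain
```

Up to floating-point rounding, recovered equals tones, which is the sense in which the forward FFT at the receiver reverses the inverse FFT applied at the transmitter.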
Single instruction multiple data (SIMD) describes a style of digital processor design in which a single instruction is issued to control the processing of multiple data values in parallel (all being processed in the same manner), such as may be held in the form of an array or arrays. The FFT generally operates on arrays of complex (two-component or two-dimensional) data and is therefore potentially suited for implementation on SIMD processors. In its “radix-2” formulation, the FFT is based on a primitive operation known as a “butterfly” (which derives its name from its shape in a graphical representation). An FFT butterfly operation takes as inputs two complex data values, combines them arithmetically with a third constant complex value (referred to as a twiddle factor), and produces two outputs that are also complex values. In a radix-2 FFT, the computation for an array of N = 2^S complex elements proceeds through a series of S stages, each of which involves N/2 butterfly operations. At each stage, complex values in the input array to that stage are combined in pairs by FFT butterfly operations to produce new values in an output array for that stage. The selection of which values in the input array are paired up to produce each pair of output values varies stage by stage in a regular way.
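The butterfly and stage structure described above can be sketched in scalar form as follows. This is a minimal sketch under stated assumptions: it uses the common decimation-in-time arrangement with a bit-reversed input ordering, which is only one of several radix-2 formulations, it makes no use of SIMD, and the names butterfly and fft_radix2 are hypothetical.

```python
import cmath

def butterfly(a, b, w):
    """Radix-2 butterfly: two complex inputs, one twiddle factor w, two complex outputs."""
    t = w * b
    return a + t, a - t

def fft_radix2(x):
    """Iterative radix-2 FFT for len(x) == 2**S; returns (spectrum, butterfly count)."""
    n = len(x)
    stages = n.bit_length() - 1  # S, where N = 2**S
    if n == 0 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    # Reorder the input by bit-reversed index (an assumption of this arrangement).
    data = [complex(x[int(format(i, f"0{stages}b")[::-1], 2)]) for i in range(n)]
    count = 0
    for s in range(1, stages + 1):
        span = 1 << s     # size of each sub-transform at this stage
        half = span >> 1  # distance between paired elements; doubles every stage
        for start in range(0, n, span):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / span)  # twiddle factor
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
                count += 1
    return data, count

# Eight points: S = 3 stages of N/2 = 4 butterflies each.
spectrum, count = fft_radix2([1, 2, 3, 4, 0, 0, 0, 0])
```

In this arrangement the pairing distance (half) doubles from one stage to the next, which is one example of the regular stage-by-stage variation in pairing noted above, and count comes out to (N/2) x S.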
In older designs for transmission systems using DMT (such as a digital subscriber line (DSL) modem), which are in general more hardware oriented, the FFT function used in both transmitters and receivers is typically performed by fixed-function logic circuits. However, such system designs are harder to adapt to varying application requirements. For example, different versions of DSL use different numbers of frequencies, and consequently different numbers of points in the time domain, to be handled by the inverse and forward FFT functions. While it is possible to design hardware circuits to cope with this variability, such circuits are more complex and hence more expensive to implement. In order to increase flexibility in modem development and application, it has become more common to use software to perform the various functions in a DMT-based transmitter, receiver, or modem, especially in the case where one processor handles the operations for multiple independent channels (e.g., in a multi-line DSL modem in the central office).
Software-based FFT computations are computationally expensive. For example, in an existing 2-way long instruction word (LIW) SIMD processor used in an asymmetric digital subscriber line (ADSL) modem, inverse FFT routines apply regular DSP-style instructions, in particular the “Multiply-Accumulate” (MAC) instruction used in SIMD format, to implement FFT butterfly operations. For data represented at 16-bit precision for each component of the complex data format, the core butterfly step requires ten SIMD arithmetic instructions, represented in five instruction words (using the 2-way LIW characteristic of the processor) and taking five cycles to be issued, for every four butterfly operations performed. For a “256-tone” inverse FFT (i.e., S = 8, thus N = 2^8 = 256), the full complex inverse FFT function requires around 1,600 cycles. The equivalent code for the case of 512 tones (distinct frequencies) would require around 3,600 cycles on the same processor, because the computational cost of the FFT (the number of arithmetic operations performed, which is in general proportional to the number of butterfly operations performed) is generally proportional to N log_2 N.
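The arithmetic behind these figures can be checked with a short calculation. This is a rough sketch only: the function names are hypothetical, the core-step estimate uses just the five-cycles-per-four-butterflies figure given above, and the gap between that estimate and the quoted totals is not broken down in the text, so attributing it to overhead outside the core butterfly step is an assumption.

```python
def butterfly_count(n):
    """Butterflies in a radix-2 FFT of N = 2**S points: (N/2) * S."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two, at least 2")
    return (n // 2) * stages

def core_cycles(n, cycles_per_group=5, butterflies_per_group=4):
    """Issue cycles for the core butterfly step alone, at 5 cycles per 4 butterflies."""
    return butterfly_count(n) * cycles_per_group // butterflies_per_group

def scaled_cycles(known_n, known_cycles, n):
    """Scale a known total cycle count in proportion to N * log2(N)."""
    work = lambda m: m * (m.bit_length() - 1)  # N * S for N = 2**S
    return known_cycles * work(n) / work(known_n)

core_256 = core_cycles(256)                  # 1024 butterflies -> 1280 cycles
core_512 = core_cycles(512)                  # 2304 butterflies -> 2880 cycles
total_512 = scaled_cycles(256, 1600, 512)    # 1600 * 4608 / 2048 = 3600.0
```

Scaling the 1,600-cycle figure for 256 tones by the ratio of N log_2 N values (4608/2048 = 2.25) reproduces the 3,600-cycle figure quoted for 512 tones.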
An FFT computation, based on the radix-2 butterfly operation, can therefore represent a significant proportion of the total computational cost for a software-based DMT transmitter, receiver or modem, especially in the case where one processor handles the operations for multiple independent channels (e.g., in a multi-line DSL modem in a central office). With increasing pressure for greater integration and performance, and therefore more channels to be handled per processor and/or larger-sized FFT computations, it becomes necessary to improve the efficiency of FFT processing in such software-based DMT devices.
What are needed, therefore, are more efficient methods and systems for performing FFT computations.