A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory.
A DFT decomposes a sequence of values into components of different frequencies. It is defined by the formula:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k \frac{n}{N}}, \qquad k = 0, 1, \ldots, N-1 \qquad \text{(Equation 1)}$$
This operation is useful in many fields, but computing it directly from the definition requires on the order of N^2 operations, which is often too slow to be practical. An FFT computes the same result in on the order of N log(N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions: in such cases the computation time can be reduced by several orders of magnitude, with the improvement roughly proportional to N/log(N). This improvement made many DFT-based algorithms practical; FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.
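The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a power-of-two length and uses a textbook recursive radix-2 decimation-in-time scheme, which is only one of the many FFT algorithms mentioned above; the function names `dft_direct` and `fft_radix2` are chosen here for illustration.

```python
import cmath


def dft_direct(x):
    """Evaluate Equation 1 term by term: N output values, N terms each."""
    n_points = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    if n_points % 2:
        raise ValueError("length must be a power of two")
    # Transform the even-indexed and odd-indexed samples separately.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        # Twiddle factor e^{-i 2 pi k / N} applied to the odd half.
        twiddle = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out
```

Both functions return the same values up to rounding error. The direct version performs N multiply-accumulate steps for each of the N outputs, whereas the recursive version halves the problem at each of log2(N) levels.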
However, the calculation of multi-dimensional FFTs may still pose a number of difficulties. One such difficulty is efficient access to the data along its various dimensions. An efficient means of storing and accessing data representing multi-dimensional arrays of values, for example two-dimensional (2D) data from digital images or three-dimensional (3D) data from a series of digital images making up a video stream, is desirable so that FFTs can be calculated quickly along multiple dimensions and/or axes of such data.
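The access problem can be made concrete with a small sketch. It assumes a conventional row-major layout, in which a 2D image is stored row after row in one linear memory; this layout and the helper names `row_indices` and `column_indices` are assumptions made for illustration, not the storage scheme of any particular processor.

```python
def row_indices(row, width):
    """Linear addresses of one image row in row-major storage (stride 1)."""
    start = row * width
    return list(range(start, start + width))


def column_indices(col, width, height):
    """Linear addresses of one image column (stride equal to the row width)."""
    return list(range(col, width * height, width))


# For a 4-wide, 3-high image, a row occupies adjacent addresses,
# while a column touches one address in every row.
print(row_indices(1, 4))        # [4, 5, 6, 7]
print(column_indices(1, 4, 3))  # [1, 5, 9]
```

A transform along a row reads adjacent addresses, while a transform along a column reads addresses separated by a full row, which is the access pattern that tends to cost extra cycles in a standard memory.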
Further, in many cases, such as phase plane correlation (PPC) of video images, it may be desirable to perform compound functions on the data, possibly over multiple dimensions. For example, in PPC it is desirable to be able to rapidly perform complex multiplication of data from sequential images, followed by FFTs over all of the rows and columns of the images.
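As an illustration of such a compound function, the following sketch follows one common formulation of phase correlation: transform both images over rows and columns, multiply one spectrum by the complex conjugate of the other element by element, normalize, and transform back. The exact ordering of steps in a given PPC implementation may differ. The helper names and the direct (non-fast) transform are assumptions made to keep the example short.

```python
import cmath


def dft(seq, inverse=False):
    """Direct 1D DFT (or inverse DFT) of a short sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(image, inverse=False):
    """2D transform: every row first, then every column."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row order


def phase_correlate(prev, curr):
    """Return the (row, col) circular shift that maps prev onto curr."""
    f_prev, f_curr = dft2(prev), dft2(curr)
    cross = []
    for row_p, row_c in zip(f_prev, f_curr):
        out_row = []
        for a, b in zip(row_p, row_c):
            product = b * a.conjugate()  # complex multiplication step
            mag = abs(product)
            # Keep only the phase; skip bins with no energy.
            out_row.append(product / mag if mag > 1e-12 else 0j)
        cross.append(out_row)
    surface = dft2(cross, inverse=True)
    best, best_val = (0, 0), float("-inf")
    for r, row in enumerate(surface):
        for c, value in enumerate(row):
            if value.real > best_val:
                best, best_val = (r, c), value.real
    return best
```

For two images that differ by a circular shift, the correlation surface peaks at that shift. Every stage is either an element-wise complex multiplication or a set of independent row or column transforms, which is why the operation is a natural candidate for parallel hardware.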
Single instruction, multiple data (SIMD) processors are well suited to performing the same operation on multiple elements of data simultaneously. Typically, parallel processing portions of a single arithmetic logic unit (often viewed as individual parallel ALUs) operate on portions of the operands simultaneously.
SIMD architecture is generally well known and is described in John L. Hennessy, David A. Patterson, David Goldberg, Computer Architecture: A Quantitative Approach (Morgan Kaufmann, 2003), ISBN: 1558605967, the contents of which are hereby incorporated herein by reference.
Specialized SIMD processors are particularly well suited for operating on data representing video. Processing of video, in turn, requires numerous specialized calculations.
Known media processors and digital signal processors typically require multiple processor clock cycles to perform separate instructions such as complex multiplication and FFTs. Further, accessing multi-dimensional data stored in standard random access memory schemes may take additional clock cycles.
A memory storage method capable of more efficiently accessing multi-dimensional data across various axes, and a processor capable of efficiently performing complex multiplication and FFT functions within such multi-dimensional data sets, would therefore be desirable.