Signal processing in various engineering and scientific fields makes use of the fast Fourier transform (“FFT”). When a processor performs the FFT, it reads data from and writes data to memory. Data can be read from and written to various memories of a computer, for example, a cache memory, a random access memory (RAM), and/or a main memory. The cache of a computer is configured to store commonly used instructions and frequently accessed data. The cache is smaller and faster than the external memory of the computer; however, the cache typically stores only a fraction of the data that the external memory can hold. Cache memories can vary from computer to computer. Some processors provide a cache that is completely addressable, thus allowing a software programmer to access every element in the cache. The caches of general purpose processors, however, are not addressable. Instead, the contents of the cache are determined by the hardware architecture. For a general purpose processor, the contents of the cache are determined at run-time and cannot be predicted in advance; thus, performance is non-deterministic.
Signal processors, for example general purpose processors, are designed to internally decide which data is stored in the cache and which data is stored in external memory. When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to the main memory. On a modern desktop computer, the signal processor takes on average from about 0.5 nanoseconds to 25 nanoseconds to access the cache, whereas it takes from about 80 nanoseconds to 250 nanoseconds to access the main memory. The penalty for a cache miss is the access time for the cache (needed to confirm that the data is not in the cache) plus the access time for the main memory.
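The miss penalty described above can be illustrated with a short calculation. The following is a minimal sketch, not part of any described method: the function name `mean_access_ns` and the hit rates are assumptions chosen for illustration, and the 0.5 nanosecond and 80 nanosecond figures are simply the lower ends of the ranges quoted above.

```python
def mean_access_ns(hit_rate, cache_ns, main_ns):
    """Expected time per memory access, in nanoseconds.

    A hit costs only the cache access. A miss costs the cache access
    (to discover the data is absent) plus the main memory access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_cost = cache_ns
    miss_cost = cache_ns + main_ns
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Hypothetical hit rates, used only to show how quickly misses dominate.
for hit_rate in (0.99, 0.90, 0.50):
    print(hit_rate, mean_access_ns(hit_rate, cache_ns=0.5, main_ns=80.0))
```

Because the main memory access is so much slower than the cache access, even a modest drop in hit rate raises the expected access time substantially under this simple model.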
Counterintuitively, a smaller data signal can sometimes take a signal processor longer to process than a larger data signal under the same computational algorithm. One reason is that more cache misses occur when the signal processor processes the smaller data signal than when it processes the larger data signal. For the one-dimensional (1-D) FFT, it is commonly known that padding an array of data up to the nearest power of two gives the optimal average run-time. For the two-dimensional (2-D) FFT, however, padding the data signal to a power of two does not necessarily result in the optimal run-time, and can often lead to inefficient signal processing. Therefore, a need exists for a method that can accurately predict the optimal pad size of a two-dimensional array of data, which can be used to increase the processing speed of a signal processor by optimizing run-time for the 2-D FFT.
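For reference, the conventional power-of-two pad size mentioned above can be computed as follows. This is a minimal sketch; the helper name `next_power_of_two` is an assumption introduced here for illustration.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 left by that amount gives the first power of two >= n.
    return 1 << (n - 1).bit_length()


for length in (100, 128, 129):
    print(length, next_power_of_two(length))
```

A length that is already a power of two is returned unchanged, while a length just above a power of two is nearly doubled, which in two dimensions nearly quadruples the number of elements to be transformed.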
A disadvantage of using a power-of-two pad size is that the run-time for a 2-D FFT can be longer than what is achievable with an optimal pad size. A disadvantage of using empirically determined pad sizes is that one must perform the 2-D FFT for all data sizes to determine the most efficient pad sizes and, because run-times on modern desktop computer CPUs are non-deterministic, each FFT should be repeated numerous times to average out the timing measurements. Under this approach, one would need to run a large number of 2-D FFTs for all data sizes of interest, for every computational environment on which the analysis would be performed.
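The cost of the empirical approach can be seen in a minimal sketch of it. The sketch below assumes NumPy is available and uses its `numpy.fft.fft2` routine with the `s` argument to zero-pad the input; the function names, the 100 by 100 test array, the candidate range, and the repeat count are all assumptions made for illustration.

```python
import time
from statistics import mean

import numpy as np


def mean_fft2_time(data, pad_size, repeats=20):
    """Mean wall-clock seconds for a 2-D FFT of data zero-padded to a square.

    pad_size should be at least as large as each dimension of data;
    a smaller value would make fft2 crop the input instead of padding it.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        np.fft.fft2(data, s=(pad_size, pad_size))  # zero-pads both axes
        samples.append(time.perf_counter() - start)
    return mean(samples)


def fastest_pad_size(data, candidates, repeats=20):
    """Time every candidate pad size and return (best_size, timings)."""
    timings = {n: mean_fft2_time(data, n, repeats) for n in candidates}
    return min(timings, key=timings.get), timings


rng = np.random.default_rng(0)
image = rng.standard_normal((100, 100))
best, timings = fastest_pad_size(image, range(100, 129))
print(best, timings[best])
```

Even this small search performs 29 candidate sizes times 20 repeats for a single input size, and the result holds only for the machine on which it was measured, so the whole search would have to be repeated for each data size and each computational environment.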