The Discrete Fourier Transform (DFT) is one of the most widely used transforms, with applications including, but not limited to, digital communication systems, image processing, computer vision, biomedical imaging, and reconstruction of 3D (e.g., tomographic) densities from 2D data. Fourier image analysis simplifies computations by converting complex convolution operations in the spatial domain to simple multiplications in the frequency domain. Due to their computational complexity, DFTs often become a bottleneck for applications requiring high throughput and near real-time operation. The Cooley-Tukey Fast Fourier Transform (FFT) algorithm (NPL No. 1), first proposed in 1965, reduces the complexity of a 1D DFT of N points from O(N²) to O(N log N). However, for a 2D DFT of an N×N image, 1D FFTs have to be computed along both dimensions, increasing the complexity to O(N² log N), thereby making 2D DFTs a significant bottleneck for real-time machine vision applications (NPL No. 2).
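The equivalence between convolution in the spatial domain and multiplication in the frequency domain can be illustrated with a minimal 1D sketch. It uses a naive O(N²) DFT for clarity rather than an FFT, and the function names (dft, circular_convolve_direct, circular_convolve_dft) and the sample data are hypothetical, chosen only for this illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT of a sequence; computes the inverse DFT if requested."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    # The inverse transform carries the 1/N normalization.
    return [v / n for v in out] if inverse else out

def circular_convolve_direct(a, b):
    """Circular convolution computed directly in the spatial domain."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]

def circular_convolve_dft(a, b):
    """Circular convolution computed as a pointwise product of DFTs."""
    product = [u * v for u, v in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]

# Illustrative data (assumed, not taken from the source).
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.0, 0.25]
direct = circular_convolve_direct(signal, kernel)
via_dft = circular_convolve_dft(signal, kernel)
```

Both routes give the same result up to floating-point rounding. Note that the DFT product corresponds to circular (periodic) convolution, which is the same periodicity assumption that gives rise to edge artifacts for non-periodic images.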
There are several resource-efficient, high-throughput implementations of 2D DFTs. Many of these implementations are software-based and have been optimized for efficient performance on general-purpose processors (GPPs), for example Intel MKL (NPL No. 3), FFTW (NPL No. 4), and Spiral (NPL No. 5). Implementations on GPPs can be readily adapted for a variety of scenarios. However, GPPs have comparatively high power consumption and are not ideal for real-time embedded applications. Several Application Specific Integrated Circuit (ASIC)-based implementations have also been proposed (NPL No. 6), but since ASIC implementations are not easy to modify, they are not cost-effective solutions for rapid prototyping of image processing systems. Due to their inherent parallelism and reconfigurability, Field Programmable Gate Arrays (FPGAs) are an attractive target for accelerating FFT computations, since they can exploit the parallel nature of the FFT algorithm itself. Several high-throughput FPGA-based implementations have been reported over the past few years. Most of these implementations rely upon repeated invocations of 1D FFTs by row and column decomposition (RCD) with efficient use of external memory (NPL Nos. 2, 7, and 8). Many of them achieve real-time or near real-time performance (i.e., greater than or equal to 23 frames per second for a standard 512×512 image).
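Row and column decomposition can be sketched in software as follows. This is a minimal illustration, not any of the cited implementations: it assumes power-of-two image dimensions, uses a simple recursive radix-2 FFT, and the names fft, fft2_rcd, and dft2_naive are hypothetical. The transposes between the two passes stand in for the intermediate storage that a hardware design would place in external memory.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2_rcd(image):
    """2D FFT by row and column decomposition (RCD)."""
    rows = [fft(row) for row in image]             # pass 1: 1D FFT of every row
    cols = [fft(list(col)) for col in zip(*rows)]  # transpose, then 1D FFT of every column
    return [list(r) for r in zip(*cols)]           # transpose back to row-major order

def dft2_naive(image):
    """Direct 2D DFT from the definition, used here only as a reference."""
    n, m = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / n + v * x / m))
                for y in range(n)
                for x in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]

# Illustrative 4x4 image (assumed data).
image = [[float((3 * y + 5 * x) % 7) for x in range(4)] for y in range(4)]
spectrum = fft2_rcd(image)
reference = dft2_naive(image)
```

The RCD result agrees with the direct 2D DFT up to rounding, while requiring only 1D transforms: one per row, then one per column.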
When calculating 2D DFTs, it is assumed that the image is periodic, which is usually not the case. This non-periodic nature of the image leads to artifacts in the Fourier transform, which are known as edge artifacts or series termination errors. These artifacts appear as several crosses of high-amplitude coefficients in the frequency domain, as seen in NPL Nos. 9 and 10. Such edge artifacts can be passed to subsequent stages of processing and may lead to critical misinterpretations of results in biomedical applications. None of the current 2D FFT FPGA implementations address this problem directly. These artifacts are often removed during pre-processing, using mirroring, windowing, or zero padding, or during post-processing, e.g., using filtering techniques. These techniques are usually computationally intensive, involve an increase in the image size, and often also tend to modify the transform. The most common approach is to ramp the image at corner pixels to slowly attenuate the edges. Ramping is usually accomplished with an apodization function such as a Tukey (tapered cosine) or a Hamming window, which smoothly reduces the intensity toward zero. Such an approach can be implemented on an FPGA as a pre-processing operation by storing the window function in a Look-up Table (LUT) and multiplying it with the image stream before calculating the FFT (NPL No. 10). Although this approach is not extremely computationally intensive for small images, it inadvertently removes necessary information from the image, which may have serious consequences if the image is further processed with several other images to reconstruct a final image that is used for diagnostics or other decision-critical applications. Another common method is to mirror the image from N×N to 2N×2N. Doing so makes the image periodic, thereby removing edge artifacts. However, this not only increases the size of the image fourfold, but also makes the transform symmetric, which generates an inaccurate phase component.
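The two pre-processing approaches described above, windowing and mirroring, can be sketched as follows. This is an illustrative sketch under stated assumptions: the test image is a synthetic diagonal ramp, the helper names (hamming, apply_window, mirror, boundary_jump) are hypothetical, and boundary_jump measures only the wrap-around discontinuity between opposite image borders as a simple proxy for the source of edge artifacts. It does not compute the spectrum itself.

```python
import math

def hamming(n):
    """Hamming window coefficients; in hardware these could be stored in a LUT."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(image):
    """Multiply the image by a separable 2D Hamming window."""
    n, m = len(image), len(image[0])
    wy, wx = hamming(n), hamming(m)
    return [[image[y][x] * wy[y] * wx[x] for x in range(m)] for y in range(n)]

def mirror(image):
    """Mirror an N x N image into a 2N x 2N image with matching opposite borders."""
    top = [row + row[::-1] for row in image]  # reflect left to right
    return top + top[::-1]                    # reflect top to bottom

def boundary_jump(image):
    """Largest difference between opposite borders under periodic extension."""
    row_jump = max(abs(r[0] - r[-1]) for r in image)
    col_jump = max(abs(a - b) for a, b in zip(image[0], image[-1]))
    return max(row_jump, col_jump)

# Synthetic non-periodic test image (assumed data): a diagonal ramp.
N = 8
image = [[float(x + y) for x in range(N)] for y in range(N)]
windowed = apply_window(image)
mirrored = mirror(image)

jump_original = boundary_jump(image)
jump_windowed = boundary_jump(windowed)
jump_mirrored = boundary_jump(mirrored)
```

In this sketch, mirroring eliminates the border discontinuity at the cost of four times as many pixels, while windowing reduces the discontinuity but alters every pixel value, which reflects the information loss noted above.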
Most of the previous RCD-based 2D FFT FPGA implementations face two major design challenges: 1) the 1D FFT implementation needs to have reasonably high throughput and be resource efficient; and 2) external DRAM needs to have high bandwidth and be efficiently addressed, because images are usually large and intermediate storage is required between the row and column 1D FFT operations.
Removing the edge artifacts while simultaneously calculating 1D, 2D, or multidimensional FFTs imposes an additional design challenge, regardless of the method used. However, these artifacts must be removed in applications where they may be propagated to subsequent stages of processing.