Commodity graphics processing units (GPUs) traditionally have been used for real-time, three-dimensional rendering in visualization applications and video games. Recently, however, the substantial computational power and relatively low cost of commodity graphics processors have made them increasingly attractive for more general-purpose, data-parallel computations. The performance of GPUs comes from their large number of cores and high memory bandwidth, e.g., on the order of 128 scalar processors and 86 GB/s peak memory bandwidth.
As a result of these capabilities, GPUs are well-suited for a number of multimedia applications, including signal processing for audio, images, and video. One component of such applications is the Fast Fourier Transform (FFT), a family of efficient algorithms for computing a discrete Fourier transform (DFT) and its inverse. However, while a number of FFT implementations for the GPU already exist, such implementations are either limited to specific hardware or limited in functionality.
By way of example, one general FFT implementation for GPUs available today is the CUFFT library, which handles FFTs of varying sizes on both real and complex data. However, CUFFT is written in CUDA, a programming interface that is supported only on the most recent GPUs from NVIDIA Corporation, which are well known but not necessarily prevalent. To support multiple generations of GPUs from different vendors, some FFT libraries are written in the high-level shading languages found in standard graphics APIs such as OpenGL or DirectX®. However, these implementations share one significant limitation, namely that they are restricted to processing input sizes that are a power of two.
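By way of illustration only, the power-of-two restriction noted above can be seen in a minimal CPU-side sketch. The sketch assumes a textbook radix-2 Cooley-Tukey decomposition, which may or may not be the decomposition used by any particular shader-based library; the function names `naive_dft` and `radix2_fft` are hypothetical and are not taken from CUFFT or from any other library mentioned herein.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) evaluation of the DFT; valid for any length N."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT (illustrative assumption).

    Each level splits the input into two halves of equal length, so the
    recursion only terminates cleanly when the length is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("radix-2 FFT requires a power-of-two length")
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])  # transform of even-indexed samples
    odd = radix2_fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for j in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out
```

For an input of length eight, the two functions agree to within floating-point error, whereas an input of length three is accepted by `naive_dft` but rejected by `radix2_fft`, because the input cannot be halved evenly at every level.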