Graphics processing units (GPUs) provide high computation capabilities at lower prices than comparable central processing units (CPUs). For example, one particular GPU can perform one trillion floating-point operations in a single second (i.e., one teraflop). GPUs may be provided in a variety of devices (e.g., desktop computers) and/or systems (e.g., a high-performance computing center) to provide improved numerical performance.
A GPU has several distinguishing characteristics. For example, a GPU may include many vector processing elements (e.g., cores) operating in parallel, where each vector core addresses a separate on-device memory. Memory bandwidth between the on-device memories and the vector cores is high, but memory latency is relatively large (e.g., four hundred clock cycles). A GPU may provide zero-overhead thread scheduling (which enables algorithms with high thread counts); however, the GPU may include limited support for communication between threads. Relatively low memory bandwidth is provided between the GPU's device memory and host memory. A GPU also provides limited support for general-purpose programming constructs (e.g., code executing on the GPU cannot allocate memory itself; this must be accomplished by a host CPU).
These characteristics mean that programming for the GPU is not straightforward, and highly parallel algorithms need to be created for the GPU. A typical high-level program is hosted on a CPU that invokes computational kernels on the GPU in sequence to achieve a result. Because relatively low bandwidth is available for transferring data between host memory and the GPU's own memory, efficient programs transfer data only when necessary.
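The host-driven pattern described above can be sketched in CUDA as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the kernel names `scale` and `add_offset`, the host helper `run_pipeline`, and the choice of 256 threads per block are all hypothetical and chosen only for the example. The host allocates device memory, copies the input to the GPU once, launches two kernels in sequence on data that stays resident on the device, and copies the result back once.

```cuda
#include <cuda_runtime.h>

// Kernel 1: multiply every element by a constant factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Kernel 2: add a constant offset to every element.
__global__ void add_offset(float *data, float offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

// Hypothetical host routine: computes out[i] = in[i] * factor + offset.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t run_pipeline(const float *in, float *out, int n,
                         float factor, float offset) {
    if (n <= 0) return cudaErrorInvalidValue;

    float *d_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Device memory is allocated by the host, not by kernel code.
    cudaError_t err = cudaMalloc(&d_data, bytes);
    if (err != cudaSuccess) return err;

    // Single host-to-device transfer of the input.
    err = cudaMemcpy(d_data, in, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
        // Kernels launched on the same stream execute in order, so the
        // intermediate result never leaves device memory.
        scale<<<blocks, threads>>>(d_data, factor, n);
        add_offset<<<blocks, threads>>>(d_data, offset, n);
        err = cudaGetLastError();
    }

    // Single device-to-host transfer; this copy waits for both kernels.
    if (err == cudaSuccess)
        err = cudaMemcpy(out, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err;
}
```

A caller would pass ordinary host arrays, for example `run_pipeline(in, out, n, 2.0f, 1.0f)`. Copying the intermediate result back to the host between the two kernels would give the same answer but would add two transfers across the comparatively slow host-device link.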
Various technologies exist for programming GPUs. One example is the Compute Unified Device Architecture (CUDA), a parallel computing architecture developed by NVIDIA that includes pre-written libraries providing fast Fourier transform (FFT) and other functionality. CUDA provides a C-like language in which to write computational kernels for execution on NVIDIA GPUs. Other technologies for programming GPUs are being developed, such as the Open Computing Language (OpenCL) framework, Microsoft's DirectX, and NVIDIA's Parallel Nsight.
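As an illustration of the pre-written libraries mentioned above, the following sketch uses NVIDIA's cuFFT library to run a one-dimensional complex-to-complex forward FFT on the GPU. The helper name `forward_fft` is hypothetical, and error handling is reduced to a boolean result for brevity. The program must be linked against cuFFT (for example, `nvcc example.cu -lcufft`).

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: in-place forward FFT of n complex samples that
// live in host memory. Returns true on success.
bool forward_fft(cufftComplex *host_data, int n) {
    if (n <= 0) return false;

    cufftComplex *d_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(cufftComplex);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    // Move the samples to the device.
    bool ok = cudaMemcpy(d_data, host_data, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    // Create a plan for a single 1-D complex-to-complex transform.
    cufftHandle plan;
    bool have_plan = false;
    if (ok) {
        have_plan = cufftPlan1d(&plan, n, CUFFT_C2C, 1) == CUFFT_SUCCESS;
        ok = have_plan;
    }

    // Execute the transform in place on the device buffer.
    if (ok)
        ok = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS;

    // Bring the spectrum back to the host.
    if (ok)
        ok = cudaMemcpy(host_data, d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;

    if (have_plan) cufftDestroy(plan);
    cudaFree(d_data);
    return ok;
}
```

The library supplies the FFT kernels, so the host code only manages memory, the plan, and the two transfers. cuFFT transforms are unnormalized, so a constant input of ones of length n yields n in the first output bin.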