A graphics processing unit (GPU) is an integral part of a modern personal computer (PC). The GPU is a single-chip processor that is designed to accelerate the real-time three-dimensional (3D) graphics that are displayed to a user. Initially a feature of high-end graphics workstations, the GPU has found its way onto the personal computer bus as an accelerator of graphics functions for which a conventional central processing unit (CPU) was ill-suited or simply too slow.
Computer graphics began as line drawings on calligraphic displays, which were generally modified oscilloscopes. The computation for these displays required vector operations including general geometric transformations, clipping to the boundaries of the display devices, and perspective transformations for 3D displays. The advent of inexpensive commodity semiconductor memory prompted the replacement of line drawing systems by raster graphics processors, which refreshed television-like displays through a frame buffer memory. Because users generally prefer to see shaded solid surfaces instead of line drawings for most applications, raster graphics quickly displaced line drawings. Instead of straight line segments, as were used in line drawings, the geometric building blocks (or primitives) for the raster graphics systems were polyhedral surfaces constructed from an array of triangles. The display primitives were a rectangular array of pixels stored in the frame buffer memory. Rows of the array correspond to the discrete scan lines on the raster scan cathode ray tube (CRT) display.
As graphics progressed from line drawings to raster graphics, the need for greater processing power led to the inclusion of mathematical co-processors on PCs. A mathematical co-processor is an integrated floating-point co-processor that shares the host CPU's instruction stream and has access to CPU memory. Other types of integrated co-processors are CPU extensions such as Multimedia Extensions (MMX) or Streaming SIMD Extensions (SSE), which have parallel data paths and asynchronous execution, and which also have access to the CPU memory.
The demand for ever-higher quality and greater realism in 3D graphics created a need for greater graphics processing power. To meet this need, the GPU was introduced to perform the computationally intensive graphics tasks, which freed the CPU to perform other processing tasks. In its present-day incarnation, the GPU is a prominent component of the PC with its own dedicated path to main CPU memory as well as its own dedicated graphics memory. In contrast to mathematical co-processors, the GPU is an autonomous special-purpose processor with its own instruction streams, datapath, and dedicated memory.
Current trends in GPU design and configuration have given GPUs larger dedicated memory, higher bandwidth to graphics memory, and increased internal parallelism. In addition, current GPUs are designed with ever-increasing degrees of programmability. With the introduction of programmability, the GPU has gained enough flexibility to find use in non-graphics applications. Furthermore, the data-parallel architecture of GPUs delivers dramatic performance gains, compared to CPUs, for computationally intensive applications. Extensions to alternative graphics algorithms and scientific computing problems have been explored in a number of instances.
Applications directed to interactive use (such as speech recognition and handwriting recognition), however, have attracted relatively little interest. One reason for this is that implementing these algorithms for processing by the GPU is difficult and subject to several limitations. For general-purpose computing, GPUs are essentially stream processors with limitations. Dealing with and circumventing these limitations requires a style of programming and processing that is neither obvious nor intuitive.
These interactive use applications typically have non-trivial solutions and deal with large amounts of data. In these situations, machine learning techniques are the preferred approach. Machine learning techniques operate by automatically adjusting the parameters of an algorithm so that, after training, the input is correctly classified. For example, assume the task is to assign the correct ASCII label to a pixel image of an “A”. Unfortunately, training typically involves presenting hundreds of thousands of (input, target) pairs to algorithms that themselves perform hundreds of thousands of operations. As a result, training can take a great deal of time, even on the fastest available machines. Testing or using the algorithm in real-world conditions can also be prohibitively expensive.
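The parameter-adjustment loop described above can be illustrated with a minimal sketch. It assumes a simple perceptron update rule, which the text does not specify, and uses two hypothetical 3x3 pixel glyphs labeled "A" and "L". The names `TRAINING_PAIRS`, `predict`, and `train` are invented for illustration only. The sketch also counts multiply-accumulate operations to show how the training cost grows with the number of pairs, the number of passes, and the operations per pair.

```python
# Hypothetical 3x3 binary "pixel images", flattened row by row.
# The glyphs and labels are illustrative assumptions, not data from the text.
TRAINING_PAIRS = [
    ([0, 1, 0,
      1, 1, 1,
      1, 0, 1], "A"),
    ([1, 0, 0,
      1, 0, 0,
      1, 1, 1], "L"),
]


def predict(weights, bias, pixels):
    """Return "A" if the weighted pixel sum is positive, otherwise "L"."""
    activation = bias + sum(w * p for w, p in zip(weights, pixels))
    return "A" if activation > 0 else "L"


def train(pairs, epochs=20, learning_rate=1.0):
    """Adjust the parameters whenever an (input, target) pair is misclassified.

    Uses the perceptron rule (an assumption; the text names no algorithm).
    Returns the weights, the bias, and a count of multiply-accumulates.
    """
    weights = [0.0] * len(pairs[0][0])
    bias = 0.0
    operations = 0
    for _ in range(epochs):
        for pixels, target in pairs:
            operations += len(pixels)  # one multiply-accumulate per pixel
            if predict(weights, bias, pixels) != target:
                # Move the parameters toward the correct label.
                sign = 1.0 if target == "A" else -1.0
                weights = [w + learning_rate * sign * p
                           for w, p in zip(weights, pixels)]
                bias += learning_rate * sign
    return weights, bias, operations


if __name__ == "__main__":
    weights, bias, operations = train(TRAINING_PAIRS)
    for pixels, target in TRAINING_PAIRS:
        print(target, "->", predict(weights, bias, pixels))
    print("multiply-accumulates during training:", operations)
```

The operation count is the product of passes, pairs, and operations per pair. With hundreds of thousands of pairs and hundreds of thousands of operations per pair, as described above, a single pass over the training data already involves on the order of tens of billions of operations.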