Graphics processing units (GPUs) contain hundreds of arithmetic units and can provide tremendous acceleration for many numerically intensive scientific applications. The key to using GPUs effectively for video processing is the design and implementation of efficient data-parallel algorithms that scale to hundreds of tightly coupled processing units. Many image and video processing applications are well suited to GPUs, both because of their extensive computational requirements and because they lend themselves to parallel implementations. In some implementations, GPUs can deliver large speed increases over a central processing unit (CPU) without any compromise in final image quality.
The explosive growth of digital video content from commodity devices and on the Internet has precipitated a renewed interest in video processing technology, which broadly encompasses the compression, enhancement, analysis, and synthesis of digital video. Video processing is computationally intensive and often has accompanying real-time or super-real-time requirements. For example, surveillance and monitoring systems need to robustly analyze video from multiple cameras in real time in order to automatically detect and signal unusual events. Moreover, continued gains in the functionality and speed of video processing systems are likely to enable further novel applications.
Due to the strong computational locality exhibited by video algorithms, video processing is highly amenable to parallel processing. For instance, in time, what appears in the 10th frame of a video sequence does not strongly affect the contents of the 1000th frame; and in space, an object on the left side of a single frame does not strongly influence the pixel values on the right. Such locality makes it possible to divide video processing tasks into smaller, weakly interacting pieces that can be processed in parallel. Furthermore, these pieces can share data to economize on memory bandwidth.
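The spatial-locality argument can be illustrated with a minimal sketch, written in pure Python with only the standard library. All function names are hypothetical and are not part of the described system. A 3×3 box blur serves as an example of a local operation: each output pixel depends only on its immediate neighbors. The frame can therefore be split into tiles that are processed independently, each tile reading a one-pixel border (halo) from the frame around it, and the stitched result matches whole-frame processing. A thread pool stands in for the many processing units of a GPU; it illustrates the decomposition, not GPU-level speed.

```python
from concurrent.futures import ThreadPoolExecutor


def box_blur_pixel(frame, y, x):
    """Mean of the 3x3 neighborhood around (y, x), averaging only the
    neighbors that lie inside the frame."""
    h, w = len(frame), len(frame[0])
    total, count = 0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += frame[yy][xx]
                count += 1
    return total / count


def blur_tile(frame, y0, y1, x0, x1):
    """Blur rows y0..y1-1 and columns x0..x1-1.

    The tile reads only its own pixels plus a one-pixel halo, so it never
    needs the output of any other tile.
    """
    return [[box_blur_pixel(frame, y, x) for x in range(x0, x1)]
            for y in range(y0, y1)]


def blur_whole(frame):
    """Reference result: process the entire frame as one piece."""
    return blur_tile(frame, 0, len(frame), 0, len(frame[0]))


def blur_tiled(frame, tile=4, workers=4):
    """Split the frame into tile x tile blocks, blur them concurrently,
    and stitch the blocks back into a full output frame."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for y0 in range(0, h, tile):
            for x0 in range(0, w, tile):
                y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
                jobs.append(((y0, x0), pool.submit(blur_tile, frame, y0, y1, x0, x1)))
        # Copy each finished tile into its position in the output frame.
        for (y0, x0), fut in jobs:
            for dy, row in enumerate(fut.result()):
                out[y0 + dy][x0:x0 + len(row)] = row
    return out
```

For any rectangular frame and any tile size, `blur_tiled(frame)` returns the same values as `blur_whole(frame)`, because each pixel is computed from the same neighborhood regardless of which tile it falls in.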
During video encoding, a GPU receives video data from a CPU. The video data is copied from system memory to graphics memory on a graphics card over Peripheral Component Interconnect Express (PCIe); this copy takes time, and its throughput is limited by the speed of the PCIe link. FIG. 1 is a block diagram illustrating the framework of an existing graphics card system 100. In the graphics card system 100, a video camera 102 captures external video data, which is transmitted through a CPU 101 and stored in a system buffer 103. When the video data is needed for processing, a graphics processing unit (GPU) 105 reads it from the system buffer 103 through an interface 104 and stores it, for further processing, in a frame buffer 106 that is customized for graphics storage. This technique is inadequate for applications that require real-time, high-speed processing of video data.
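To give a sense of why the PCIe copy matters, the following back-of-envelope sketch estimates what fraction of each frame interval is consumed by the host-to-GPU copy alone. The function names and the 8 GB/s effective bandwidth used in the example are illustrative assumptions, not figures from this disclosure; actual effective PCIe throughput depends on link generation, lane count, and driver overhead.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def copy_time_ms(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Milliseconds needed to move num_bytes over a link with the given
    effective bandwidth (an assumed, not measured, value)."""
    return num_bytes / bandwidth_bytes_per_s * 1000.0


def pcie_share_of_frame_budget(width, height, bytes_per_pixel, fps,
                               bandwidth_bytes_per_s, num_streams=1):
    """Fraction of each frame interval (1/fps) spent copying frames to the GPU.

    Assumes all streams share one link and their copies do not overlap;
    transfers from the GPU back to system memory are ignored.
    """
    budget_ms = 1000.0 / fps
    per_frame_ms = copy_time_ms(frame_bytes(width, height, bytes_per_pixel),
                                bandwidth_bytes_per_s)
    return num_streams * per_frame_ms / budget_ms


if __name__ == "__main__":
    # Hypothetical example: 1080p, 8-bit RGB, 60 fps, 8 GB/s effective bandwidth.
    share = pcie_share_of_frame_budget(1920, 1080, 3, 60, 8e9)
    print(f"{share:.1%} of each frame interval spent on the copy")
```

Under these assumed numbers, a single 1080p RGB stream at 60 fps spends about 0.78 ms of each 16.7 ms frame interval (roughly 4.7%) on the copy, and eight such camera streams sharing the link would consume roughly 37% of the interval before any encoding work begins.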