Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (GPUs) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data, where the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives and produce real-time rendered 3-D images.
Motion-based video compression implemented in GPU hardware is now widespread. One video compression standard, H.264, supports searching for motion across a large number of reference frames and encoding motion vectors for macroblocks (e.g., blocks of 16×16 pixels).
Although the H.264 motion search calculation is not challenging in mathematical terms, the sheer number of required calculations is. An object's displacement (i.e., its motion) is found by computing the sum of absolute differences (SAD) between 16×16 blocks of the source frame and candidate 16×16 blocks of the reference frames. Because objects in the frame can move in any direction and by any amount, this search is a very computationally intensive operation.
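By way of illustration only, the SAD-based search described above can be sketched as an exhaustive block-matching loop. The sketch below is not taken from any particular encoder; the function names `sad` and `full_search`, the `search_range` parameter, and the representation of a frame as a list of rows of integer luma values are all assumptions made for the example.

```python
def sad(src, ref, sx, sy, rx, ry, block=16):
    """Sum of absolute differences between the block of `src` whose
    top-left corner is (sx, sy) and the block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(block):
        src_row = src[sy + dy]
        ref_row = ref[ry + dy]
        for dx in range(block):
            total += abs(src_row[sx + dx] - ref_row[rx + dx])
    return total


def full_search(src, ref, sx, sy, search_range=8, block=16):
    """Exhaustively test every displacement within +/- search_range pixels.

    Returns (best_sad, (mvx, mvy)), where (mvx, mvy) is the displacement
    into `ref` with the lowest SAD, or None if no candidate block fits
    inside the reference frame. Ties keep the first candidate found.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = sx + mvx, sy + mvy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + block > width or ry + block > height:
                continue
            cost = sad(src, ref, sx, sy, rx, ry, block)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best
```

Under these assumptions, a search range of ±8 pixels yields up to 17 × 17 = 289 candidate positions per macroblock, each requiring 256 absolute differences, or roughly 74,000 operations per macroblock per reference frame. This count is what makes the search expensive even though each individual operation is trivial.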
Thus, a need exists for improving the performance of searching for a winning motion vector during an encoding process.