As computer systems have advanced, graphics processing units (GPUs) have grown in both complexity and computing power. GPUs are thus used to process increasingly large and complex graphics workloads. In particular, GPUs are well suited for parallel processing. GPUs are also increasingly used for general purpose computing tasks, and in particular for computer vision tasks. Computer vision involves processing images to make determinations about the contents of one or more images.
Template matching is a ubiquitous operation in the field of computer vision. Template matching attempts to compute how much a region of one image matches another region of the same or a different image. The method for matching may be based on subtraction of two tiles, or on more sophisticated correlation methods. Convolution is another ubiquitous operation where a kernel of coefficients is multiplied by a tile of pixels and the results of the multiplication are summed. Convolutions are frequently used to modify the appearance of an image. For example, convolution may be used for sharpening or blurring an image.
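As an illustration of the two operations described above, the following minimal sketch shows subtraction-based matching of two tiles (a sum of absolute differences) and the multiply-and-sum step of a convolution. The function names, the list-of-rows tile representation, and the sample data are assumptions chosen for illustration only; they do not describe any particular hardware or library.

```python
# Illustrative sketch only. Tiles and images are plain lists of rows of numbers.
# All names here (sad, extract_tile, best_match, apply_kernel) are hypothetical.

def sad(tile_a, tile_b):
    """Sum of absolute differences between two equally sized tiles (0 means identical)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(tile_a, tile_b)
               for a, b in zip(row_a, row_b))

def extract_tile(image, top, left, height, width):
    """Copy a height x width tile whose upper-left corner is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the lowest SAD."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = sad(extract_tile(image, top, left, th, tw), template)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best

def apply_kernel(tile, kernel):
    """Multiply each pixel by the corresponding coefficient and sum the products.

    Kernel flipping is omitted, so this matches a true convolution only for
    symmetric kernels such as the box blur below.
    """
    return sum(p * k
               for row_t, row_k in zip(tile, kernel)
               for p, k in zip(row_t, row_k))

image = [
    [1, 2, 3, 4],
    [5, 9, 8, 6],
    [7, 6, 5, 4],
    [3, 2, 1, 0],
]
template = [[9, 8], [6, 5]]
box_blur = [[0.25, 0.25], [0.25, 0.25]]  # averaging kernel, a simple blur

match = best_match(image, template)
blurred = apply_kernel(extract_tile(image, 1, 1, 2, 2), box_blur)
```

Even in this small case the search evaluates every candidate position and touches every pixel of every tile, so the amount of arithmetic grows quickly with image and tile size.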
Computation of similarity, correlation, or convolution on pixel tiles is very computationally intensive. For rapid or real time matching, custom hardware may be required. However, while custom hardware is efficient, it is generally inflexible. Different applications may require similarity or correlation metrics to be computed in different ways. Matching between images, within images, and with fixed patterns demands flexible hardware. Further, many advanced algorithms dynamically determine which regions to match, and such flexibility is not easily accommodated by custom hardware. For example, the value of a similarity metric may be only one factor in computing the cost of a match between image regions: smoothness between adjacent regions may be weighted along with normalized cross correlation to compute the total cost, which is difficult to perform dynamically with fixed hardware.
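The cost example above can be sketched as follows. This is one possible formulation under stated assumptions, not a definitive method: the text does not specify how the terms are combined, so the additive form, the `smooth_weight` parameter, and the use of an offset difference as the smoothness term are all hypothetical choices made for illustration.

```python
import math

# Illustrative sketch only. The way the two terms are combined is an assumption.

def ncc(tile_a, tile_b):
    """Normalized cross correlation of two equally sized tiles, in [-1, 1]."""
    a = [p for row in tile_a for p in row]
    b = [p for row in tile_b for p in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat tile has no defined correlation; treat as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom

def match_cost(tile_a, tile_b, offset, neighbor_offset, smooth_weight=0.5):
    """Hypothetical total cost: dissimilarity plus a weighted smoothness penalty.

    offset and neighbor_offset stand in for the match displacement chosen for
    this region and for an adjacent region; a large difference is penalized.
    """
    dissimilarity = 1.0 - ncc(tile_a, tile_b)
    smoothness_penalty = abs(offset - neighbor_offset)
    return dissimilarity + smooth_weight * smoothness_penalty

tile = [[1, 2], [3, 4]]
brighter = [[2, 4], [6, 8]]   # same pattern at twice the intensity
inverted = [[4, 3], [2, 1]]   # reversed pattern

cost = match_cost(tile, brighter, offset=3, neighbor_offset=1)
```

Because the weight and the choice of regions to compare can change from one evaluation to the next, a computation of this kind is the sort of data-dependent work that fixed-function hardware handles poorly.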
A software solution can provide the necessary flexibility, and certain instructions can be used to accelerate portions of an algorithm. Unfortunately, software incurs substantial overhead because every detail of the algorithm must be described using instructions for a general purpose machine that is not optimized for the specific operation. In addition, both the algorithm and the data handling must conform to the available machine resources (execution units, memory architecture, etc.).