Real-time object detection is a challenging task in computer vision. Several major algorithmic paradigms exist for performing object detection in two-dimensional (2D) images, including local 2D-descriptor based object detection processes, global 2D-descriptor based (bag-of-words) object detection processes, and template based object detection processes.
Local 2D-descriptor based approaches typically apply interest point detectors to detect salient points in an image, each of which is then characterized by a descriptor. Each descriptor is matched against a database of descriptors found on the object of interest. An object hypothesis is formed if a sufficient number of such matches is found in the image. As a prerequisite, however, these methods typically require image corners or textured areas.
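The matching and hypothesis steps described above can be illustrated with a minimal sketch. It assumes that descriptors are plain numeric tuples compared by Euclidean distance, and that a fixed distance threshold and a fixed match count decide the outcome. The names `count_matches`, `object_hypothesis`, `max_distance`, and `min_matches` are hypothetical and chosen only for illustration; interest point detection and descriptor extraction are not shown.

```python
from math import dist


def count_matches(image_descriptors, object_descriptors, max_distance):
    """Count image descriptors whose nearest database descriptor is within max_distance."""
    matches = 0
    for descriptor in image_descriptors:
        # Brute-force nearest neighbour search over the object database.
        nearest = min(dist(descriptor, candidate) for candidate in object_descriptors)
        if nearest <= max_distance:
            matches += 1
    return matches


def object_hypothesis(image_descriptors, object_descriptors,
                      max_distance=0.5, min_matches=3):
    """Form an object hypothesis only if enough descriptors match (assumed thresholds)."""
    if not object_descriptors:
        return False  # an empty database cannot support any hypothesis
    found = count_matches(image_descriptors, object_descriptors, max_distance)
    return found >= min_matches
```

An image that yields few interest points, such as one of a textureless object, supplies few descriptors to this procedure and therefore cannot reach the required number of matches, which reflects the prerequisite noted above.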
For objects that lack a sufficient number of image corners and/or textured areas to successfully perform a local 2D-descriptor based process, a global 2D-descriptor can be utilized. A global 2D-descriptor can be formed by studying patch statistics. However, a global 2D-descriptor typically does not exploit the spatial relations of points of interest. Therefore, global 2D-descriptors tend to produce a large number of false matches.
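The loss of spatial information can be illustrated with a deliberately simplified sketch. It assumes that the global descriptor is a histogram counting how often each non-overlapping patch content occurs in the image, with patch positions discarded. A practical bag-of-words process would typically quantize patch features against a learned vocabulary, which is not shown here. The function name `patch_histogram` and its `patch` parameter are hypothetical.

```python
from collections import Counter


def patch_histogram(image, patch=2):
    """Global descriptor: occurrence counts of patch contents, ignoring where they occur."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            # The raw patch content serves as the "word" (a simplifying assumption).
            word = tuple(image[r + i][c + j]
                         for i in range(patch) for j in range(patch))
            counts[word] += 1
    return counts


# Two images with the same patches arranged differently.
bright_top_left = [[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]]
bright_bottom_right = [[0, 0, 0, 0],
                       [0, 0, 0, 0],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1]]

# The descriptors are identical although the spatial layouts differ.
same = patch_histogram(bright_top_left) == patch_histogram(bright_bottom_right)
```

Because the two layouts yield the same histogram, a matcher relying on this descriptor alone cannot tell them apart, which is one way false matches can arise.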
A more discriminative way of exploiting object appearance is to take spatial relations into account. This can be achieved by forming a template image of the object of interest. The template captures the relative spatial relations of information points on the object. An object hypothesis is formed in the image via template matching, which can involve sliding the template over each pixel location (possibly after subsampling) and computing the similarity between the underlying image patch and the template using a similarity metric.
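The sliding-window procedure can be sketched as follows. The sketch assumes grayscale images stored as lists of rows and uses the sum of squared differences (SSD) as the similarity metric, where a lower value indicates a better match. The text above does not prescribe a particular metric, so SSD is only one possible choice. The function name `match_template` and the `step` parameter, which models subsampling of candidate positions, are hypothetical.

```python
def match_template(image, template, step=1):
    """Slide the template over the image and return (ssd, row, col) of the best position.

    Returns None if the template does not fit inside the image.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = None
    # A step greater than 1 skips candidate positions (subsampling).
    for r in range(0, image_h - tmpl_h + 1, step):
        for c in range(0, image_w - tmpl_w + 1, step):
            # Sum of squared differences between the image patch and the template.
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(tmpl_h) for j in range(tmpl_w))
            if best is None or ssd < best[0]:
                best = (ssd, r, c)
    return best
```

The cost of this exhaustive search grows with the product of the number of candidate positions and the template size. Increasing `step` reduces that cost but can skip the exact location of the object.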