One method for detecting a target object from an object recognition image is template matching. In template matching, a model (template) of an object serving as a detection target is prepared in advance, and the two-dimensional position and orientation of the object included in an input image are detected by evaluating the degree of matching of image features between the input image and the model. Object detection using template matching is used in various fields such as inspection and picking in FA (Factory Automation), robot vision, and surveillance cameras. In recent template matching in particular, attention has shifted from techniques that detect the position and orientation of the target object using two-dimensional measurement to techniques that detect the position and orientation of the target object using three-dimensional measurement.
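As a minimal sketch of the matching step described above, the following example slides a template over a small grayscale image and reports the placement with the best score. The score used here (sum of absolute differences over raw pixel values), the function names, and the sample data are all assumptions chosen for illustration; the text does not specify a particular image feature or matching measure, and orientation is not searched in this sketch.

```python
def match_score(image, template, top, left):
    """Sum of absolute differences at one placement (lower means a better match)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def find_best_match(image, template):
    """Try every placement of the template; return (row, col, score) of the best one."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = match_score(image, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


# Hypothetical 4x5 input image containing the 2x2 template at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]

print(find_best_match(image, template))  # (1, 2, 0)
```

Because every placement is evaluated for every template, the cost of this exhaustive form grows with both the image size and the number of templates.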
As template matching using three-dimensional measurement, a search processing method has been proposed in which an individual template is prepared for each orientation of the target object as viewed from various viewpoints, and matching is performed against all of the templates in turn. However, because the number of templates that must be prepared is very large compared with template matching using two-dimensional measurement, there is a problem in that the processing time for matching increases.
As a countermeasure to this problem, template matching by a coarse-to-fine search is known. The coarse-to-fine search is one technique for speeding up search processing using template matching. A group of images in which the resolution differs in stages (a so-called image pyramid) is prepared, a coarse search is performed using a low-resolution image, the search range is narrowed based on the search result, and a further search is performed on a higher-resolution image within the narrowed search range. This processing is repeated until, finally, the position and orientation of the target object are detected at the original resolution (recognition of the position and orientation of the object; hereinafter simply referred to as “object recognition”).
Here, FIG. 10 is a diagram showing a basic concept of a coarse-to-fine search using an image pyramid.
As shown in FIG. 10, the coarse-to-fine search uses a group of k images (image pyramid) constituted by a first layer to a k-th layer (k is an integer greater than or equal to two) in which the resolution differs in stages. The resolution of the first layer is the lowest, and the resolution increases in order from the second layer to the k-th layer. FIG. 10 shows an example in which k is three: the third layer corresponds to the original image, and the resolution decreases in order from the second layer to the first layer.
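The layer structure can be sketched as follows for k = 3. The text does not state how the lower-resolution layers are produced; halving the resolution per layer by averaging 2x2 pixel blocks is an assumption made here, as are the function names and the sample image.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block (an odd trailing row/column is dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(original, k):
    """Return [layer 1, ..., layer k]: layer k is the original, layer 1 the coarsest."""
    layers = [original]
    for _ in range(k - 1):
        layers.append(downsample(layers[-1]))
    layers.reverse()
    return layers


# Hypothetical 8x8 original image with pixel values 0..63.
original = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(original, 3)
sizes = [(len(layer), len(layer[0])) for layer in pyramid]
print(sizes)  # [(2, 2), (4, 4), (8, 8)]
```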
In the coarse-to-fine search, search processing using template matching (comparison) is first performed on the first layer image, whose resolution is the lowest, and an existence position (correct candidate) of the object in the first layer is detected (refer to the detection position shown in the first layer image in FIG. 10). Next, in search processing for the second layer, the region of the second layer image corresponding to the detection position in the first layer is set as a search range, and search processing is performed for that search range (refer to the detection position shown in the second layer image in FIG. 10). In the same way, a search range in the third layer image is set based on the detection result in the second layer, search processing is performed for that search range, and, finally, the object position in the third layer (original image) is specified (refer to the detection position shown in the third layer image in FIG. 10).
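The layer-by-layer procedure above can be sketched as follows. This is an illustrative sketch under several assumptions not stated in the text: each layer halves the resolution by 2x2 averaging, the template is downsampled alongside the image, the score is a sum of absolute differences, only one candidate is carried between layers, and the search range in the next layer is the corresponding position plus a small margin (the `margin` parameter is hypothetical). Only position is searched, not orientation.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    return [
        [(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
          + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
         for c in range(len(img[0]) // 2)]
        for r in range(len(img) // 2)
    ]


def sad(image, template, top, left):
    """Sum of absolute differences at one placement (lower is better)."""
    return sum(abs(image[top + r][left + c] - template[r][c])
               for r in range(len(template)) for c in range(len(template[0])))


def search(image, template, rows, cols):
    """Best (row, col) among the candidate placements that fit inside the image."""
    max_top = len(image) - len(template)
    max_left = len(image[0]) - len(template[0])
    best, best_score = None, None
    for top in rows:
        for left in cols:
            if 0 <= top <= max_top and 0 <= left <= max_left:
                score = sad(image, template, top, left)
                if best_score is None or score < best_score:
                    best, best_score = (top, left), score
    return best


def coarse_to_fine(image, template, k, margin=1):
    """Search layer 1 exhaustively, then refine in a narrowed range on each finer layer."""
    images, templates = [image], [template]
    for _ in range(k - 1):
        images.append(downsample(images[-1]))
        templates.append(downsample(templates[-1]))
    images.reverse()      # index 0 is now layer 1 (lowest resolution)
    templates.reverse()
    pos = search(images[0], templates[0],
                 range(len(images[0])), range(len(images[0][0])))
    for img, tpl in zip(images[1:], templates[1:]):
        r, c = 2 * pos[0], 2 * pos[1]  # corresponding position in the finer layer
        pos = search(img, tpl,
                     range(r - margin, r + margin + 2),
                     range(c - margin, c + margin + 2))
    return pos


# Hypothetical 16x16 image containing a 4x4 template at row 8, column 4.
template = [[10 * (r + 1) + c for c in range(4)] for r in range(4)]
image = [[0] * 16 for _ in range(16)]
for r in range(4):
    for c in range(4):
        image[8 + r][4 + c] = template[r][c]

print(coarse_to_fine(image, template, k=3))  # (8, 4)
```

In this example the object is placed at a position aligned with the coarsest grid, so each layer matches exactly. With a single candidate per layer, an error at a coarse layer cannot be recovered later, which is why practical implementations often keep several candidates per layer.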
Although normal template matching requires matching against many templates, the coarse-to-fine search can reduce the number of template matching operations by gradually narrowing the search range from the low-resolution image toward the high-resolution image, and the processing time can thereby be shortened.
Recently, in order to speed up template matching using the coarse-to-fine search, a technique has been proposed in which, when creating templates, the two-dimensionally projected images viewed from various camera positions (viewpoints) are compared in each layer, viewpoints that look similar are grouped based on the similarity of these images, and the number of templates used for matching is thereby thinned out (refer to European Patent No. 2048599, for example).
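The reduction can be illustrated with a rough count of matching operations. The image size, number of templates, number of layers, and window size below are hypothetical values chosen only for illustration, and the model assumes that resolution is halved per layer, that a single candidate survives each layer, and that border effects due to the template size are ignored.

```python
def exhaustive_count(height, width, n_templates):
    """Every template evaluated at every pixel position of the original image."""
    return height * width * n_templates


def coarse_to_fine_count(height, width, n_templates, k, window=4):
    """Full scan on layer 1, then a window x window search range on each finer layer."""
    scale = 2 ** (k - 1)  # layer 1 is reduced by this factor along each axis
    coarse = (height // scale) * (width // scale) * n_templates
    refine = (k - 1) * window * window * n_templates
    return coarse + refine


print(exhaustive_count(480, 640, 100))           # 30720000
print(coarse_to_fine_count(480, 640, 100, k=3))  # 1923200
```

Under these assumptions the full scan of the lowest-resolution layer dominates the cost, so most of the saving comes from the reduced resolution of the first layer.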
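The general idea of thinning out templates by grouping similar viewpoints can be sketched as follows. This is not the procedure of European Patent No. 2048599; it is a simple greedy grouping written only to illustrate the concept. The similarity measure, the threshold, the function names, and the feature vectors standing in for the projected images are all assumptions.

```python
def similarity(a, b):
    """Similarity in [0, 1] between two equal-length vectors with values in [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def group_viewpoints(views, threshold):
    """Greedy grouping: a view joins the first group whose representative
    (the group's first member) is similar enough; otherwise it starts a new group."""
    groups = []
    for i, view in enumerate(views):
        for group in groups:
            if similarity(views[group[0]], view) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical projected views of the object from five viewpoints.
views = [
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.1, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.9, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.9, 0.0],
]
groups = group_viewpoints(views, threshold=0.9)
print(groups)  # [[0, 1, 3], [2, 4]]
```

Here five viewpoints are reduced to two groups, so only two templates would need to be matched. Each new view must be compared against the existing groups in turn, which illustrates why a grouping step of this kind adds to the template creation time.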
European Patent No. 2048599 is an example of background art.
If the above-mentioned method is employed, many templates can be thinned out by grouping in places where similar viewpoints are concentrated close to each other, and the matching processing using the templates can be sped up there. In places where similar viewpoints are not concentrated close to each other, however, few templates can be thinned out by grouping, and it is thus difficult to speed up the matching processing. As a result, with the above-mentioned method, the time required for matching processing varies greatly between places where similar viewpoints are concentrated close to each other and places where they are not.
Also, the above-mentioned method requires successively determining whether there are viewpoints that look similar and, whenever such viewpoints are found, grouping them, and thus much time is needed for creating templates.
One or more aspects have been made in view of the above-mentioned circumstances and aim to provide a technique for shortening the creation time of templates used for object recognition by template matching.