1. Field of the Invention
The present invention relates to information processing technology that can be applied to recognition processing in which an object is recognized from an image obtained by capturing said object.
2. Description of the Related Art
Conventionally, there has been active research into recognition methods in which a computer learns characteristic quantities extracted from images obtained by capturing various objects, and then recognizes the type of object included in a newly input image.
There has also been research into using model information and so forth about an object to recognize not only the type of object, but also its position and orientation.
For example, “Robust Object Detection with Interleaved Categorization and Segmentation” (IJCV Special Issue on Learning for Vision and Vision for Learning, August 2007) by B. Leibe et al. proposes a method in which characteristic points extracted from a learning image and made into a codebook are associated with characteristic points extracted from an input image, and the center position of an object is identified by probabilistic voting (implicit shape model). With this method, it is possible to identify not only the type of object, but also the position and orientation of the object.
Also, with the method disclosed in Japanese Patent Laid-Open No. 2008-257649, a characteristic quantity is first calculated for each characteristic point extracted from an input image, these quantities are compared with the characteristic quantities of characteristic points calculated for a learning image, and similar characteristic points are set as corresponding points. Then, reference points are calculated for the characteristic points of the input image by using the vectors from the characteristic points to the reference points, which are calculated ahead of time with learning images having mutually different types, positions, and orientations. Then, the positions of the calculated reference points are voted onto a specific image plane, and it is decided whether or not at least a specific number of the calculated reference points are present within a small region in that image plane. If they are present, then the learning image having the vectors used in the calculation of these reference points is determined to be a learning image that is similar to the input image, and this identifies the type, position, and orientation of the object.
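The voting step described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the function name `vote_reference_points`, the `cell` and `min_votes` parameters, and the use of a fixed grid cell to stand in for the "small region" are all assumptions made for illustration. Each match is assumed to carry the position of a characteristic point in the input image, the learned vector from that point to the reference point, and an identifier of the learning image the vector came from.

```python
from collections import Counter


def vote_reference_points(matches, cell=4, min_votes=3):
    """Vote reference-point positions onto a coarse grid (hypothetical helper).

    matches: iterable of ((x, y), (dx, dy), label) where (x, y) is a
    characteristic point in the input image, (dx, dy) is the learned vector
    to the reference point, and label identifies the learning image.
    Returns (label, cell_center, votes) for the best cell, or None if no
    cell collects at least min_votes votes.
    """
    votes = Counter()
    for (x, y), (dx, dy), label in matches:
        # Reference point predicted by this corresponding point.
        rx, ry = x + dx, y + dy
        # Assumption: a grid cell of side `cell` approximates the small region.
        votes[(label, int(rx // cell), int(ry // cell))] += 1
    if not votes:
        return None
    (label, cx, cy), count = votes.most_common(1)[0]
    if count < min_votes:
        return None
    center = (cx * cell + cell / 2, cy * cell + cell / 2)
    return label, center, count
```

In this sketch the label of the winning cell plays the role of the learning image judged similar to the input image, so it would encode the type, position, and orientation associated with that learning image. A fixed grid can split a cluster of votes across neighboring cells, so a practical system would likely use overlapping regions or a smoothed vote map.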
However, when the user tries to identify not only the type of the object, but also its position or orientation, a problem with the above recognition methods is that recognition is difficult when the input image is small or when there are few characteristic points that are effective for recognizing position or orientation. Because of this, when there are a plurality of recognition categories (type, position, orientation, etc.), a recognition technique is generally used in which a plurality of stages of discriminators gradually narrow down the candidates, and research into such recognition technology has also been underway.
A coarse-to-fine method is an example of such recognition technology. With a coarse-to-fine method, a class identified by the first stage of discriminators is set coarser than a class that is ultimately identified. More specifically, in the discrimination of the type of an object, the first stage of discriminators performs discrimination processing upon combining a plurality of type classes into a single class. Similarly, in the discrimination of the orientation of an object, the first stage of discriminators performs discrimination processing upon combining a plurality of orientation classes into a single class. That is, whatever the discrimination category may be (type, orientation), the first stage of discriminators narrows down the candidate classes through coarse setting of the classes, and the second and subsequent stages of discriminators further narrow down the classes from among these candidates, thereby identifying the final class.
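As a rough illustration of this two-stage narrowing, the following sketch classifies a toy scalar feature. Everything in it is assumed for the example: the orientation classes in `COARSE_GROUPS`, the use of a single number as the characteristic quantity, and the nearest-prototype rule used as the discriminator at both stages.

```python
# Hypothetical fine orientation classes (degrees), grouped into coarse classes.
COARSE_GROUPS = {
    "front": [0, 30, 60],
    "side": [90, 120, 150],
}


def nearest(x, prototypes):
    """Return the key whose prototype value is closest to x."""
    return min(prototypes, key=lambda k: abs(prototypes[k] - x))


def coarse_to_fine(feature):
    """Return (coarse_class, fine_class) for a scalar feature."""
    # Stage 1: several fine classes are combined into one coarse class,
    # represented here by the mean of its members.
    coarse_protos = {
        name: sum(members) / len(members)
        for name, members in COARSE_GROUPS.items()
    }
    coarse = nearest(feature, coarse_protos)
    # Stage 2: only the fine classes inside the selected coarse class
    # remain as candidates.
    fine_protos = {angle: angle for angle in COARSE_GROUPS[coarse]}
    return coarse, nearest(feature, fine_protos)
```

For instance, `coarse_to_fine(100)` first selects the coarse class `"side"` and then the fine class `90` from within it. Note that an error at the first stage cannot be corrected at the second, because the correct fine class is no longer among the candidates.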
As an example of a coarse-to-fine method, Japanese Patent 3,925,011 proposes a method in which pattern recognition is performed by using a plurality of stages of discriminators to gradually narrow down the candidate class at each stage. In the case of Japanese Patent 3,925,011, a reference pattern to be used in narrowing down the stages is decided ahead of time for every discrimination category, and these reference patterns are used to perform learning processing and produce a dictionary. The various classes here are set so that there are more reference patterns in higher-numbered stages. A candidate class is detected for a discrimination object during discrimination processing at each stage, and the candidate class closest to the discrimination object is narrowed down by using a dictionary configured to become more detailed in stages. After this, if the result of discrimination processing at each stage up to the K-th stage is the same candidate class, then that candidate class is output as the discrimination result.
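The agreement rule at the end of this description can be sketched as follows. This is only a simplified reading of the paragraph above, not the method of the cited patent: the scalar reference patterns, the nearest-pattern rule, and the names `closest_class` and `staged_discrimination` are assumptions for illustration.

```python
def closest_class(x, dictionary):
    """Return the class whose nearest reference pattern is closest to x.

    dictionary maps a class name to a list of scalar reference patterns.
    """
    return min(
        dictionary,
        key=lambda c: min(abs(x - ref) for ref in dictionary[c]),
    )


def staged_discrimination(x, stage_dictionaries):
    """Run every stage and output a class only if all stages agree.

    stage_dictionaries is ordered from stage 1 to stage K; later stages
    are assumed to hold more reference patterns per class.
    """
    if not stage_dictionaries:
        return None
    results = [closest_class(x, d) for d in stage_dictionaries]
    # Output the candidate class only when every stage produced the same one.
    return results[0] if len(set(results)) == 1 else None
```

With stage dictionaries `[{"A": [0], "B": [10]}, {"A": [0, 2], "B": [6, 10]}]`, an input of `1` is assigned to `"A"` by both stages and is output as `"A"`, whereas an input of `4.5` is assigned to `"A"` by the first stage and `"B"` by the more detailed second stage, so no result is output.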
Furthermore, in the case of Japanese Patent 3,447,717, an entire learning template is used to perform a coarse rotation search and narrow down the search range in template matching. The orientation of an object is then ultimately identified by using partial templates of a preset learning template. In Japanese Patent 3,447,717, there are a plurality of partial templates, and a characteristic portion of the object is specified in each of the partial templates. Each partial template is subjected to template matching within the narrowed search range, and the final orientation is identified on the basis of the positional relation of the partial templates.
However, with any coarse-to-fine method, as the number of stages of discriminators increases, template matching, class decisions, and so forth must be made on the basis of increasingly minute differences, so a problem is that discrimination becomes more difficult. Because of this, when a plurality of stages of discriminators are used to perform recognition processing on an object, it is necessary to raise the robustness with respect to the candidate classes that are narrowed down as the stage number of the discriminators increases, so that there will be no drop in recognition accuracy.