1. Field of the Invention
The present invention relates to image processing systems, and more particularly, to a learning device that generates a recognizer for recognizing a recognition target, a recognition device that recognizes, using the recognizer, whether or not a recognition image includes the recognition target, a processing method performed by the learning device and the recognition device, and a program for causing a computer to execute the processing method.
2. Description of the Related Art
In recent years, various object recognition methods based on image processing have been proposed, and such methods have improved dramatically over the past ten years. A technology that divides the entire image into a plurality of small regions called local regions, and that performs object recognition in accordance with local information, such as feature points and feature quantities, acquired from the local regions, is now becoming mainstream. A local region is also referred to as a “local descriptor”, a “component”, a “part”, a “fragment”, or the like.
As a method for realizing object recognition in accordance with such local information, an elastic bunch graph matching (EBGM) method has been proposed (see, for example, Martin Lades, Jan C. Vorbruggen, Joachim M. Buhmann, Jorg Lange, Christoph von der Malsburg, Rolf P. Wurtz, Wolfgang Konen: “Distortion Invariant Object Recognition in the Dynamic Link Architecture”, IEEE Trans. on Computers, Vol. 42, No. 3, pp. 300-311, 1993). In the EBGM method, Gabor jets are used as local information. A Gabor jet is a vector of responses to various directions and frequencies that is regarded as a feature quantity, and is acquired as the output of direction-selective cells (oriented filters), which are said to exist in the primary visual cortex of the human brain. In the EBGM method, feature quantities at individual feature points specified by a user are gathered as Gabor jets, and matching is performed using a predetermined evaluation function. The evaluation function uses the distance to the point exhibiting the highest correlation in the vicinity of a feature point, together with the correlation value at that point.
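The matching step described above can be illustrated with a minimal Python sketch. It is not the implementation of the cited reference: the filter bank parameters (kernel size, wavelengths, number of orientations), the use of response magnitudes, the normalized dot product as the correlation, and all function names are assumptions chosen for illustration. The sketch gathers a Gabor jet at a feature point, then searches the vicinity of that point in a second image for the position with the highest correlation, returning the offset, its distance, and the correlation value.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex plane wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    along = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the wave direction
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * along / wavelength)

def make_bank(size=9, wavelengths=(4.0, 8.0), n_orientations=4):
    """Hypothetical bank: 2 frequencies x 4 directions (values are assumptions)."""
    return [gabor_kernel(size, w, k * np.pi / n_orientations, 0.5 * w)
            for w in wavelengths for k in range(n_orientations)]

def gabor_jet(image, row, col, bank):
    """Vector of filter response magnitudes at one feature point.

    The point must lie at least half a kernel width away from the image border.
    """
    half = bank[0].shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([np.abs(np.sum(patch * np.conj(k))) for k in bank])

def jet_similarity(a, b):
    """Normalized dot product of two jets (assumes neither jet is all zeros)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(image, model_jet, row, col, bank, radius=3):
    """Search the vicinity of (row, col) for the highest correlation.

    Returns the (d_row, d_col) offset, its Euclidean distance, and the
    correlation value at that offset.
    """
    best_sim, best_offset = -1.0, (0, 0)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            jet = gabor_jet(image, row + dr, col + dc, bank)
            sim = jet_similarity(model_jet, jet)
            if sim > best_sim:
                best_sim, best_offset = sim, (dr, dc)
    return best_offset, float(np.hypot(*best_offset)), best_sim

# Usage: a probe image that is a shifted copy of the model image.
rng = np.random.default_rng(0)
model_image = rng.random((64, 64))
bank = make_bank()
model_jet = gabor_jet(model_image, 30, 30, bank)
probe_image = np.roll(model_image, shift=(2, -1), axis=(0, 1))
offset, distance, similarity = best_match(probe_image, model_jet, 30, 30, bank)
```

In this usage example the search is expected to recover the shift applied to the probe image, since the jet at the shifted position is identical to the model jet. How the distance and the correlation value are combined into a single evaluation score is left open here, because the text above does not specify it.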
The idea of using the highest correlation in the vicinity of a feature point as a feature quantity is also adopted in the HMAX (Hierarchical Model and X) model (see, for example, Riesenhuber, M. and T. Poggio: “Hierarchical Models of Object Recognition in Cortex”, Nature Neuroscience, 2, pp. 1019-1025, 1999). The HMAX model is an object recognition method that models the visual processing system of human beings. In the HMAX model, both the scale direction and the space direction are searched, and the neuron value exhibiting the largest response is acquired as a feature quantity. Because recognition is thereby performed independently of position, the HMAX model is capable of flexibly handling pattern deviation and displacement.
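The search over scale and space for the largest response can be sketched as a max-pooling operation. The following Python sketch is a simplification under stated assumptions, not the model of the cited reference: it assumes the filter responses for all scales have already been resampled onto a common grid of shape (n_scales, height, width), it pools over all scales at once rather than over bands of neighboring scales, and the function name and pool size are hypothetical.

```python
import numpy as np

def max_pool_scale_space(responses, pool):
    """Largest response over all scales and over each pool x pool spatial cell.

    responses: array of shape (n_scales, height, width), assumed to share one grid.
    Returns an array of shape (height // pool, width // pool).
    """
    n_scales, height, width = responses.shape
    h_out, w_out = height // pool, width // pool
    # Drop rows and columns that do not fill a complete cell.
    cropped = responses[:, :h_out * pool, :w_out * pool]
    # Split each spatial axis into (cell index, position within cell).
    cells = cropped.reshape(n_scales, h_out, pool, w_out, pool)
    # Maximum over scale (axis 0) and over positions within each cell (axes 2, 4).
    return cells.max(axis=(0, 2, 4))

# Usage: the same peak at a different scale and a slightly different position
# yields the same pooled feature map.
first = np.zeros((2, 8, 8))
first[1, 2, 3] = 5.0
second = np.zeros((2, 8, 8))
second[0, 1, 1] = 5.0
pooled_first = max_pool_scale_space(first, pool=4)
pooled_second = max_pool_scale_space(second, pool=4)
same = np.array_equal(pooled_first, pooled_second)
```

Because only the largest value in each cell survives, a pattern that moves within a cell or changes scale leaves the pooled output unchanged, which is the sense in which such a model tolerates deviation and displacement.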
However, in the above-mentioned known technologies, the contents of the feature quantities acquired as local information differ depending on the type of feature quantity, and mutual compatibility between different types of feature quantities is not ensured. For example, since the dimension and scale of a feature quantity vector for color normally differ from the dimension and scale of a feature quantity vector for shape, these feature quantities cannot be compared directly with each other. Thus, it is difficult to utilize different types of feature quantities together in order to recognize an object.