Field of the Invention
The present invention relates to an object identification apparatus and an object identification method.
Description of the Related Art
In the field of pattern recognition, face identification techniques for identifying an individual face are known, particularly as techniques for discriminating whether an individual subject in one image is identical to an individual subject in another image. Hereinafter, in the present specification, object identification means determining differences between individual objects (for example, differences between individual persons), while object detection means determining that objects are in the same category, without identifying individuals (for example, detecting a human face without identifying an individual person).
The identification performance of an apparatus and a method for performing the above-described pattern recognition (recognition of an object, a human face, etc. in an image) is degraded by variations between a registration target pattern and an authentication target pattern. More specifically, the degrading factors include variations of an identification target object such as a human face, for example, illumination condition variations, attitude and orientation variations, occlusion by other objects, facial expression variations, and so on. The problem is that large variations between the registration target pattern and the authentication target pattern remarkably degrade the identification performance.
As a technique for addressing this problem, there is a technique that focuses on local portions of an object in an image. For example, when the object is a human face, the effects of the above-described variations generally do not appear uniformly over the entire face in face image data in which a certain individual is captured. For example, in the case of facial expression variations between an expressive face image and a blank face image, fewer variations are expected in the vicinity of the nose than in the vicinity of the mouth or eyes. Similarly, in the case of illumination variations between a face image exposed to oblique light and a face image entirely exposed to uniform illumination, variations at a portion exposed to oblique light are expected to be smaller than variations at a portion not exposed to oblique light. Further, when the face looks towards the left relative to an observer, the left side of the face recedes in the depth direction because of the three-dimensional shape of the face, and therefore the left side of the face varies from the front face more than the right side does. Therefore, in a case where variations such as facial expression variations, illumination variations, and face orientation variations occur, even if variations in a certain local area are so large that individual identification is not possible, variations in other local areas may be small enough that individual identification is possible. More specifically, it is considered that selectively integrating the similarities of local areas providing comparatively small variations may enable favorable individual identification.
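The selective integration of local-area similarities described above can be illustrated with a minimal sketch. The following is not taken from any cited technique; it assumes that local areas are fixed rectangles, that similarity is the cosine similarity of raw pixel values, and that "selective" means averaging the k highest local similarities. The names `local_similarities`, `integrate_top_k`, and the parameter `k` are hypothetical and introduced only for illustration.

```python
import numpy as np


def local_similarities(registered, query, areas):
    """Cosine similarity between corresponding local areas of two images.

    `areas` is a list of (top, left, height, width) rectangles; this
    rectangular layout is an assumption made for illustration.
    """
    sims = []
    for top, left, height, width in areas:
        a = registered[top:top + height, left:left + width].astype(float).ravel()
        b = query[top:top + height, left:left + width].astype(float).ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # Treat an all-zero area as carrying no evidence of similarity.
        sims.append(float(a @ b / denom) if denom > 0 else 0.0)
    return np.array(sims)


def integrate_top_k(sims, k):
    """Average the k highest local similarities (hypothetical selection rule)."""
    k = max(1, min(k, len(sims)))
    return float(np.sort(sims)[-k:].mean())


if __name__ == "__main__":
    # Four 4x4 quadrants of an 8x8 image serve as the local areas.
    areas = [(0, 0, 4, 4), (0, 4, 4, 4), (4, 0, 4, 4), (4, 4, 4, 4)]
    registered = np.arange(1, 65).reshape(8, 8)
    query = registered.copy()
    # Simulate a large variation confined to one local area.
    query[0:4, 0:4] = query[0:4, 0:4][::-1, ::-1]
    sims = local_similarities(registered, query, areas)
    print("local similarities:", sims)
    print("all areas:", integrate_top_k(sims, 4))
    print("top 3 areas:", integrate_top_k(sims, 3))
```

In this sketch, discarding the single most strongly varied area keeps the integrated score high even though one quadrant has changed substantially, which is the effect the local-area approach aims for.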
Further, it is generally considered that including sufficient variations in the registration target pattern in advance is effective for coping with large variations. For example, in the case of a human face, it is useful to register images having illumination condition variations, aspect and orientation variations, variations when hidden, facial expression variations, etc., for each registered individual. If conditions that may occur during image capturing are included in the registered images in advance, an improvement in recognition accuracy can be expected.
However, preparing many registered images decreases user-friendliness. Further, it is difficult in practice to prepare variation patterns that actually contribute to improved accuracy.
To solve the above-described problem, Japanese Patent No. 4379459 discusses a technique for increasing the number of registered images by pseudo-generating diverse variation images from one image by using three-dimensional shapes of registered objects.
Further, [Qi Yin, Xiaoou Tang, and Jian Sun. “An Associate-Predict Model for Face Recognition.” Computer Vision and Pattern Recognition (CVPR), 2011.] discusses a technique for preparing a sufficient number of data items regarding race, gender, and age and reconfiguring an image close to an input image based on the prepared data. The prepared data is associated with data including facial orientation variations and illumination condition variations. Therefore, for example, a front face image can be reconfigured based on a side face image.
To improve the recognition accuracy, a technique for separately preparing a discriminator specialized for registered objects has been studied over the years. Since this technique performs machine learning when an image is registered, it is also referred to as online learning. Although online learning is an effective technique for improving accuracy, it raises the following two major problems. The first problem concerns learning data. As described above, if user-friendliness is taken into consideration, it is desirable that the number of registered object images be as small as possible. On the other hand, making the discriminator specialized for registered objects requires a certain amount of learning data. The second problem is the amount of calculation. Machine learning is generally performed by utilizing various statistical processing and numerical calculations, and the processing amount becomes a problem when such learning is incorporated into apparatuses having limited calculation resources, such as digital cameras and mobile phones. The first problem may possibly be avoided by generating variation images based on a small number of registered images, using the above-described method. However, that method aggravates the second problem because the learning inputs are images. More specifically, the processing load increases if high-dimensional data such as images are input to carry out machine learning. On the other hand, if the amount of learning data is decreased to reduce the processing load, it becomes impossible to include sufficient variations, making it difficult to improve accuracy. Because of the above-described problems, it has been impossible to effectively utilize online learning while maintaining user-friendliness.