1. Field of the Invention
The present invention relates to an object recognition apparatus and a dictionary data registration method and, more particularly, to a technique of recognizing an object by collating input data with data stored in advance as dictionary data.
2. Description of the Related Art
There is known a technique for registering the feature information of an object image in advance as dictionary data and recognizing the object in an input image. For example, a personal recognition method is known which stores element data such as a face, a voice, or a fingerprint in advance as dictionary data for each of a plurality of persons to be recognized, and recognizes a person using the dictionary data. In such a personal recognition method, element data received from a person to be recognized is compared with the element data stored as dictionary data, and the person to whom the received element data belongs is identified, thereby recognizing the person.
In this personal recognition method, when a face image is used as element data, the orientation of the face, the facial expression, and changes in the lighting environment in which the face image is captured (for example, differences in contrast between front light, back light, and side light) greatly affect the personal recognition accuracy. Japanese Patent No. 4057501 (to be referred to as patent literature 1 hereinafter) describes a personal authentication system that captures face images under a plurality of different lighting environments and stores them in dictionary data, thereby performing correct personal authentication even when the use environment (conditions) has changed.
In patent literature 1, however, it is necessary to store a wide variety of face images in the dictionary data in advance. Since multiple conditions change in a complex manner, storing face images corresponding to every combination of conditions is burdensome to the user. In addition, since the number of stored face images increases, the processing time increases as well. To prevent this, in a method as described in Japanese Patent Laid-Open No. 2009-086926 (to be referred to as patent literature 2 hereinafter), subregions are set for a face image, and feature information is calculated for each of the set subregions to obtain the feature information necessary for personal recognition, thereby improving the recognition accuracy using a small number of registered face images.
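The subregion-based approach described above can be illustrated with a minimal sketch. Note that the actual subregion layout and feature extraction of patent literature 2 are not reproduced here; the uniform grid split and the mean-intensity feature below are simplifying assumptions chosen only to make the idea concrete:

```python
def split_into_subregions(image, rows, cols):
    """Split a 2-D grayscale image (list of lists of pixel values)
    into a rows x cols grid of subregions."""
    h, w = len(image), len(image[0])
    sub_h, sub_w = h // rows, w // cols
    subregions = []
    for r in range(rows):
        for c in range(cols):
            block = [row[c * sub_w:(c + 1) * sub_w]
                     for row in image[r * sub_h:(r + 1) * sub_h]]
            subregions.append(block)
    return subregions

def subregion_feature(block):
    # Toy feature for illustration only: mean pixel intensity.
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)
```

Feature information of this kind, computed once per subregion at registration time, is what the dictionary stores instead of a large number of whole-face images.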
For example, personal recognition is performed in accordance with the following procedure. First, as shown in FIG. 6, a face 610 that is the recognition target face image is divided into subregions such as subregions 611, 612, and 613, and feature information is obtained for each subregion. The similarity between the subregion 611 and each of the corresponding subregions (subregions 621, 631, and 641) of three face images (faces 620, 630, and 640) of the same person stored in dictionary data 690 is calculated. After that, the highest of the similarities to the subregion 611 is selected. This processing is performed for all subregions of the face 610, and the selected similarities are integrated to calculate the similarity between the recognition target face 610 and the person of the three face images (faces 620, 630, and 640) stored in the dictionary data. This method makes it possible to decrease the number of face images stored in the dictionary data for one person and perform accurate personal recognition while reducing the influence of conditions.
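The select-the-highest-similarity-per-subregion procedure above can be sketched as follows. This is a minimal illustration rather than the implementation of patent literature 2: the function names are hypothetical, cosine similarity is assumed as the per-subregion similarity measure, and averaging is assumed as the integration step:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def face_similarity(probe_subregions, dictionary_faces):
    """Integrate, over all subregions, the highest similarity found
    among the corresponding subregions of the registered faces.

    probe_subregions: list of feature vectors, one per subregion.
    dictionary_faces: list of registered faces of one person, each a
                      list of feature vectors indexed like the probe.
    """
    total = 0.0
    for i, probe_feat in enumerate(probe_subregions):
        # Compare subregion i of the probe with subregion i of every
        # registered face of this person, and keep the best match.
        best = max(cosine_similarity(probe_feat, face[i])
                   for face in dictionary_faces)
        total += best
    return total / len(probe_subregions)  # integrate by averaging
```

Because each subregion may select its best match from a different registered face, a small number of registrations can still cover a range of orientations and lighting conditions.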
However, in the above-described personal recognition method, as the amount of feature information registered for each person increases, a recognition error in which another person is mistakenly recognized as this person tends to occur more frequently. This is because the subregions shift between the plurality of face images of the same person stored in the dictionary data, or the features of the face images of the same person change largely between the images. For example, when faces 701 to 704 are registered as the faces of the same person, as shown in FIG. 7, the feature of a subregion 712 of the face 702 largely changes from that of a subregion 711 of the face 701 because of the orientation of the face. Additionally, in a subregion 713 of the face 703, the detection positions of organs such as the eyes and nose, which determine the position of the subregion, shift due to the influence of light, and consequently, the position of the subregion 713 shifts. The feature of a subregion 714 of the face 704 changes because of accessories such as a mask and glasses.
FIG. 8 is a view showing an example in which the face images of the same person shown in FIG. 7, which have shifts in their subregions, are stored in dictionary data 890, and similarities are calculated using the above-described personal recognition method. Note that faces 810, 820, 830, and 840 correspond to the faces 701, 702, 703, and 704, respectively. A subregion 801 of a face 800 shown in FIG. 8 is compared with subregions 811, 821, 831, and 841, which have shifts between the face images in the dictionary data. In this case, the similarity between one of these subregions and the subregion 801 may happen to be high. If the same phenomenon occurs between another subregion 802 and subregions 812, 822, 832, and 842, a recognition error may occur. That is, a similarity 891 obtained by integrating the similarities between the recognition target face 800 and the faces 810, 820, 830, and 840 of another person becomes higher than a similarity 892 between the face 800 and a face 850 of the same person as the face 800, and a recognition error occurs.
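The failure mode described above can be made concrete with a toy numeric example. All similarity values below are invented for illustration, and it is assumed for simplicity that the true person has a single registered face; the point is only that taking the per-subregion maximum over many varied registrations of another person can inflate the integrated score past that of the true person:

```python
# Hypothetical per-subregion similarities (values invented purely for
# illustration). Rows: subregions of the probe face; columns: the
# registered faces of one dictionary person.
other_person = [  # four shifted/varied registrations of another person
    [0.2, 0.9, 0.3, 0.4],  # probe subregion 1 vs. each registration
    [0.3, 0.2, 0.8, 0.1],  # probe subregion 2 vs. each registration
]
same_person = [   # a single registration of the true person
    [0.6],
    [0.7],
]

def integrated(sims):
    # Pick the best match per subregion, then average (integrate).
    return sum(max(row) for row in sims) / len(sims)

# integrated(other_person) = (0.9 + 0.8) / 2 = 0.85
# integrated(same_person)  = (0.6 + 0.7) / 2 = 0.65
# The other person outscores the true person: a recognition error.
```

Each probe subregion independently "cherry-picks" its highest value from a different registration of the other person, so no single registered face needs to resemble the probe for the integrated score to be high.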