1. Field of the Invention
The present invention relates to a recognition technique.
2. Description of the Related Art
These days, with the prevalence of digital cameras and the like and an increase in memory capacity, it has become possible to save a large number of captured images, which makes it inconvenient for a user to arrange them. Conventionally, there has been provided a technique for arranging images by attaching, to each object, information indicating who is in an image using a face authentication technique or the like. The face authentication technique will be explained below.
As a technique for discriminating human faces, there is an image processing method of automatically detecting a specific object pattern from an image. It is possible to use such a method in many fields such as teleconferences, man-machine interfaces, security, monitor systems for tracking human faces, and image compression. As a technique for detecting faces from an image, various methods have been disclosed in Yang et al, “Detecting Faces in Images: A Survey”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 1, JANUARY 2002. The document describes a method of detecting human faces using some marked features (two eyes, mouth, nose, and the like) and a unique geometric positional relationship between these features, or using symmetric features of human faces, features of skin colors of human faces, template matching, neural networks, and the like. For example, a scheme proposed in Rowley et al, “Neural network-based face detection”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 1, JANUARY 1998 is a method of detecting a face pattern in an image by neural networks. In Schneiderman and Kanade, “A statistical method for 3D object detection applied to faces and cars”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2000), discrimination processing is executed by considering the face probability of a matching pattern as an integrated model of a statistical distribution with regard to a plurality of appearances. Furthermore, Viola and Jones, “Rapid Object Detection using Boosted Cascade of Simple Features”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01) is an example focusing on enhancing the processing speed. 
In the report by Viola and Jones, many weak discriminators are effectively combined using AdaBoost to improve the accuracy of face discrimination, each weak discriminator is configured by a Haar-like rectangular feature amount, and the rectangular feature amount is calculated at high speed using an integral image. Discriminators obtained by AdaBoost learning are cascaded to constitute a cascaded face detector. The cascaded face detector first removes pattern candidates which are obviously not faces using a preceding simple discriminator (that is, one with a smaller calculation amount), and then determines whether each of the remaining candidates is a face using a subsequent complicated discriminator (that is, one with a larger calculation amount) having higher identification performance. Since this technique does not make a complicated determination for all the candidates, it enables high-speed processing. All of the above-described techniques are applied to face detection in still images, and none of them is applied to face detection in moving images.
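The integral image and the cascaded rejection described above can be illustrated by the following minimal sketch. It is not the implementation of the cited report: the function names, the two-rectangle feature layout, and the stage thresholds are assumptions chosen for illustration, and AdaBoost learning of the stages is omitted.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]
Integral = List[List[int]]


def integral_image(img: Image) -> Integral:
    """Return an (h+1) x (w+1) table; ii[y][x] is the sum of img rows < y, columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Integral, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Integral, x: int, y: int, w: int, h: int) -> int:
    """Hypothetical Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii: Integral, stages: Sequence[Callable[[Integral], bool]]) -> bool:
    """Evaluate stages in order (cheapest first) and reject at the first failure."""
    for stage in stages:
        if not stage(ii):
            return False
    return True


# A 4 x 4 test pattern: bright left half, dark right half.
image = [[10, 10, 0, 0] for _ in range(4)]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 4)  # 80 for this pattern

# Illustrative stages with made-up thresholds; the second stage rejects.
stages = [
    lambda t: two_rect_feature(t, 0, 0, 4, 4) > 50,
    lambda t: rect_sum(t, 0, 0, 4, 4) > 100,
]
accepted = cascade_accepts(ii, stages)  # False
```

Because each rectangle sum costs four table lookups regardless of the rectangle size, the per-feature cost stays constant, and a candidate rejected by an early stage never reaches the more expensive later stages.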
Mikolajczyk et al, “Face detection in a video sequence—a temporal approach”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01) proposes, as an extension of the method described in Schneiderman and Kanade, “A statistical method for 3D object detection applied to faces and cars”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2000), a method of predicting the state of a face in a next frame based on the face detection result of a predetermined frame, and updating the face detection result by applying face discrimination processing. This method makes it possible to integrate face discrimination results in a plurality of frames, resulting in improved accuracy. However, the method by itself cannot deal with the appearance of a new face. As a measure against this problem, it has been proposed, for example, to execute an exhaustive search every five frames.
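The alternation between frame-to-frame updating and a periodic exhaustive search can be sketched as follows. This is a simplified illustration only, not the method of the cited document: `detect_all` and `track` are hypothetical callables standing in for the exhaustive detector and the prediction/update step, and the sketch simply replaces the tracked faces on each exhaustive-search frame.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Face = TypeVar("Face")


def process_video(
    frames: Sequence[Frame],
    detect_all: Callable[[Frame], List[Face]],
    track: Callable[[Frame, List[Face]], List[Face]],
    full_search_interval: int = 5,
) -> List[List[Face]]:
    """Return the face list for every frame.

    Frames whose index is a multiple of full_search_interval are searched
    exhaustively so that newly appearing faces are picked up; all other
    frames only update the faces carried over from the previous frame.
    """
    faces: List[Face] = []
    results: List[List[Face]] = []
    for index, frame in enumerate(frames):
        if index % full_search_interval == 0:
            faces = detect_all(frame)    # assumed exhaustive detector
        else:
            faces = track(frame, faces)  # assumed prediction and update step
        results.append(list(faces))
    return results
```

With the default interval, a face that first appears in frame 2 is not reported until the exhaustive search at frame 5, which is the latency this measure trades for speed.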
Japanese Patent Laid-Open No. 2005-174352 employs a method of determining a region which has not changed with time and excluding that region from face detection processing, in order to detect faces in a moving image in real time. This method is effective at enhancing the speed. The method, however, does not integrate face discrimination results in a plurality of frames as described in Mikolajczyk et al, “Face detection in a video sequence—a temporal approach”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01), and thus cannot be expected to improve the accuracy.
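Excluding regions that have not changed with time can be sketched with simple block-wise frame differencing. This is an assumed illustration, not the procedure of the cited publication: the block size, the threshold, and the use of a sum of absolute differences are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]


def changed_blocks(
    prev: Image, curr: Image, block: int = 2, threshold: int = 0
) -> List[Tuple[int, int]]:
    """Return top-left (x, y) corners of blocks that differ between two frames.

    Only the returned blocks would be passed on to face detection; blocks
    whose summed absolute difference is at or below threshold are skipped.
    """
    h, w = len(curr), len(curr[0])
    result: List[Tuple[int, int]] = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
            if diff > threshold:
                result.append((bx, by))
    return result
```

For two identical frames the function returns an empty list, so no detection work is done at all; the saving shrinks as more of the scene moves.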
The weights and thresholds of the neural network in Rowley et al, “Neural network-based face detection”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 1, JANUARY 1998, and the parameters for defining the rectangular feature amount which a weak discriminator refers to, together with the operation coefficients and thresholds for executing discrimination processing based on the rectangular feature amount, in Viola and Jones, “Rapid Object Detection using Boosted Cascade of Simple Features”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01), are generally called a recognition dictionary. A dictionary usually includes several tens to several hundreds of KB of data.
According to Japanese Patent Laid-Open No. 2005-127285, the recognition accuracy is improved by increasing, in a database to be used for individual recognition of a person, not only the number of personal recognition parameters but also that of the attributes of the database such as a schedule.
In many cases, however, images cannot be sufficiently arranged by only recognizing or authenticating a person. For example, this applies to a case in which the objects are not only persons, a case in which identical objects are in many images, or a case in which no person is in an image, such as a scenic image. To deal with these cases, it is conceivable to extend the types of objects to be recognized. By recognizing a variety of objects, it is possible to add more detailed bibliographic information (metadata) to captured images, and to arrange them more efficiently.
If the types of objects to be recognized are extended, however, parameter data (to be referred to as a recognition dictionary) for recognition processing is needed for each recognition target. The number of recognition dictionaries increases with the number of object types to be recognized, and thus the storage area for storing the recognition dictionaries becomes larger. Furthermore, more recognition processes are needed as the number of recognition dictionaries increases. This increases the processing load of an apparatus, prolongs the processing time, or makes the configuration larger.
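The growth in storage and processing described above can be made concrete with a small sketch. The class name, the target names, and the sizes are hypothetical values chosen within the range of several tens to several hundreds of KB mentioned earlier, and the sketch assumes one recognition pass per dictionary.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RecognitionDictionary:
    """Hypothetical record of one dictionary: its target and its size in KB."""
    target: str
    size_kb: int


def total_cost(dictionaries: Sequence[RecognitionDictionary]) -> Tuple[int, int]:
    """Return (total storage in KB, number of recognition passes per image).

    Both values grow linearly with the number of dictionaries, assuming
    each dictionary is stored whole and applied once to every image.
    """
    total_kb = sum(d.size_kb for d in dictionaries)
    passes = len(dictionaries)
    return total_kb, passes


# Illustrative figures only; none of these sizes come from a real dictionary.
example = [
    RecognitionDictionary("face", 50),
    RecognitionDictionary("dog", 80),
    RecognitionDictionary("building", 100),
]
storage_kb, passes = total_cost(example)  # (230, 3)
```

Under these assumptions, adding a fourth target adds both its full dictionary size to the storage area and one more recognition pass to every image.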
According to Japanese Patent Laid-Open No. 2005-127285, the attributes of a schedule accompany a recognition parameter, and if there are different schedules, it is necessary to hold as many recognition parameters as there are schedules.