Field of the Invention
The present invention relates to a technique for collating a person in an image.
Description of the Related Art
There is a conventionally known technique which detects a human face in each frame of a surveillance video, calculates an image feature amount from the face, stores the image feature amount in association with the video frame, and searches the stored data for video frames containing a face that matches the face shown in a query image. For example, a query image of the face of a lost child can be prepared and the child's face found among the stored images, so that the location and time at which the child appeared can be specified from the video showing the child.
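The store-and-search flow described above can be sketched as follows. This is only an illustrative sketch: face detection and feature extraction are omitted, the feature vectors are assumed to be already computed, and the names FaceRecord, cosine_similarity, and search, the use of cosine similarity, and the default threshold of 0.9 are all assumptions made for illustration, not part of the known technique itself.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FaceRecord:
    """One stored face: where and when it was seen, plus its feature vector."""
    camera_id: str        # hypothetical identifier of the camera (location)
    timestamp: float      # time of the video frame, in seconds
    feature: List[float]  # image feature amount (assumed precomputed)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(records: List[FaceRecord], query_feature: List[float],
           threshold: float = 0.9) -> List[Tuple[float, FaceRecord]]:
    """Return (similarity, record) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(r.feature, query_feature), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

Each hit carries the camera identifier and timestamp of the stored frame, which corresponds to specifying the location and time at which the queried person appeared.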
A method of registering all faces detected from all frames of a video is available, but the number of image feature amounts to be stored is huge in this method. The number of image feature amounts to be registered can therefore be reduced by thinning frames, that is, by decreasing the processing frame rate. However, since the accuracy of face collation generally depends on the face direction, frame thinning may discard a frame having a favorable face direction, which increases the probability of missed collation.
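The drawback of frame thinning described above can be illustrated with a small sketch. The yaw angles below are made-up illustrative values, and the helper names thin_frames and most_frontal_yaw are hypothetical; a smaller absolute yaw is assumed here to mean a more frontal, and therefore more favorable, face direction.

```python
from typing import List, Tuple

Frame = Tuple[int, float]  # (frame index, face yaw angle in degrees)


def thin_frames(frames: List[Frame], step: int) -> List[Frame]:
    """Keep every step-th frame, which lowers the processing frame rate."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return frames[::step]


def most_frontal_yaw(frames: List[Frame]) -> float:
    """Smallest absolute yaw among the frames (assumed most favorable)."""
    return min(abs(yaw) for _, yaw in frames)


# Illustrative values only: the face turns toward the camera around frame 3.
yaws = [60.0, 45.0, 20.0, 5.0, 30.0, 50.0, 70.0, 80.0]
all_frames = list(enumerate(yaws))
kept = thin_frames(all_frames, step=4)  # keeps frame indices 0 and 4
```

With these values, thinning reduces the stored features from eight to two, but the near-frontal frame 3 is among the discarded frames, so the most favorable face direction that remains is noticeably worse than in the unthinned video.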
Literature: Japanese Patent Laid-Open No. 2012-37995 relates to a face authentication system. The problems addressed by this literature are: which image feature amounts of new registration target person B must be registered with respect to preregistered person A; and how to perform high-accuracy collation. In the registration phase of this system, a moving image is captured, and, from the faces of person B extracted from a plurality of frames, a face suitable for registration and its image feature amount are determined. The face direction of person B detected from each video frame is discriminated, the faces are classified by face direction, and similarity comparison is performed by round robin, thereby determining a similarity threshold for judging whether a face is likely to be that of the person. In addition, the image feature amount of the face in each face direction of person B is compared with that in the same face direction of person A. If the similarity is equal to or larger than the threshold, the image feature amount in that face direction of person B is not stored but discarded. This makes it possible to narrow down the image feature amounts to those required to discriminate between the faces of persons A and B, reduce the image feature amounts to be stored, and prevent deterioration of the discrimination accuracy.
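The per-direction selection described above can be sketched as follows. This is not the implementation of the cited literature: the function names, the use of cosine similarity, and the direction labels are assumptions for illustration. In particular, the text does not state how the threshold is derived from the round-robin comparison, so taking the minimum pairwise similarity in round_robin_threshold is purely an assumed rule.

```python
import math
from itertools import combinations
from typing import Dict, List

Feature = List[float]


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def round_robin_threshold(features: List[Feature]) -> float:
    """Compare every pair of features of one person (round robin).

    ASSUMPTION: the threshold is taken as the minimum pairwise similarity.
    The source does not specify the actual rule.
    """
    if len(features) < 2:
        raise ValueError("need at least two features for a round robin")
    return min(cosine_similarity(a, b) for a, b in combinations(features, 2))


def select_features_to_register(person_a: Dict[str, Feature],
                                person_b: Dict[str, Feature],
                                threshold: float) -> Dict[str, Feature]:
    """Keep only the features of B that differ enough from A's feature
    in the same face direction; discard those at or above the threshold."""
    kept: Dict[str, Feature] = {}
    for direction, feature_b in person_b.items():
        feature_a = person_a.get(direction)
        if (feature_a is not None
                and cosine_similarity(feature_a, feature_b) >= threshold):
            continue  # too similar to A in this direction: not stored
        kept[direction] = feature_b
    return kept
```

In this sketch a direction for which person A has no stored feature is always kept for person B; that choice is likewise an assumption, since the text does not address that case.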
It is, however, difficult to use the technique of this literature to search for a person in a surveillance video, because registering the face of a given person requires collating that face with the faces of all already registered persons. In a surveillance video in which a large number of persons appear, this greatly increases the load on the registration process.
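The growth of the registration load can be made concrete with a rough count. The sketch below assumes, for illustration only, that registering one person costs one collation per already registered person and per face direction; the function name and the directions_per_person parameter are hypothetical.

```python
def total_collations(num_persons: int, directions_per_person: int = 1) -> int:
    """Total collations needed to register num_persons one after another.

    The k-th person to arrive (counting from 0) is collated against the k
    persons already registered, so the total is N * (N - 1) / 2 per direction.
    """
    if num_persons < 0 or directions_per_person < 0:
        raise ValueError("arguments must be non-negative")
    return sum(k * directions_per_person for k in range(num_persons))
```

Under these assumptions the total grows quadratically with the number of persons: doubling the number of persons appearing in the video roughly quadruples the number of collations required for registration.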