1. Field of the Invention
The present invention relates to information processing devices, recognition methods thereof, and non-transitory computer-readable storage media.
2. Description of the Related Art
In recent years, functions that detect a person's face in an image during capture and track the detected object have spread rapidly in digital still cameras and camcorders. These face detection and tracking functions are extremely useful for automatically adjusting focus and exposure to the object being captured. Practical application of technologies for detecting a face in an image has advanced through techniques such as those proposed in Viola and Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2001) (hereinafter referred to as Document 1).
Meanwhile, there is a desire to use surveillance cameras for intrusion detection, surveillance of movement and congestion, and the like, by recognizing not only the faces of persons but also persons whose faces are not visible. Techniques for detecting human forms in images have been proposed for such applications. For example, Dalal and Triggs, “Histograms of Oriented Gradients for Human Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2005) (hereinafter referred to as Document 2), discusses a technique in which a histogram of oriented gradients of pixel values is extracted from an image and used as a feature amount (HOG feature amount) to determine whether or not a partial region within the image is a person. That is, the contour of a human form is expressed by feature amounts based on the orientation of gradients of pixel values, and these are used in recognition. Furthermore, Qiang Zhu et al., “Fast human detection using a cascade of Histograms of Oriented Gradients,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2006) (hereinafter referred to as Document 3), proposes a method in which the AdaBoost learning proposed in Document 1 is carried out with weak classifiers based on HOG feature amounts, and human forms are detected rapidly by executing cascade-type classifiers obtained from this learning.
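By way of illustration only, the gradient-orientation histogram underlying a HOG feature amount may be sketched as follows for a single cell. This is a minimal sketch and not the implementation of Document 2 or Document 3: the function name `orientation_histogram`, the choice of nine unsigned orientation bins, the central-difference gradient, and the L1 normalization are all assumptions made for the example, and block-level normalization over neighboring cells is omitted.

```python
import math


def orientation_histogram(patch, bins=9):
    """Gradient-orientation histogram for one cell (hypothetical helper).

    patch: list of rows of grayscale pixel values.
    Returns `bins` values, weighted by gradient magnitude and L1-normalized.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # unsigned orientation: 0 to 180 degrees

    # Border pixels are skipped so that central differences stay in range.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against an angle that rounds up to 180.0.
            hist[int(angle // bin_width) % bins] += magnitude

    total = sum(hist)
    return [value / total for value in hist] if total > 0 else hist
```

For example, calling `orientation_histogram` on a cell containing a single vertical step edge concentrates the histogram in the bin around 0 degrees, while a horizontal step edge concentrates it in the bin around 90 degrees. A classifier then treats such histograms, collected over many cells, as the feature amount.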
However, in the above conventional techniques for detecting human forms, recognition accuracy deteriorates when the background contains complicated edges. This is because edge features of the background are captured together with the features of the contour portions of the person, so that the features of the person region cannot be separated and captured on their own.
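The mixing of background edges into a contour feature can be illustrated with a small synthetic example. The sketch below is a deliberately simplified stand-in for a gradient-based feature amount and is not taken from any of the cited documents: the helper names `edge_energy_by_orientation` and `make_region`, the 8×8 region size, and the pixel values are all hypothetical. The right half of the region plays the role of the person, and the left half is either a flat or a striped background.

```python
def edge_energy_by_orientation(region):
    """Return (vertical_edge_energy, horizontal_edge_energy) for a region.

    Energy is the summed absolute central difference over interior pixels:
    changes along x indicate vertical edges, changes along y horizontal ones.
    """
    vertical = 0.0
    horizontal = 0.0
    for y in range(1, len(region) - 1):
        for x in range(1, len(region[0]) - 1):
            vertical += abs(region[y][x + 1] - region[y][x - 1])
            horizontal += abs(region[y + 1][x] - region[y - 1][x])
    return vertical, horizontal


def make_region(cluttered, size=8):
    """Build a synthetic local region straddling a vertical contour."""
    region = []
    for y in range(size):
        row = []
        for x in range(size):
            if x >= size // 2:
                row.append(100)   # uniform "person" area
            elif cluttered and y % 4 < 2:
                row.append(60)    # horizontal stripes in the background
            else:
                row.append(0)     # flat background
        region.append(row)
    return region


clean = edge_energy_by_orientation(make_region(cluttered=False))
cluttered = edge_energy_by_orientation(make_region(cluttered=True))
print("flat background:   ", clean)
print("striped background:", cluttered)
```

With a flat background, the only edge energy in the region comes from the contour itself. With a striped background, horizontal edge energy that has nothing to do with the person appears in the same region, and the measured strength of the contour also changes, even though the person area is identical in both cases.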
Furthermore, in the conventional examples, features of local regions that are effective in recognition are learned from samples of person images and non-person images, and recognition is carried out using those features. In this learning, many samples that differ in the position, size, and posture of the person in the image are used, and the differences in contour position between samples are absorbed by the learning. As a result, feature amounts of local regions that are ineffective for a specific person image are also used in recognition.
In this way, recognition accuracy deteriorates when local regions include complicated edges from background portions. This phenomenon is a common issue whenever recognition is carried out using features of the contour portions of objects.