To ensure safety, companies that manage or secure a building or a designated area use systems in which an information processing device, such as a computer, extracts images of persons or objects entering and leaving through a gate from image information captured by a camera installed at the entrance (gate).
FIG. 20 shows an example of a human extraction system, which is one such extraction system.
In the human extraction system shown in FIG. 20, a human extraction device 90 extracts a human image from image information (an image containing a person) captured by a capturing device 91 and displays the extracted image on a display device 92.
Methods that such a human extraction device 90 may use to extract a person or an object from image information include detection methods that learn characteristic quantities of an image and apply a classifier, and methods based on template matching.
The Haar-like method uses a characteristic quantity representing a pattern of brightness variation within an image and detects a face by combining this quantity with an AdaBoost classifier (for example, refer to Non-patent Document 1). Haar-like features are suitable for extracting a target object with characteristic brightness variation, such as a human face containing eyes, a nose and a mouth.
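As an illustration of the kind of computation described above, the following Python sketch evaluates a two-rectangle Haar-like feature in constant time using an integral image, then combines decision stumps in a weighted vote of the form that AdaBoost produces. The function names, the single vertical two-rectangle feature and the hand-set threshold, polarity and weight are hypothetical; in practice many features are evaluated and their thresholds and weights are learned by AdaBoost.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Vertical two-rectangle Haar-like feature: upper-half sum minus lower-half sum (h must be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


def boosted_classify(ii, stumps):
    """Weighted vote of decision stumps given as (feature_fn, threshold, polarity, alpha).

    A stump votes 1 when polarity * f < polarity * threshold; the window is
    accepted when the weighted votes reach half of the total weight.
    """
    score = sum(alpha for f, thr, p, alpha in stumps if p * f(ii) < p * thr)
    total = sum(alpha for _, _, _, alpha in stumps)
    return score >= 0.5 * total


# Hypothetical example: a 4x4 window whose upper half is dark and lower half bright,
# loosely like an eye band above the cheeks.
window = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
ii = integral_image(window)
stumps = [(lambda t: two_rect_feature(t, 0, 0, 4, 4), 0, 1, 1.0)]
print(rect_sum(ii, 0, 0, 4, 4))          # 1680
print(two_rect_feature(ii, 0, 0, 4, 4))  # -1520
print(boosted_classify(ii, stumps))      # True
```

Because every rectangle sum costs four lookups regardless of its size, many such features can be evaluated per window, which is what makes combining them with a boosted classifier practical.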
SIFT (Scale-Invariant Feature Transform) uses as its characteristic quantity (SIFT feature) the edge directions in the region surrounding a keypoint, which is a characteristic point of the image, described on the basis of the keypoint's position and direction (for example, refer to Non-patent Document 2). Because SIFT is robust to rotation and to changes in size, it is suitable for extracting objects having an identical shape. On the other hand, SIFT requires data for comparison.
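To illustrate why describing edge directions relative to the keypoint's direction gives robustness to rotation, the following heavily simplified Python sketch assigns a dominant orientation from a 36-bin, magnitude-weighted histogram and then histograms the gradient directions around the keypoint relative to that orientation. Scale-space keypoint detection, Gaussian weighting, the 4 x 4 spatial grid and peak interpolation used by full SIFT are omitted; the function names and parameters are hypothetical.

```python
import math


def gradient(img, x, y):
    """Centered differences; (x, y) must be at least one pixel inside the image."""
    return img[y][x + 1] - img[y][x - 1], img[y + 1][x] - img[y - 1][x]


def keypoint_descriptor(img, kx, ky, radius, orient_bins=36, desc_bins=8):
    """Simplified SIFT-style description of one keypoint.

    Returns (dominant_orientation_deg, normalized histogram of edge directions
    measured relative to the dominant orientation).
    """
    samples = []  # (direction in degrees, gradient magnitude)
    for y in range(ky - radius, ky + radius + 1):
        for x in range(kx - radius, kx + radius + 1):
            gx, gy = gradient(img, x, y)
            mag = math.hypot(gx, gy)
            if mag > 0:
                samples.append((math.degrees(math.atan2(gy, gx)) % 360.0, mag))

    # Orientation assignment: peak bin of a magnitude-weighted histogram (lower bin edge used).
    width = 360.0 / orient_bins
    orient_hist = [0.0] * orient_bins
    for angle, mag in samples:
        orient_hist[int(angle // width) % orient_bins] += mag
    dominant = orient_hist.index(max(orient_hist)) * width

    # Edge directions expressed relative to the dominant orientation.
    dwidth = 360.0 / desc_bins
    desc = [0.0] * desc_bins
    for angle, mag in samples:
        rel = (angle - dominant) % 360.0
        desc[int(rel // dwidth) % desc_bins] += mag
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return dominant, [v / norm for v in desc]


# Hypothetical example: a ramp increasing along x and the same ramp rotated by 90 degrees.
ramp_x = [[x for x in range(11)] for y in range(11)]
ramp_y = [[y for x in range(11)] for y in range(11)]
dom_x, desc_x = keypoint_descriptor(ramp_x, 5, 5, 3)
dom_y, desc_y = keypoint_descriptor(ramp_y, 5, 5, 3)
print(desc_x == desc_y)  # True
```

The two patches have different dominant orientations (about 0 and 90 degrees), yet their relative descriptors coincide, which is the rotation robustness the text attributes to SIFT.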
Bag-of-Keypoints describes an input image by SIFT features and represents an object by the frequency distribution (histogram) of Visual words, which are characteristic quantities obtained by vector-quantizing the SIFT features (for example, refer to Non-patent Document 3). Bag-of-Keypoints first learns, for each kind (class) of person or object, a histogram of Visual words. It then compares the Visual-word histogram of an input image with the histograms learned in advance and thereby classifies the person or object into a class. Bag-of-Keypoints is suitable for extraction when the position, within the image, of the person or object from which the characteristic quantity is extracted is stationary.
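A minimal Python sketch of the Bag-of-Keypoints steps described above, assuming that descriptors have already been extracted (2-D vectors stand in for 128-D SIFT features), that a Visual-word vocabulary has already been learned (for example by k-means clustering), and that per-class histograms are available. The nearest-histogram rule using histogram intersection is a simple stand-in for a trained classifier; all names and values are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the Visual word nearest to descriptor (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])))


def word_histogram(descriptors, vocabulary):
    """Frequency distribution of Visual words, normalized to sum to 1."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    return [c / len(descriptors) for c in counts]


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify(descriptors, vocabulary, class_histograms):
    """Return the class whose learned histogram is most similar to the input's histogram."""
    query = word_histogram(descriptors, vocabulary)
    return max(class_histograms,
               key=lambda label: histogram_intersection(query, class_histograms[label]))


# Hypothetical vocabulary, learned class histograms and input descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
class_histograms = {"person": [0.1, 0.8, 0.1], "bag": [0.1, 0.1, 0.8]}
descriptors = [(0.9, 0.1), (1.1, -0.1), (0.1, 0.95), (1.0, 0.2)]
print(word_histogram(descriptors, vocabulary))              # [0.0, 0.75, 0.25]
print(classify(descriptors, vocabulary, class_histograms))  # person
```

Note that the histogram discards where each keypoint lies in the image; only word frequencies are compared.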
Whereas SIFT extracts edge features around keypoints, HOG (Histograms of Oriented Gradients) extracts edge features over regions (for example, refer to Non-patent Document 4). Compared with SIFT, HOG is better suited to extracting the outline of an object's shape. However, HOG also requires data for comparison.
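A simplified Python sketch of region-based edge features in the spirit of HOG: each 8 x 8-pixel cell accumulates a magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees), and overlapping blocks of 2 x 2 cells are L2-normalized and concatenated. Cell size, bin count and block layout are assumptions chosen for illustration, and hard voting into a single bin is a simplification (no interpolation between bins or cells); function names are hypothetical.

```python
import math


def cell_histogram(img, x0, y0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Centered [-1, 0, 1] differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += mag
    return hist


def l2_normalize(vec, eps=1e-6):
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


def hog_descriptor(img, cell=8, bins=9):
    """Concatenate L2-normalized 2x2-cell blocks, sliding one cell at a time."""
    rows, cols = len(img) // cell, len(img[0]) // cell
    cells = [[cell_histogram(img, c * cell, r * cell, cell, bins) for c in range(cols)]
             for r in range(rows)]
    descriptor = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            block = cells[r][c] + cells[r][c + 1] + cells[r + 1][c] + cells[r + 1][c + 1]
            descriptor.extend(l2_normalize(block))
    return descriptor


# Hypothetical example: a 16x16 image with a vertical edge (dark left, bright right).
edge = [[0] * 8 + [255] * 8 for _ in range(16)]
desc = hog_descriptor(edge)
print(len(desc))          # 36
print(round(desc[0], 3))  # 0.5
```

All the energy of the vertical edge falls into the 0-degree bin of each cell, so the descriptor encodes the outline's orientation over the region rather than around a single keypoint.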
Template matching prepares in advance image information serving as a "template" of the shape to be extracted and detects a person or object that resembles the template (for example, refer to Non-patent Document 5). Because the method requires the image information of the person or object to be detected to coincide with a template, a target that can take several appearances requires a template for every one of those appearances.
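A minimal Python sketch of template matching, using the sum of squared differences (SSD) as the dissimilarity measure; SSD is an assumption, since the text does not fix a measure. The helper match_any illustrates the point above: a target with several appearances needs one template per appearance, and each template is slid over the whole image. All names and the acceptance threshold max_ssd are hypothetical.

```python
def ssd(image, template, x, y):
    """Sum of squared differences between template and the image window at top-left (x, y)."""
    return sum((image[y + j][x + i] - template[j][i]) ** 2
               for j in range(len(template)) for i in range(len(template[0])))


def match_template(image, template):
    """Slide the template over every position; return (x, y, ssd) of the best match."""
    th, tw = len(template), len(template[0])
    best = None
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            score = ssd(image, template, x, y)
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


def match_any(image, templates, max_ssd):
    """Try one template per appearance; return (name, x, y, ssd) of the best match, or None."""
    best = None
    for name, tmpl in templates.items():
        x, y, score = match_template(image, tmpl)
        if best is None or score < best[3]:
            best = (name, x, y, score)
    return best if best is not None and best[3] <= max_ssd else None


# Hypothetical 5x5 image containing a 2x2 pattern with its top-left pixel at (2, 1).
image = [[0] * 5 for _ in range(5)]
image[1][2], image[1][3], image[2][2], image[2][3] = 9, 1, 1, 9
templates = {"front": [[9, 1], [1, 9]], "side": [[1, 9], [9, 1]]}
print(match_template(image, templates["front"]))  # (2, 1, 0)
print(match_any(image, templates, max_ssd=10))     # ('front', 2, 1, 0)
```

With only the "side" template available, no window falls below the threshold and match_any returns None, which is why every appearance must be covered by its own template.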
Another human image recognition method recognizes a body region on the basis of the characteristics of a partial region of the human body, such as the chest (for example, refer to Patent Document 1).
A further human image recognition method does not recognize the region to be detected directly but estimates it on the basis of another region (for example, refer to Patent Document 2). The invention described in Patent Document 2 approximates a region corresponding to a palm, extracted by background subtraction, with an elliptic region; determines whether the palm is a left or a right one on the basis of the major and minor axes and the inclination of the elliptic region; and then estimates a region corresponding to the elbow connected to the palm.
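The extraction and ellipse-approximation steps can be sketched in Python as below, assuming a simple thresholded background difference for extraction and a moment-based fit (an ellipse with the same centroid and second-order central moments as the extracted region). Whether Patent Document 2 fits the ellipse this way is not stated here, and its left/right decision rule and elbow estimation are not reproduced; the function names and the threshold are hypothetical.

```python
import math


def background_subtraction(frame, background, threshold):
    """Foreground mask: True where |frame - background| exceeds threshold."""
    return [[abs(f - b) > threshold for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def fit_ellipse(mask):
    """Approximate the True pixels of mask by an ellipse with matching second-order moments.

    Returns (cx, cy, major, minor, angle_deg): centroid, full axis lengths, and the
    inclination of the major axis from the x axis (image y axis points down).
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty region")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix give the variances along the ellipse axes.
    common = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1 = (mu20 + mu02) / 2 + common
    lam2 = (mu20 + mu02) / 2 - common
    angle = 0.5 * math.degrees(math.atan2(2 * mu11, mu20 - mu02))
    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4 along that axis,
    # so the full axis length is 4 * sqrt(variance).
    major = 4 * math.sqrt(lam1)
    minor = 4 * math.sqrt(max(lam2, 0.0))
    return cx, cy, major, minor, angle


# Hypothetical example: a horizontal 2x8 blob appears against an empty background.
background = [[0] * 10 for _ in range(6)]
frame = [row[:] for row in background]
for y in (2, 3):
    for x in range(1, 9):
        frame[y][x] = 100
mask = background_subtraction(frame, background, threshold=50)
print(tuple(round(v, 3) for v in fit_ellipse(mask)))  # (4.5, 2.5, 9.165, 2.0, 0.0)
```

The returned axis lengths and inclination are the quantities on which the left/right determination described above is said to be based.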
The inventions described in Patent Documents 1 and 2 are suitable for extracting regions whose shape and arrangement are stationary, such as a region corresponding to the chest or regions corresponding to a palm and an elbow.
[Patent Document 1] Japanese Patent Application Laid-Open No. 2006-006359
[Patent Document 2] Japanese Patent Application Laid-Open No. 2006-011965
[Non-patent Document 1] P. Viola, M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2001.
[Non-patent Document 2] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision (IJCV), 60 (2), pp. 91-110, Jan. 5, 2004.
[Non-patent Document 3] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual Categorization with Bags of Keypoints”, Proc. European Conference on Computer Vision (ECCV), pp. 1-22, 2004.
[Non-patent Document 4] N. Dalal, B. Triggs, “Histograms of Oriented Gradients for Human Detection”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886-893, 2005.
[Non-patent Document 5] Mikio Takagi, Yousuke Shimoda, "New Edition Image Analysis Handbook", p. 1669, ISBN-10: 4130611194.