Field of the Invention
Aspects of the present invention generally relate to a technique for detecting the position of each part (component) of a target object in an image.
Description of the Related Art
Non-patent literature 1 (P. Felzenszwalb, D. McAllester, D. Ramanan, “A Discriminatively Trained, Multiscale, Deformable Part Model”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008) discusses the detection of an object and a deformable part model used for estimating the attitude of the object. In the technique discussed in non-patent literature 1, one object is represented by a combination of tree models, and each node of a tree corresponds to a part model, that is, a model of a partial area (region) of the object and of the attitude of that region (a partial attitude). During detection of the object, each part model is allowed to vary in position within a previously determined range, and each partial attitude is determined at that time, so that the attitude of the object is estimated simultaneously with its detection. Hereinafter, such a model is also referred to as a “deformable part model”.
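The idea that each part model may shift within a previously determined range during detection can be illustrated with a minimal sketch. The sketch assumes a simplified model with a single root and directly attached parts, precomputed per-part response maps, and a quadratic penalty on displacement from each part's anchor position. The function names, the `max_shift` and `weight` parameters, and the brute-force search are hypothetical choices for illustration and are not taken from non-patent literature 1.

```python
import numpy as np

def best_part_placement(response, anchor, max_shift, weight):
    """Find the best position of one part near its anchor.

    response  : 2-D array of appearance scores for this part (assumed given).
    anchor    : (row, col) position the part would take with no deformation.
    max_shift : assumed maximum displacement, in pixels, along each axis.
    weight    : assumed strength of the quadratic deformation penalty.
    Returns (score, (row, col)); the position is None if no candidate
    position lies inside the response map.
    """
    rows, cols = response.shape
    anchor_row, anchor_col = anchor
    best_score, best_pos = -np.inf, None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r, c = anchor_row + dr, anchor_col + dc
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # skip placements outside the image
            # Appearance score minus the cost of moving away from the anchor.
            score = response[r, c] - weight * (dr * dr + dc * dc)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos

def score_object(root_score, part_responses, anchors, max_shift=2, weight=0.1):
    """Sum the root score and the best deformed score of every part."""
    total = root_score
    placements = []
    for response, anchor in zip(part_responses, anchors):
        score, pos = best_part_placement(response, anchor, max_shift, weight)
        total += score
        placements.append(pos)
    return total, placements
```

Because the placement of every part is returned together with the total score, the partial positions become known at the same time as the detection score, which mirrors the simultaneous detection and attitude estimation described above.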
Japanese Patent Application Laid-Open No. 2009-151445 discusses a technique that uses a deformable part model. To reduce the influence of changes in an object on the feature amount of an image, that technique selects and uses “a partial region with no change” that exists in common among the learned images. The partial region is selected at the time of learning through the following steps: (1) generating a gradient image from each normalized learned image, (2) generating a gradient average image as the average of the gradient images, and (3) selecting a partial region whose center is the pixel having the maximum pixel value in the gradient average image.
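Steps (1) to (3) can be sketched as follows. This is only an illustrative reading of the steps. It assumes that the learned images are already normalized to the same size, that the gradient image is the gradient magnitude computed with `numpy.gradient`, and that the selected region is a square of hypothetical half-width `half_size`. The function names are likewise hypothetical. The gradient operator and region size actually used in the cited publication are not stated here.

```python
import numpy as np

def gradient_magnitude(image):
    """Step (1): gradient image of one normalized learned image."""
    grad_rows, grad_cols = np.gradient(image.astype(float))
    return np.hypot(grad_cols, grad_rows)

def select_partial_region(images, half_size=1):
    """Steps (2) and (3): average the gradient images and pick a region.

    images    : sequence of 2-D arrays of identical shape (assumed normalized).
    half_size : assumed half-width of the square region, in pixels.
    Returns the center (row, col) and the region as (top, left, bottom, right),
    with bottom and right exclusive and clipped to the image bounds.
    """
    gradients = [gradient_magnitude(img) for img in images]
    # Step (2): gradient average image.
    average = np.mean(gradients, axis=0)
    # Step (3): the pixel with the maximum value becomes the region center.
    row, col = np.unravel_index(np.argmax(average), average.shape)
    row, col = int(row), int(col)
    top = max(row - half_size, 0)
    left = max(col - half_size, 0)
    bottom = min(row + half_size + 1, average.shape[0])
    right = min(col + half_size + 1, average.shape[1])
    return (row, col), (top, left, bottom, right)
```

If several pixels share the maximum value, `numpy.argmax` returns the first one in row-major order, so the sketch selects a single region per call.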
The technique discussed in Japanese Patent Application Laid-Open No. 2009-151445 is effective for an object whose shape is fixed to some extent and whose partial regions do not change greatly in relative position or direction, such as a pedestrian captured by a car-mounted camera. However, that technique cannot sufficiently deal with an object whose overall shape varies greatly among objects belonging to the same category, or whose overall shape is similar but whose partial regions change in relative position or direction, such as a person performing various movements. This is because the relative position of a partial region, such as the extremities, readily changes with differences in partial attitude, so that partial regions usable for detection cannot be sufficiently selected from the learned images. Consequently, that technique is of limited practical effectiveness for such objects.