Field of the Invention
The present invention relates to an image processing apparatus and an image processing method that are particularly suitable for learning a dictionary of a detector for human figures or the like.
Description of the Related Art
Conventionally, a method has been proposed for detecting human figures in an image taken by a camera (see, for example, Navneet Dalal and Bill Triggs “Histograms of Oriented Gradients for Human Detection”, CVPR2005). According to the technique described in the document, a dictionary of a detector is learned in advance through machine learning of human images and background images. Subsequently, the dictionary is used to identify whether or not each local image of the image received from the camera shows a human figure, and the human figure is thereby detected. However, it is known that detection performance degrades if the photography scene and the personal appearance of a human figure at the time of detection differ from those at the time of preliminary learning. Specifically, the differences in the photography scene include a difference in lighting conditions, a difference in shooting direction due to differences in the installation location and angle of the camera, the presence or absence of shade, a difference in the background, and the like. The differences in personal appearance, on the other hand, include differences in the orientation and clothing of the human figure.
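The identification of local images with a pre-learned dictionary can be illustrated by the following minimal sketch. It is not the implementation of the cited document: the feature (raw pixel values), the linear scoring, and the names `extract_features`, `score_window`, `detect`, and `toy_dictionary` are all assumptions introduced here for illustration only.

```python
def extract_features(image, top, left, size):
    """Hypothetical feature: the raw pixel values of the local window, flattened."""
    return [image[y][x]
            for y in range(top, top + size)
            for x in range(left, left + size)]


def score_window(features, dictionary):
    """Linear score: dot product of features and learned weights, plus a bias."""
    weights, bias = dictionary
    return sum(f * w for f, w in zip(features, weights)) + bias


def detect(image, dictionary, size, stride=1, threshold=0.0):
    """Slide a size x size window over the image and return (top, left, score)
    for every local image whose score exceeds the threshold."""
    hits = []
    for top in range(0, len(image) - size + 1, stride):
        for left in range(0, len(image[0]) - size + 1, stride):
            features = extract_features(image, top, left, size)
            score = score_window(features, dictionary)
            if score > threshold:
                hits.append((top, left, score))
    return hits


# Toy 4x4 image with a bright 2x2 region whose top-left corner is at row 1, column 2.
image = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
# Assumed "dictionary": four weights and a bias, standing in for a learned detector.
toy_dictionary = ([1.0, 1.0, 1.0, 1.0], -3.5)
hits = detect(image, toy_dictionary, size=2)
```

In this toy setting only the window that fully covers the bright region exceeds the threshold. A practical detector would replace the raw pixels with a feature such as gradient orientation histograms and would scan the image at several scales.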
One factor that degrades detection performance is that the learning samples collected at the time of preliminary learning cannot cover the diversity of photography scenes and personal appearances of detection objects. To solve this problem, a technique has been proposed for improving detection performance by conducting additional learning on a preliminarily learned dictionary, using additional-learning samples collected in photography scenes similar to the photography scene at the time of detection. Japanese Patent Application Laid-Open No. 2010-529529 proposes a method for creating a dictionary for a Real AdaBoost classifier through preliminary learning and then further adapting the dictionary to additional-learning samples through additional learning.
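As a rough illustration of additional learning, the sketch below re-estimates one Real AdaBoost weak classifier. The per-bin output 0.5 ln(W+ / W-) is the standard Real AdaBoost form; the blending of preliminary and additional weighted histograms in `adapt`, the `mix` parameter, and the sample values are assumptions made here and do not reproduce the method of the cited publication.

```python
import math


def weighted_histograms(samples, bins):
    """Sum sample weights per feature bin, separately for positive (+1)
    and negative (-1) samples. Each sample is (bin_index, label, weight)."""
    w_pos, w_neg = [0.0] * bins, [0.0] * bins
    for bin_index, label, weight in samples:
        if label > 0:
            w_pos[bin_index] += weight
        else:
            w_neg[bin_index] += weight
    return w_pos, w_neg


def lookup_table(w_pos, w_neg, eps=1e-6):
    """Real AdaBoost weak-classifier output per bin: 0.5 * ln(W+ / W-),
    with eps added to avoid division by zero."""
    return [0.5 * math.log((p + eps) / (n + eps)) for p, n in zip(w_pos, w_neg)]


def adapt(prior, new, mix):
    """Hypothetical adaptation step: blend preliminary and additional
    histograms, where mix in [0, 1] is the weight given to the new samples."""
    return [(1.0 - mix) * a + mix * b for a, b in zip(prior, new)]


# Preliminary learning: bin 1 is mostly background.
preliminary = [(0, +1, 0.4), (0, -1, 0.1), (1, +1, 0.1), (1, -1, 0.4)]
# Additional learning in the target scene: bin 1 is mostly human figures.
additional = [(0, +1, 0.1), (0, -1, 0.1), (1, +1, 0.7), (1, -1, 0.1)]

pre_pos, pre_neg = weighted_histograms(preliminary, bins=2)
add_pos, add_neg = weighted_histograms(additional, bins=2)

lut_pre = lookup_table(pre_pos, pre_neg)
lut_adapted = lookup_table(adapt(pre_pos, add_pos, 0.5), adapt(pre_neg, add_neg, 0.5))
```

With these toy values the output for bin 1 changes sign after adaptation, while the feature itself (the binning) stays fixed. Only the outputs attached to an already selected feature are adjusted in this sketch.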
However, with the method described in Japanese Patent Application Laid-Open No. 2010-529529, when there are great differences between preliminary learning and additional learning in the installation angle of the camera, in attributes such as color, sex, and age of the human figures in the image, in the background, and the like, the feature quantity needed for identification also differs greatly, and thus improvement of identification accuracy is limited. Consider, for example, a case in which directions and intensities of edges are used as a feature quantity for identification. If the installation angle of the camera with respect to a human figure differs between preliminary learning and additional learning, the positions, angles, and intensities of the edges appearing in the image of the human figure vary. In such a case, the feature quantity of the detection object learned in preliminary learning is difficult to use in additional learning, and thus performance improvement is limited. Similarly, when the background texture differs greatly between preliminary learning and additional learning, the feature quantity needed for identification differs greatly, and thus performance improvement is limited.
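The dependence of an edge-based feature quantity on viewing geometry can be seen in the following minimal sketch. It computes a magnitude-weighted histogram of unsigned gradient orientations for two synthetic images containing the same edge in different orientations. The function name `orientation_histogram`, the four-bin quantization, and the synthetic images are assumptions chosen for illustration.

```python
import math


def orientation_histogram(img, bins=4):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees), using central differences on interior pixels."""
    height, width = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no edge, no vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist


# Vertical edge: left half dark, right half bright.
vertical_edge = [[0 if x < 4 else 1 for x in range(8)] for y in range(8)]
# The same edge rotated by 90 degrees: top half dark, bottom half bright.
horizontal_edge = [[0 if y < 4 else 1 for x in range(8)] for y in range(8)]

hist_v = orientation_histogram(vertical_edge)
hist_h = orientation_histogram(horizontal_edge)
```

The two histograms carry the same total edge strength but concentrate it in different orientation bins. A dictionary that relies on the bin dominant in one viewing condition therefore finds little support in the other.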