A method of detecting a human body from an image shot by a camera has conventionally been proposed (non-patent literature 1 (Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”, CVPR2005)). In this method, human body images and background images are learned in advance by machine learning. After that, each partial image of an image input from a camera is identified as being a human body or not, thereby detecting the human body. However, it is known that the detection performance degrades when the shooting scene or the appearance of a human body differs between the time of pre-learning and the time of detection. Examples of the difference in shooting scene are differences in illumination condition, camera installation angle, presence/absence of shade, and background. Examples of the difference in appearance are differences in the orientation and clothes of a human body.
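The identification of partial images described above can be sketched as a sliding-window scan. The following is a minimal illustration only, not the method of non-patent literature 1: the names `sliding_windows`, `crop`, and `detect` are hypothetical, and `mean_brightness` is a toy stand-in for a discriminator obtained by pre-learning (an actual system would compute features such as gradient-orientation histograms and apply a learned discriminator).

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]     # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Box]:
    """Yield every window position that fits entirely inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the partial image covered by the box out of the input image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def detect(image: Image,
           score_fn: Callable[[List[List[float]]], float],
           win_w: int, win_h: int, stride: int,
           threshold: float) -> List[Tuple[Box, float]]:
    """Score each partial image and keep those whose score exceeds the threshold."""
    img_h, img_w = len(image), len(image[0])
    detections = []
    for box in sliding_windows(img_w, img_h, win_w, win_h, stride):
        score = score_fn(crop(image, box))
        if score > threshold:
            detections.append((box, score))
    return detections


def mean_brightness(patch: List[List[float]]) -> float:
    """Toy scorer (assumption): stands in for a pre-learned discriminator."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)


# Toy 4x4 image whose lower-right 2x2 block plays the role of the target.
toy_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
hits = detect(toy_image, mean_brightness, win_w=2, win_h=2, stride=2, threshold=0.5)
```

Because the scorer is fixed at the time of pre-learning, any change in scene or appearance that shifts the scores of true targets below the threshold leads directly to missed detections.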
One cause of the degraded detection performance is, for example, that the learning samples used at the time of pre-learning cannot cover the diversity of shooting scenes and of appearances of a detection target object. To solve this, a method has been proposed that improves the detection performance by performing additional learning using learning samples collected in a shooting scene similar to that at the time of detection. Patent literature 1 (Japanese Patent Laid-Open No. 2010-529529) proposes a method of creating the weak discriminators of a Real AdaBoost discriminator by pre-learning, and then adapting the weak discriminators to additional learning samples by additional learning.
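As a hedged illustration of this kind of adaptation, the sketch below models one Real AdaBoost weak discriminator as a lookup table over a binned scalar feature, whose output for a bin is 0.5 ln(W+/W-), the standard Real AdaBoost form. The update rule shown (accumulating the weights of additional samples into the same bins) is an assumption made for illustration and is not taken from patent literature 1; the class name `BinnedWeakDiscriminator` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


class BinnedWeakDiscriminator:
    """Real AdaBoost style weak discriminator over one binned scalar feature."""

    def __init__(self, n_bins: int, lo: float, hi: float, eps: float = 1e-3):
        self.n_bins, self.lo, self.hi, self.eps = n_bins, lo, hi, eps
        self.w_pos: List[float] = [0.0] * n_bins   # weight of positives per bin
        self.w_neg: List[float] = [0.0] * n_bins   # weight of negatives per bin

    def _bin(self, value: float) -> int:
        t = (value - self.lo) / (self.hi - self.lo)
        return min(self.n_bins - 1, max(0, int(t * self.n_bins)))

    def accumulate(self, values: Sequence[float], labels: Sequence[int],
                   weights: Sequence[float]) -> None:
        """Add weighted samples to the per-bin histograms (labels are +1 or -1)."""
        for v, y, w in zip(values, labels, weights):
            if y > 0:
                self.w_pos[self._bin(v)] += w
            else:
                self.w_neg[self._bin(v)] += w

    def score(self, value: float) -> float:
        """Confidence-rated output: positive means 'target', negative means 'background'."""
        b = self._bin(value)
        return 0.5 * math.log((self.w_pos[b] + self.eps) / (self.w_neg[b] + self.eps))


# Pre-learning: low feature values are background, high values are targets.
weak = BinnedWeakDiscriminator(n_bins=2, lo=0.0, hi=1.0)
weak.accumulate([0.1, 0.2, 0.8, 0.9], [-1, -1, +1, +1], [0.25] * 4)
score_before = weak.score(0.1)

# Additional learning (assumed update rule): in the new scene, targets also
# appear with low feature values, so their weights are added to the same bins.
weak.accumulate([0.1, 0.2, 0.3], [+1, +1, +1], [1.0] * 3)
score_after = weak.score(0.1)
```

In this sketch only the per-bin weights change during additional learning; the feature itself and its binning remain those fixed at the time of pre-learning.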
It is also known that the detection performance is improved by using, for identification, a scene-specific context obtained in the scene at the time of detection. An example of the context is the appearance position coordinates of a detection target object in an image. For a security camera whose installation position is fixed, the appearance position or size of a human body to be detected in an image has a distribution specific to the installation scene. In patent literature 2 (Japanese Patent No. 5096211), therefore, the probability distribution of the appearance position coordinates of a human body is created and used as a pre-filter of a discriminator or for result correction. Another example of the context is a background image. Depending on the camera installation location, a detection target appears more frequently at positions having a specific background texture. Hence, in patent literature 3 (US20120219211 A1), not only an identification target region but also a partial image around the identification target region is used for learning.
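The use of an appearance-position distribution as a context can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of patent literature 2: the image is divided into a coarse grid, past appearance positions are counted per cell with additive smoothing, and the resulting probability is used either to reject candidates before the discriminator or to scale a discriminator score assumed to lie in [0, 1]. The class `PositionPrior` and its method names are hypothetical.

```python
from typing import List, Tuple


class PositionPrior:
    """Grid histogram of where detection targets have appeared in the image."""

    def __init__(self, img_w: int, img_h: int, cols: int, rows: int,
                 smoothing: float = 1.0):
        self.img_w, self.img_h, self.cols, self.rows = img_w, img_h, cols, rows
        # Additive smoothing so that unseen cells keep a non-zero probability.
        self.counts: List[List[float]] = [[smoothing] * cols for _ in range(rows)]
        self.total = smoothing * cols * rows

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        c = min(self.cols - 1, max(0, int(x * self.cols / self.img_w)))
        r = min(self.rows - 1, max(0, int(y * self.rows / self.img_h)))
        return r, c

    def observe(self, x: float, y: float) -> None:
        """Record one appearance of a target at image position (x, y)."""
        r, c = self._cell(x, y)
        self.counts[r][c] += 1.0
        self.total += 1.0

    def probability(self, x: float, y: float) -> float:
        r, c = self._cell(x, y)
        return self.counts[r][c] / self.total

    def passes_prefilter(self, x: float, y: float, min_prob: float) -> bool:
        """Pre-filter use: skip the discriminator where targets rarely appear."""
        return self.probability(x, y) >= min_prob

    def corrected_score(self, score: float, x: float, y: float) -> float:
        """Result-correction use: scale a score in [0, 1] by the position prior."""
        return score * self.probability(x, y)


# A fixed camera that has seen six targets near the lower-left of a 100x100 image.
prior = PositionPrior(img_w=100, img_h=100, cols=2, rows=2)
for _ in range(6):
    prior.observe(10, 80)

likely = prior.probability(10, 80)      # cell with the past appearances
unlikely = prior.probability(90, 10)    # cell with no past appearances
```

Such a prior is informative only while the mapping between image coordinates and the scene stays the same, which is why it presupposes a fixed camera.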
However, in patent literature 1, only the parameters of the Real AdaBoost discriminator are adapted to the additional learning samples. Since the features used for additional learning, and for detection after additional learning, are limited to those generated at the time of pre-learning, the achievable improvement of the performance is limited.
Patent literature 2 assumes a permanently installed camera, and only the probability distribution of the appearance position coordinates of an object is used as a context. Therefore, improvement of the performance cannot be expected in a situation in which the camera is not permanently installed or a situation in which the appearance probability of an object does not depend on the position coordinates.
In patent literature 3, only a partial image around the identification target region can be used as a context. Therefore, improvement of the performance cannot be expected in a situation in which the background image changes over time or a situation in which the appearance probability of an object does not depend on the background.