A technique for detecting an object in a video (or a still image) is called object detection. Object detection is used, for example, to find a desired object that is a monitoring target in a moving image captured by a camera, or to focus on a particular object in order to increase image quality.
As one form of object detection, a technique called a sliding window approach is used. The concept of the process in the sliding window approach is illustrated in FIG. 15. As illustrated in FIG. 15, the sliding window approach sets a rectangular area (window) on a detection-target image. While changing the position and the size of the window, the approach uses an evaluation function to evaluate whether or not a detection-target object exists in each rectangular area. However, this approach has a problem in that it is difficult to accurately determine the position of the detection target when the detection-target object is partially hidden by another object, or when the window size differs greatly from the size of the detection target.
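For illustration only, the window scan described above can be sketched as follows. The function names, the stride parameter, and the use of mean intensity as the evaluation function are assumptions made for this sketch and are not taken from any cited document.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)
Image = Sequence[Sequence[float]]   # rows of pixel intensities


def sliding_windows(img_w: int, img_h: int,
                    sizes: Sequence[Tuple[int, int]],
                    stride: int) -> Iterator[Window]:
    """Yield every window position for every window size (hypothetical helper)."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def detect(image: Image,
           sizes: Sequence[Tuple[int, int]],
           stride: int,
           evaluate: Callable[[Image, Window], float],
           threshold: float) -> List[Tuple[Window, float]]:
    """Return the windows whose evaluation score reaches the threshold."""
    img_h, img_w = len(image), len(image[0])
    hits = []
    for win in sliding_windows(img_w, img_h, sizes, stride):
        score = evaluate(image, win)
        if score >= threshold:
            hits.append((win, score))
    return hits


def mean_intensity(image: Image, win: Window) -> float:
    """Toy evaluation function (an assumption): mean pixel value in the window."""
    x, y, w, h = win
    total = sum(image[r][c] for r in range(y, y + h) for c in range(x, x + w))
    return total / (w * h)
```

In this sketch, a window that only partly covers the target, or whose size differs from that of the target, receives a lower score, which corresponds to the difficulty noted above.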
Meanwhile, NPL 1 proposes a technique for detecting a person by use of an ensemble learning approach composed of a plurality of decision trees. In this technique, the decision trees are constructed from a group of images each including a local area of an object, which is called a part, and each input image is evaluated, on the basis of scores, as to which part it is classified into. Because each part represents a local area, it has been reported that, even when an object is partially hidden, a part can in some cases be detected with high probability on the basis of the areas that are not hidden. In NPL 1, a person is detected by use of an average value of scores calculated from results of recognition of a plurality of parts (FIG. 16).
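As a minimal sketch of the final averaging step only, and not of the decision-tree ensemble of NPL 1 itself, the person score may be computed as follows. The part names, the score values, and the threshold are hypothetical.

```python
from statistics import fmean
from typing import Mapping


def person_score(part_scores: Mapping[str, float]) -> float:
    """Average the recognition scores of the individual parts."""
    if not part_scores:
        raise ValueError("at least one part score is required")
    return fmean(part_scores.values())


def is_person(part_scores: Mapping[str, float], threshold: float) -> bool:
    """Decide that a person is present when the average reaches the threshold."""
    return person_score(part_scores) >= threshold
```

For example, with hypothetical scores of 0.9 for a head, 0.8 for a torso, and 0.1 for hidden legs, the average is approximately 0.6, so the person is accepted at a threshold of 0.5 but rejected at 0.7.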
NPL 2 proposes a constellation model, in which the relationship between parts is modeled as a constellation. The constellation model represents a probability distribution that indicates, for each part, with what parameters, such as appearance, relative position, rotation angle, and size, the part exists in a two-dimensional image. In NPL 2, a model based on average values and variances of the positions and sizes of parts is generated. For each of all combinations of part candidates, the likelihood that the combination matches any of the constellation models is calculated. The likelihood that the combination is a background is also calculated. Then, in this method, whether or not the target is an object is determined on the basis of whether or not the ratio between the two likelihoods is higher than or equal to a threshold value.
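The likelihood-ratio determination described above can be sketched as follows. This is a simplified illustration under stated assumptions and is not the model of NPL 2: each feature of each part is assumed to follow an independent Gaussian distribution given by a mean and a variance, the background likelihood is assumed to be a constant, and all names are hypothetical.

```python
import math
from itertools import product


def gaussian_pdf(value, mean, variance):
    """Density of a one-dimensional Gaussian distribution."""
    return (math.exp(-((value - mean) ** 2) / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))


def object_likelihood(combination, model):
    """Likelihood that one combination of part candidates matches the model.

    combination: one feature tuple per part, e.g. (x, y).
    model: for each part, one (mean, variance) pair per feature (assumed form).
    """
    likelihood = 1.0
    for part_features, part_params in zip(combination, model):
        for value, (mean, variance) in zip(part_features, part_params):
            likelihood *= gaussian_pdf(value, mean, variance)
    return likelihood


def detect_combinations(candidates_per_part, model,
                        background_likelihood, threshold):
    """Return the combinations whose object/background ratio reaches the threshold."""
    accepted = []
    # Enumerate all combinations that take one candidate for each part.
    for combination in product(*candidates_per_part):
        ratio = object_likelihood(combination, model) / background_likelihood
        if ratio >= threshold:
            accepted.append((combination, ratio))
    return accepted
```

Because every combination of part candidates is enumerated, the number of likelihood evaluations in this sketch is the product of the numbers of candidates for the respective parts.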