In recent surveillance camera systems, person detection technology is employed to automatically detect an intruder into a shop or the like and to automatically count the number of visitors or the like. As person detection technology of the related art, for example, the technologies disclosed in Patent Document 1 and Non-Patent Documents 1 and 2 are known.
In the related art, Patent Document 1 discloses a method that performs a matching process between an input image captured by a surveillance camera or the like and a background image database, and estimates the number of people according to the position and number of pixels detected by difference detection. Non-Patent Document 1 discloses a method that extracts Histogram of Oriented Gradients (HOG) feature values in advance from a large number of detection target samples (images containing the whole body of a person) and non-detection target samples (images not containing the whole body of a person), and models the boundary between the sample groups in the feature space using a Support Vector Machine (SVM), thereby realizing person detection in static images. Non-Patent Document 2 discloses a method that configures portion detectors, each of which detects a portion of a person, using Edgelet feature values and a boosting algorithm, and combines the output results of the portion detectors into one by performing maximum a posteriori probability estimation.
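As an illustration of the HOG feature value mentioned for Non-Patent Document 1, the following minimal sketch computes a gradient orientation histogram for a single image cell. It is a simplified assumption, not the cited implementation: the function name `cell_histogram`, the 9-bin unsigned (0 to 180 degree) quantization, and the nearest-bin voting are illustrative choices, and block normalization and the SVM training are omitted.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a grayscale patch given as rows of pixel values. Only interior
    pixels are used so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the direction into [0, 180): opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(bins - 1, int(angle // bin_width))
            hist[index] += magnitude
    return hist
```

For a patch containing a vertical step edge, all gradient energy falls into the first bin (orientations near 0 degrees); a full HOG descriptor would concatenate such histograms over many cells before classification.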
Person detection in a static image, as performed in Non-Patent Documents 1 and 2, generally proceeds as follows. First, since the size and the position of a person within the input image captured by a surveillance camera or the like are not fixed, the input image is converted into a pyramid image by repeatedly applying a scaling process, such as resizing, to the input image. Subsequently, image windows of a predetermined size are extracted at a predetermined interval from each of the scaled images that constitute the pyramid image, and the person detection determination is performed for each image window by comparing a score, which is computed from the feature values within that window, against a predetermined threshold value.
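The pyramid and window-scanning procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited documents: the names `resize_nearest`, `build_pyramid`, `detect`, and `mean_brightness` are hypothetical, nearest-neighbour resizing and the scale step of 0.8 are arbitrary choices, and `mean_brightness` is only a stand-in for a real feature-based score.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height) in input-image coordinates


def resize_nearest(img: Image, scale: float) -> Image:
    """Rescale an image by nearest-neighbour sampling (illustrative only)."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)] for r in range(new_h)]


def build_pyramid(img: Image, win_w: int, win_h: int,
                  step: float = 0.8) -> List[Tuple[float, Image]]:
    """Collect scaled images until the window no longer fits."""
    pyramid = []
    scale = 1.0
    while True:
        scaled = resize_nearest(img, scale)
        if len(scaled) < win_h or len(scaled[0]) < win_w:
            break
        pyramid.append((scale, scaled))
        scale *= step
    return pyramid


def detect(img: Image, score_fn: Callable[[Image], float], win_w: int,
           win_h: int, stride: int, threshold: float) -> List[Tuple[Box, float]]:
    """Scan every pyramid level and keep windows whose score passes the threshold."""
    candidates = []
    for scale, scaled in build_pyramid(img, win_w, win_h):
        for y in range(0, len(scaled) - win_h + 1, stride):
            for x in range(0, len(scaled[0]) - win_w + 1, stride):
                window = [row[x:x + win_w] for row in scaled[y:y + win_h]]
                score = score_fn(window)
                if score >= threshold:
                    # Map the window back to input-image coordinates.
                    box = (int(x / scale), int(y / scale),
                           int(win_w / scale), int(win_h / scale))
                    candidates.append((box, score))
    return candidates


def mean_brightness(window: Image) -> float:
    """Placeholder score; a real detector would score HOG or Edgelet features."""
    return sum(map(sum, window)) / (len(window) * len(window[0]))
```

Because a fixed-size window is applied to progressively smaller images, windows found at smaller scales correspond to larger regions of the input image, which is how a single detector size can cover people of different apparent sizes.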
Since the person detector is configured to be robust to the various shape variations of a non-rigid body such as a person, when the person detection results are displayed within the input image as person detection candidate rectangular frames, a plurality of person detection candidate rectangular frames 110a are output in the periphery of a single person, as in FIG. 11 (hereinafter referred to as the person detection result positional shift issue). Finally, a combination process is performed on the person detection candidate rectangular frames 110a, in which whether the frames belong to the same group is determined on the basis of the property information of each rectangle (the center location, the size, the score and the like), and a final combined result 111 is displayed as the person detection result (refer to FIG. 5.1 of Non-Patent Document 1). In a case in which people counting or the like is performed, the number of final combined results 111 is counted as the number of people.
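The combination process and the subsequent counting can be illustrated with the following minimal sketch. It uses a simple greedy grouping by center distance and size similarity, which is an assumption made for illustration and not the combination method of Non-Patent Document 1; the names `same_group` and `combine` and the tolerance values are hypothetical, and scores are assumed to be positive so that they can serve as averaging weights.

```python
from typing import List, Tuple

# (center_x, center_y, width, height, score); the score is assumed positive
Candidate = Tuple[float, float, float, float, float]


def same_group(seed: Candidate, other: Candidate,
               center_tol: float = 0.5, size_tol: float = 0.5) -> bool:
    """Decide whether two rectangles belong together.

    Centers must lie within a fraction of the seed's size, and the widths and
    heights must differ by no more than a fraction of the seed's size.
    """
    close = (abs(seed[0] - other[0]) <= center_tol * seed[2]
             and abs(seed[1] - other[1]) <= center_tol * seed[3])
    similar = (abs(seed[2] - other[2]) <= size_tol * seed[2]
               and abs(seed[3] - other[3]) <= size_tol * seed[3])
    return close and similar


def combine(candidates: List[Candidate]) -> List[Candidate]:
    """Greedily group candidates and merge each group into one rectangle."""
    groups: List[List[Candidate]] = []
    # Visit high-scoring candidates first so that they seed the groups.
    for cand in sorted(candidates, key=lambda c: c[4], reverse=True):
        for group in groups:
            if same_group(group[0], cand):
                group.append(cand)
                break
        else:
            groups.append([cand])

    results = []
    for group in groups:
        total = sum(c[4] for c in group)
        # Score-weighted average of center and size; keep the best score.
        merged = tuple(sum(c[i] * c[4] for c in group) / total for i in range(4))
        results.append(merged + (max(c[4] for c in group),))
    return results


if __name__ == "__main__":
    frames = [(50, 100, 40, 80, 0.9), (52, 102, 42, 84, 0.7),
              (200, 110, 40, 80, 0.8)]
    print("number of people:", len(combine(frames)))
```

In this sketch the people count is simply the number of merged rectangles, mirroring the counting of the final combined results described above.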