Field of the Invention
The present invention relates to an image processing method and apparatus for detecting a predetermined object in a moving image.
Description of the Related Art
In recent years, a system has been proposed in which video shot by a monitoring camera is analyzed to detect whether or not a person intrudes into a monitoring area, and a detection result is reported. In addition to the detection of intrusion, a system has also been proposed in which people who have passed through the monitoring area during a predetermined period are counted by tracking the people displayed on a screen, or in which a degree of congestion is detected from the number of people counted.
To realize the above-described applications, a person needs to be automatically detected from the monitoring camera video and then tracked. Methods of detecting a person from an image include, for example, the method proposed by Dalal and Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition. According to the proposed method, a histogram of the gradient directions of pixel values is extracted from the image, and the histogram is used as a feature amount (HOG feature amount) to determine whether or not a partial region in the image is a person. That is, a human outline is represented by the feature amount corresponding to the gradient directions of the pixel values, and this representation is used for the recognition.
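As a rough illustration of the feature amount described above, the following sketch computes a histogram of gradient directions for a single small image region. It is a minimal sketch only: the function name `orientation_histogram`, the choice of 9 bins over 0 to 180 degrees, and the central-difference gradient are assumptions made for illustration, and the block normalization and the discriminator of the cited method are omitted.

```python
import math

def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient directions
    for one cell given as a list of rows of pixel values.
    (Hypothetical helper; bin count and gradient filter are assumptions.)"""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbors.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist

# A 4x4 cell containing a vertical edge: the gradient is purely horizontal.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
hist = orientation_histogram(vertical_edge)
```

For the vertical edge above, all of the gradient magnitude falls into the first bin, which is how an outline oriented in one direction becomes distinguishable from outlines oriented in other directions.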
However, to apply the above-described human detection technology to actual monitoring camera video, the detection rate needs to be improved. In general, with an object detection technology such as the human detection technology, an attempt to increase the detection rate also causes more misdetections (an object that is not a detection target is erroneously reported as detected), which is undesirable for practical use. Therefore, a method of improving the detection rate while suppressing misdetections is in demand.
As a solution to the above-described problems, a method of tracking a once-detected person in subsequent time-series images and continuously detecting a person in parallel to improve an apparent detection rate is conceivable. This method will be described with reference to FIG. 2. In FIG. 2, a rectangle indicated by a solid line on the image represents a human detection result, and a rectangle indicated by a broken line represents a tracking result. As illustrated in FIG. 2, a result detected at a time t is tracked in images at a time t+1 and a time t+2. Similarly, a result detected at the time t+1 is tracked in the subsequent image at the time t+2, and furthermore, a person is detected at the time t+2. In this example, a detector detects only two people among five people from each image, but all people are eventually detected and tracked by integrating the detection results with the tracking results.
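The integration of detection results with tracking results described with reference to FIG. 2 can be sketched as follows. This is a minimal sketch under stated assumptions: boxes are represented as `(x1, y1, x2, y2)` tuples, and a detection is regarded as the same person as an existing tracking result when their intersection over union reaches a hypothetical `overlap` threshold of 0.5. Neither the representation nor the matching rule is taken from the described method.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def integrate(tracked, detected, overlap=0.5):
    """Merge tracking results with new detection results.
    A detection that overlaps an already-held box is treated as the
    same person and is not added again (assumed matching rule)."""
    merged = list(tracked)
    for det in detected:
        if all(iou(det, held) < overlap for held in merged):
            merged.append(det)
    return merged

# Two people carried over by tracking, plus two detections in the current image:
# one re-detects a tracked person, the other is a newly detected person.
tracked = [(0, 0, 10, 10), (20, 0, 30, 10)]
detected = [(1, 0, 11, 10), (40, 0, 50, 10)]
people = integrate(tracked, detected)
```

Here `people` holds three boxes: the two tracking results and the one newly detected person. Repeating this at each time step is what lets a detector that finds only some of the people per image eventually cover all of them.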
With regard to the above-described parallel use of the detection and tracking processing, Japanese Patent Laid-Open No. 2014-48702 discloses a method in which a window where an object was detected in a previous frame is set as a tracking target, and an object is also searched for in some of the regions obtained by dividing a current frame, so that a window where an object is newly detected by the search is added to the tracking targets.
The tracking target is processed by using an object discriminator. When the discriminator is used for the tracking processing, the once-detected target can be reliably tracked without losing sight of the target.
However, even when human tracking is performed by using the method disclosed in Japanese Patent Laid-Open No. 2014-48702, a further increase in the detection rate is demanded. For example, it is conceivable to change a threshold of the discriminator in the human detector to improve the detection rate. In this case, however, a problem occurs in that the number of misdetections increases and an erroneously detected target continues to be tracked.
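The trade-off caused by changing the discriminator threshold can be illustrated with a small sketch. The scores and labels below are hypothetical values chosen only for illustration, not measurements, and the function name `evaluate` is likewise an assumption.

```python
def evaluate(scores, labels, threshold):
    """Return (detection rate, number of misdetections) when every
    candidate whose score reaches the threshold is reported as a person."""
    detected = sum(1 for s, is_person in zip(scores, labels)
                   if is_person and s >= threshold)
    misdetected = sum(1 for s, is_person in zip(scores, labels)
                      if not is_person and s >= threshold)
    return detected / sum(labels), misdetected

# Hypothetical discriminator scores; True marks an actual person.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
labels = [True, True, True, False, False, False]

strict = evaluate(scores, labels, threshold=0.8)
loose = evaluate(scores, labels, threshold=0.35)
```

With the strict threshold only one of the three people is reported and there are no misdetections, whereas the loose threshold reports all three people but also one non-person. Once such a non-person is handed to the tracking processing, it continues to be tracked in subsequent images.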