1. Field of the Invention
The present invention generally relates to image processing, and more particularly to a method and an apparatus for detecting persons.
2. Description of the Related Art
Although person detection has long been researched in the field of machine vision and great progress has been made, the technology still cannot satisfy practical requirements. One important reason is that the performance of many conventional detection methods degrades significantly because of occlusion in a crowded environment. According to statistics, the person detection rate may reach 95% in a normal uncrowded scene, but is often less than 70% in a crowded scene.
To address the poor detection performance in a crowded environment, some research has been performed and some solutions have been proposed. In one main solution, detection is performed using a Deformable Parts Model, the core concept of which is to perform training and detection separately for the parts of a human body. As an example of this solution, the PAMI 2012 article by A. Mohan titled “Example-based object detection in images by components” provides a two-stage method. This method trains separate detectors for the head, the arms and the legs, and uses these detectors to determine whether a roughly detected result matches the corresponding pattern. The method is relatively robust to occlusion; however, its detection performance may decrease when the detection scene is not represented in the training set.
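For illustration only, the second stage of such a part-based method might be sketched as follows. This is a minimal sketch and not the classifier of the cited article: the function name `verify_candidates`, the box layout, the score threshold and the rule that a minimum number of parts must match are all assumptions introduced here, and the toy scoring functions merely stand in for trained head, arm and leg detectors.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]        # (x, y, width, height); hypothetical layout
PartScorer = Callable[[Box], float]    # returns a match score in [0, 1]


def verify_candidates(
    candidates: List[Box],
    part_scorers: Dict[str, PartScorer],
    score_threshold: float = 0.5,      # assumed threshold, not from the source
    min_parts: int = 2,                # assumed rule: at least this many parts must match
) -> List[Box]:
    """Keep the roughly detected candidates that enough part detectors confirm."""
    accepted = []
    for box in candidates:
        # Count the part detectors whose score reaches the threshold.
        matched = sum(
            1 for scorer in part_scorers.values() if scorer(box) >= score_threshold
        )
        if matched >= min_parts:
            accepted.append(box)
    return accepted


# Toy scorers standing in for trained head/arm/leg detectors.
toy_scorers = {
    "head": lambda box: 0.9,
    "arms": lambda box: 0.8 if box[2] >= 40 else 0.1,
    "legs": lambda box: 0.7 if box[3] >= 100 else 0.2,
}
candidates = [(10, 20, 50, 120), (200, 30, 20, 60)]
accepted = verify_candidates(candidates, toy_scorers)
```

In this sketch, accepting a candidate when only some of the parts match is one simple way to tolerate a partially occluded body; the quality of the result still depends entirely on how well the part detectors were trained for the scene.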
Another solution is a regression-based density estimation method. As an example of this solution, the article published in 2008 by Wenhua Ma titled “Advanced Local Binary Pattern Descriptors for Crowd Estimation” estimates the number of persons by learning a mapping between low-level image features and crowd density. However, this method can estimate only the crowd density, and cannot obtain position information of the persons.
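The limitation of the regression approach can be seen in a minimal sketch. The code below is not the method of the cited article: it assumes a single scalar feature per image (for example, a count of foreground pixels) and an ordinary least-squares line, and the names `fit_line` and `estimate_count` and the training values are hypothetical.

```python
from typing import List, Tuple


def fit_line(features: List[float], counts: List[float]) -> Tuple[float, float]:
    """Fit count = slope * feature + intercept by ordinary least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(counts) / n
    var_x = sum((x - mean_x) ** 2 for x in features)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, counts))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_count(feature: float, slope: float, intercept: float) -> float:
    """Map one low-level feature value to an estimated number of persons."""
    return max(0.0, slope * feature + intercept)


# Hypothetical training data: feature value per image and labeled person count.
train_features = [100.0, 200.0, 300.0]
train_counts = [2.0, 4.0, 6.0]
slope, intercept = fit_line(train_features, train_counts)

# The output is a single number for the whole image; no positions are produced.
estimated = estimate_count(250.0, slope, intercept)
```

The regressor returns only a scalar estimate for the image, which is why a method of this kind can report how crowded a scene is but not where each person is located.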