With rapid economic development, the role of cities is growing and the urban population is increasing. To effectively protect the safety of the urban population, it is necessary to count the number of people in places with heavy pedestrian traffic, such as shopping malls, railway stations and traffic intersections, so that the relevant regions can set effective safety plans to cope with emergencies and can establish standards for limiting pedestrian traffic in these places.
The number of people is counted by identifying either the whole person or a certain part of the human body. To avoid missed detections and false detections resulting from occlusion, the camera generally shoots pictures vertically. The best part of the human body to identify is the head region, so a conventional method counts the number of people by identifying human head regions. Existing human head identification technology is mainly based on an RGB camera: the shape, texture and color characteristics of the human head are extracted, and the relevant images are then matched to identify the head. For example, a large number of human head samples are collected, a machine learning or neural network method is trained on these samples, and the trained classifier is used to detect heads. With the popularity of depth cameras in recent years, the use of depth cameras instead of RGB cameras to identify human heads has gradually become a research hotspot. The relative distance information carried in a depth image is more conducive to identifying the human head than the color and texture information carried in an RGB image. However, due to occlusion between pedestrians and other interference factors such as lighting, the accuracy of head detection remains low, so problems such as false detection and missed detection of pedestrians still occur frequently.