Human detection in video streams or feeds may require knowing or determining a scene or setting's geometry in order to estimate human height (the height of a person depicted in a stream) at any position in a video frame or video image. For fixed or static video cameras, the scene geometry may be configured manually. For other video cameras, such as pan-tilt-zoom (PTZ) cameras or cameras that move around a room, the scene geometry may be unknown when the cameras move to a new arbitrary location within a scene or perform pan, tilt, or zoom operations. For these cameras, real-time or automatic human detection may be difficult, and manual recalibration after every camera movement may be an unrealistic solution.
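As one illustration of what manual configuration of scene geometry for a fixed camera can involve, the sketch below fits a simple linear model of expected person height in pixels as a function of the image row of the person's feet. The linear form is a common approximation for a camera viewing a flat ground plane; the annotated foot rows and pixel heights are hypothetical values, not taken from the source.

```python
import numpy as np

# Hypothetical manual calibration for a fixed camera: an operator
# annotates the pixel height of a reference person at two foot
# positions (image rows), and a linear model h(y) = a*y + b is fit.
foot_rows = np.array([400.0, 700.0])      # image row of the feet (px)
pixel_heights = np.array([80.0, 170.0])   # observed person height (px)

a, b = np.polyfit(foot_rows, pixel_heights, 1)

def expected_height(foot_row):
    """Expected pixel height of a person whose feet sit at foot_row."""
    return a * foot_row + b

print(round(expected_height(550.0), 1))  # → 125.0
```

Such a model is only valid until the camera moves: any pan, tilt, or zoom invalidates the fitted coefficients, which is why per-movement manual recalibration is impractical for PTZ cameras.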
Some automatic calibration methods for surveillance cameras may not be useful for crowded scenes or settings with a large number of people in the video images. Existing solutions may rely on a background model (background subtraction), which is ineffective for crowded scenes: the people in the video images may occlude one another, and background subtraction cannot isolate individual objects. In addition, many scenes contain human reflections and shadows, such that human size cannot be extracted by that method.
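The occlusion failure described above can be demonstrated with a minimal sketch, assuming a simple difference-and-threshold background subtractor and a synthetic scene (all frame contents below are made up). Two partially overlapping person silhouettes produce a single merged foreground blob, so no individual person height can be read from the mask.

```python
import numpy as np
from collections import deque

def subtract_background(frame, background, thresh=30):
    """Naive background subtraction: threshold absolute difference."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def count_blobs(mask):
    """Count 4-connected foreground components via BFS."""
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    blobs = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                blobs += 1
                q = deque([(i, j)])
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return blobs

# Synthetic frame: empty background, then two "people" whose
# silhouettes overlap, as in a crowded scene.
bg = np.zeros((40, 60), dtype=np.uint8)
frame = bg.copy()
frame[5:35, 10:20] = 200   # person A
frame[8:30, 15:25] = 180   # person B, partially occluding A
mask = subtract_background(frame, bg)
print(count_blobs(mask))   # → 1 (the two people merge into one blob)
```

Because the subtractor returns one blob for two people, any height estimate derived from blob extent would be wrong, which is the limitation the paragraph above identifies for crowded scenes.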
Other solutions propose automatic calibration based on three orthogonal dominant directions, that is, straight lines created by buildings, roads, floor tiles, or wall edges. These solutions may not solve the problem of crowded scenes because three orthogonal dominant directions may not always be visible, especially in zoomed-in PTZ views, and human height in pixels may therefore not be computable.
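The dominant-direction approach typically recovers vanishing points by intersecting image-plane lines in homogeneous coordinates; the sketch below shows that core step under assumed, made-up coordinates for two building edges. Its dependence on such edges being visible is exactly the weakness noted above: in a zoomed-in view of a crowd, no usable line segments may exist.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([*p, 1.0], [*q, 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines (assumes not parallel)."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]

# Two image edges (hypothetical coordinates) that converge toward a
# common vertical vanishing point below/above the image.
edge1 = line_through((100, 500), (110, 100))
edge2 = line_through((400, 500), (380, 100))
vp = intersect(edge1, edge2)
print(vp)  # vanishing point near (200, -3500), far outside the frame
```

With three such orthogonal vanishing points, camera parameters and hence metric quantities like person height can be recovered; without visible line structure, this chain breaks down.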