Public venues such as shopping centres, parking lots and train stations are increasingly subject to surveillance using large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics. In one example application from the security domain, a security officer may want to view any video feed containing a particular suspicious person in order to identify undesirable activities. In another example from the business analytics domain, a shopping centre may wish to track customers across multiple cameras in order to build a profile of shopping habits. The aforementioned surveillance applications typically require persons to be detected, tracked, matched and analysed across multiple camera views, even when the persons are partially occluded by other persons or objects in a scene. This is especially true when tracking persons in crowded scenes such as shopping malls and train stations. Tracking a person in a crowded scene requires robust detection of the person in the presence of occlusions.
One method of detecting a target object in an image uses part-based models. Specifically, the method uses a mixture of multi-scale deformable part models, where a part model may, for example, model a part of the human body. However, the method carries a high computational load, since an occluded person in the scene must be matched against a potentially large number of part models to obtain a matching score. Further, the part-based model method can still fail in the presence of severe occlusions.
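To make the computational load concrete, the following is a minimal sketch of how a matching score is commonly computed for a mixture of deformable part models, at a single scale only. It assumes (as an illustration, not as the method described above) that filter response maps are already computed, that each part is scored as its best response near an anchor position minus a quadratic deformation cost, and that the mixture score is the maximum over its components. The names `part_score`, `component_score` and `mixture_score` are hypothetical.

```python
def part_score(response, anchor, deform_weights):
    """Best response of one part over all placements, minus a quadratic
    deformation cost for moving away from its anchor (assumed cost model)."""
    ax, ay = anchor
    wx, wy = deform_weights
    best = float("-inf")
    for y, row in enumerate(response):
        for x, r in enumerate(row):
            dx, dy = x - ax, y - ay
            best = max(best, r - (wx * dx * dx + wy * dy * dy))
    return best


def component_score(root_response, parts):
    """Score of one mixture component: root response plus the best placement of each part.
    Each part is a (response_map, anchor, deform_weights) tuple."""
    return root_response + sum(part_score(resp, anchor, w) for resp, anchor, w in parts)


def mixture_score(components):
    """Matching score of a mixture model: the best-scoring component."""
    return max(component_score(root, parts) for root, parts in components)


# Hypothetical example: two components, the first with two parts.
part_a = ([[0, 1, 0], [0, 5, 0], [0, 0, 0]], (1, 1), (1.0, 1.0))
part_b = ([[4, 0], [0, 0]], (1, 1), (0.5, 0.5))
components = [(2.0, [part_a, part_b]), (1.0, [])]
score = mixture_score(components)
```

Every part of every component scans its whole response map, so the work grows with the number of components, parts, placements and scales, which is the source of the computational load noted above.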
Another method of detecting a target object in an image of a scene uses a calibrated camera and a specified width for the person to be detected. Assuming that the feet of the person are visible in the image, the width and height of a bounding box enclosing the person can be determined iteratively by evaluating symmetry within the bounding box. However, the calibrated camera method fails when the feet of the person are not observable, or when occlusions affect the symmetry within the bounding box.
Another method of detecting a target object in an image uses training data to learn the size of a bounding box enclosing a person in the scene. Assuming that a foreground mask representing foreground objects (e.g. moving persons) is available for the scene, an objective function is maximised to balance unary confidence scores against pairwise overlap penalties. However, this training data method requires a large amount of training data to learn the bounding box size with high accuracy. Moreover, even with sufficient training data, the method is not robust to severe occlusions.
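The trade-off between unary confidence scores and pairwise overlap penalties can be illustrated with a small sketch. It assumes an objective of the form: sum of the unary scores of the selected boxes minus a weight times the sum of pairwise overlaps, with overlap measured as intersection-over-union. The unary scores are taken as given (for example, derived from the foreground mask), and the subset is found by exhaustive search, which is only feasible for a handful of candidates. The exact objective and optimiser used by the method above are not specified, so `iou`, `objective`, `select_boxes` and the `weight` parameter are hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def objective(subset, boxes, unary, weight):
    """Sum of unary scores minus weighted pairwise overlap penalties (assumed form)."""
    gain = sum(unary[i] for i in subset)
    penalty = sum(iou(boxes[i], boxes[j]) for i, j in combinations(subset, 2))
    return gain - weight * penalty


def select_boxes(boxes, unary, weight=1.0):
    """Exhaustively search all subsets of candidate boxes; the empty set scores 0."""
    best, best_val = (), 0.0
    for k in range(1, len(boxes) + 1):
        for subset in combinations(range(len(boxes)), k):
            val = objective(subset, boxes, unary, weight)
            if val > best_val:
                best, best_val = subset, val
    return best, best_val


# Hypothetical candidates: boxes 0 and 1 overlap heavily, box 2 stands alone.
boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
unary = [0.9, 0.8, 0.7]
chosen, value = select_boxes(boxes, unary, weight=2.0)
```

With a large overlap weight, the weaker of two heavily overlapping boxes is suppressed, so the sketch keeps boxes 0 and 2.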
A camera may be calibrated from a vertical vanishing point and a horizon line. One method of estimating a vertical vanishing point is to track a moving person in the scene. However, the persons tracked to obtain the vertical vanishing point must be fully visible in the scene, that is, completely unoccluded.
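One common way to turn a tracked person into a vertical vanishing point, shown here only as an assumed illustration, is to take the head and foot image points of the person at several positions, form the line through each head-foot pair, and compute the least-squares intersection of those lines. The sketch below uses non-homogeneous image coordinates, so it cannot represent a vanishing point at infinity (parallel head-foot lines). The function names are hypothetical. Because every line needs both the head and the feet, the sketch also shows why the tracked person must be fully visible.

```python
import math


def line_through(p, q):
    """Line a*x + b*y = c through points p and q, with (a, b) scaled to unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, a * x1 + b * y1


def vertical_vanishing_point(head_foot_pairs):
    """Least-squares intersection of head-foot lines (assumed estimation approach)."""
    # Accumulate the 2x2 normal equations [[saa, sab], [sab, sbb]] v = [sac, sbc].
    saa = sab = sbb = sac = sbc = 0.0
    for head, foot in head_foot_pairs:
        a, b, c = line_through(head, foot)
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("head-foot lines are (nearly) parallel; vanishing point is at infinity")
    # Solve the normal equations with Cramer's rule.
    x = (sbb * sac - sab * sbc) / det
    y = (saa * sbc - sab * sac) / det
    return x, y


# Hypothetical track: three (head, foot) observations of one person.
track = [((0, 500), (50, 750)), ((200, 600), (150, 800)), ((300, 400), (250, 550))]
vx, vy = vertical_vanishing_point(track)
```

For this synthetic track, all three head-foot lines pass through (100, 1000), which the sketch recovers up to floating-point error.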