To track objects, such as people, in a sequence of images of a scene acquired by a camera, two main steps are generally required. The first step, usually referred to as detection, is to detect objects of interest in at least one image. The second step, usually referred to as tracking, is to associate objects that have been detected or tracked in one image with candidate objects in a subsequent image. In some tracking methods, these two steps are not separable.
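The association step can be illustrated with a minimal sketch. It assumes that each object is represented by an axis-aligned bounding box and uses greedy matching by intersection-over-union (IoU). The names `iou`, `associate`, and the `min_iou` threshold are hypothetical and are not taken from any of the methods cited here.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], candidates: List[Box],
              min_iou: float = 0.3) -> Dict[int, int]:
    """Match each track id to at most one candidate index.

    Greedy highest-IoU-first matching is a simplification (assumption);
    an optimal assignment solver could be used instead.
    """
    pairs = sorted(
        ((iou(box, cand), tid, ci)
         for tid, box in tracks.items()
         for ci, cand in enumerate(candidates)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used: Set[int] = set()
    for score, tid, ci in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or ci in used:
            continue  # each track and each candidate is used at most once
        matches[tid] = ci
        used.add(ci)
    return matches
```

For example, with tracks `{1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}` and candidates `[(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]`, track 1 is matched to candidate 1, track 2 to candidate 0, and candidate 2 is left unmatched as a possible new object.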
One class of methods for object tracking uses a background model both to detect new objects and to track previously detected objects, without the use of a foreground model.
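As a minimal sketch of detection with a background model, the following assumes a per-pixel running-average model on grayscale frames. The class `RunningAverageBackground` and its `alpha` (learning rate) and `threshold` parameters are hypothetical; practical systems typically use richer per-pixel models. A pixel is labeled foreground when it differs from the model by more than the threshold.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, row-major (assumption)


class RunningAverageBackground:
    """Hypothetical per-pixel running-average background model."""

    def __init__(self, first_frame: Image, alpha: float = 0.05,
                 threshold: float = 25.0) -> None:
        self.bg = [row[:] for row in first_frame]  # copy so caller's frame is untouched
        self.alpha = alpha          # how quickly new frames are blended in
        self.threshold = threshold  # minimum intensity difference for foreground

    def apply(self, frame: Image) -> List[List[bool]]:
        """Return a foreground mask for `frame`, then blend `frame` into the model."""
        mask: List[List[bool]] = []
        for r, row in enumerate(frame):
            mask_row: List[bool] = []
            for c, value in enumerate(row):
                b = self.bg[r][c]
                mask_row.append(abs(value - b) > self.threshold)
                # Every pixel is updated, including pixels labeled foreground.
                self.bg[r][c] = (1 - self.alpha) * b + self.alpha * value
            mask.append(mask_row)
        return mask
```

Because every pixel, including foreground pixels, is blended into the model, an object that stays still long enough is eventually absorbed into the background.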
Another class of methods does not use a background model. For example, the method described in U.S. Pat. No. 8,131,011 uses a parts-based representation and can account for partial occlusion while tracking a person. To initialize a track, such methods use either a manually defined bounding box or an object detector, such as a person detector, to identify a region of an image in the sequence that contains an object to be tracked. Person detectors are unreliable in many scenarios, including indoor environments such as homes, because of the wide range of poses, lighting conditions, and degrees of occlusion encountered. Furthermore, methods in this class are subject to drift: the tracking box tends to gradually move off the foreground object onto a background region, after which the object is lost.
Some methods combine foreground detections based on a background model with a different method for tracking foreground objects, such as template tracking or mean-shift tracking. For example, U.S. Pat. No. 7,620,266 describes a tracking system that uses both foreground detections based on a background model and tracking of foreground objects using a Kalman filter tracker. The tracker output is fed back to update the background model. However, in that system, after a tracked region remains stationary for long enough, the formerly tracked region becomes part of the background model, and the object is lost.
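A Kalman filter tracker of the kind mentioned above can be sketched, under simplifying assumptions, as a constant-velocity filter applied independently to each image coordinate of an object's centroid. The class `ConstantVelocityKalman1D` and its noise parameters `q` and `r` are hypothetical illustrations; this is not the tracker of U.S. Pat. No. 7,620,266, whose details are not given here.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Hypothetical 1-D constant-velocity Kalman filter; state is (position, velocity)."""
    pos: float
    vel: float = 0.0
    p_pp: float = 1.0  # position variance
    p_pv: float = 0.0  # position/velocity covariance
    p_vv: float = 1.0  # velocity variance
    q: float = 0.01    # process noise added to both diagonal terms (assumption)
    r: float = 1.0     # measurement noise variance (assumption)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one step: x <- F x, P <- F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.pos += dt * self.vel
        # Order matters: each line uses covariance terms not yet updated this step.
        self.p_pp += 2 * dt * self.p_pv + dt * dt * self.p_vv + self.q
        self.p_pv += dt * self.p_vv
        self.p_vv += self.q
        return self.pos

    def update(self, z: float) -> float:
        """Correct with a position measurement z (H = [1, 0])."""
        s = self.p_pp + self.r       # innovation variance
        k_p = self.p_pp / s          # Kalman gain for position
        k_v = self.p_pv / s          # Kalman gain for velocity
        y = z - self.pos             # innovation
        self.pos += k_p * y
        self.vel += k_v * y
        # P <- (I - K H) P, written out for the symmetric 2x2 case.
        p_pp, p_pv, p_vv = self.p_pp, self.p_pv, self.p_vv
        self.p_pp = (1 - k_p) * p_pp
        self.p_pv = (1 - k_p) * p_pv
        self.p_vv = p_vv - k_v * p_pv
        return self.pos
```

In a combined system, `predict()` would supply the expected object location in the next image, and a foreground detection near that location would be passed to `update()`.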
Another system uses a graphical model and belief propagation to combine the results of a background model and an object tracker. After each image in the sequence is processed, the final foreground regions resulting from this combination are used to update both the background model and the object tracker.
In U.S. Pat. No. 7,620,266, the foreground tracking results for the current image are not reflected in the output for the current image. Instead, they are used to update the background model applied to future images. The output for each image is based solely on the background model.
U.S. Pat. No. 7,929,730 discloses a method for detecting and tracking objects using spatio-temporal information that uses both background and foreground models.