Object tracking is used in many computer vision applications, see Stauffer et al., “Learning Patterns of Activity Using Real-Time Tracking,” PAMI, 22(8), pp. 747-757, 2000, and Avidan, “Support Vector Tracking,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 2004, and in human-computer interaction, see Bobick et al., “The Kids Room,” Communications of the ACM, 43(3), 2000.
The wide range of objects to be tracked poses a challenge to any object tracking application. Different object and feature representations, such as color histograms, appearance models, or key-points, have been used for object tracking. Feature selection can use a set of different feature spaces and ‘switch’ to the most discriminative features, Collins et al., “On-Line Selection of Discriminative Tracking Features,” Proceedings of the International Conference on Computer Vision (ICCV '03), 2003.
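One way to rank candidate feature spaces is to compare, for each feature, a histogram of object pixels with a histogram of nearby background pixels and score how well the two separate. The sketch below uses a variance-ratio score over the per-bin log-likelihood ratio, modeled on the criterion described by Collins et al.; the histogram inputs, the `delta` floor, and the helper names are assumptions for illustration, not the cited implementation.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    return [h / total for h in hist]

def weighted_variance(values, weights):
    """Variance of `values` under the discrete distribution `weights`."""
    mean = sum(w * v for v, w in zip(values, weights))
    return sum(w * v * v for v, w in zip(values, weights)) - mean * mean

def variance_ratio(obj_hist, bg_hist, delta=1e-3):
    """Discriminability of one feature: high when the per-bin log-likelihood
    ratio spreads widely across both classes but varies little within each."""
    p, q = normalize(obj_hist), normalize(bg_hist)
    # Per-bin log-likelihood ratio; delta (an assumed floor) avoids log(0).
    llr = [math.log(max(pi, delta) / max(qi, delta)) for pi, qi in zip(p, q)]
    pooled = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    within = weighted_variance(llr, p) + weighted_variance(llr, q)
    return weighted_variance(llr, pooled) / max(within, 1e-12)

def select_feature(features):
    """Pick the feature name whose (object, background) histograms score highest."""
    return max(features, key=lambda name: variance_ratio(*features[name]))

# Hypothetical 3-bin histograms for two candidate feature spaces.
features = {
    "red":   ([9, 1, 0], [0, 1, 9]),   # object and background barely overlap
    "green": ([3, 4, 3], [3, 3, 4]),   # nearly identical distributions
}
print(select_feature(features))  # red
```

Re-running the selection every few frames is what lets a tracker ‘switch’ features as the background changes.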
Object tracking can use a Kalman filter or a particle filter. Temporal integration methods, such as particle filtering, integrate measurements over time, Isard et al., “CONDENSATION—Conditional Density Propagation for Visual Tracking,” International Journal of Computer Vision, Vol. 29(1), pp. 5-28, 1998. Filtering assigns probabilities to different matches. Unfortunately, filtering methods do not update the description of the object.
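The predict, weight, and resample cycle of a particle filter such as CONDENSATION can be sketched for a single coordinate as follows. This is a minimal illustration, not the cited authors' implementation: the random-walk motion model, the Gaussian measurement likelihood, and every numeric parameter are assumptions. The normalized weights are the probabilities assigned to the competing hypotheses, and nothing in the loop changes the model of what the object looks like.

```python
import math
import random

def particle_filter_step(particles, z, rng, motion_std=1.0, meas_std=2.0):
    """One predict / weight / resample cycle for a 1-D object position."""
    # Predict: move every hypothesis with an (assumed) random-walk motion model.
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]
    # Weight: assign each hypothesis a probability from a Gaussian likelihood.
    weights = [math.exp(-0.5 * ((z - x) / meas_std) ** 2) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every hypothesis is implausible; fall back to uniform weights.
        weights, total = [1.0] * len(predicted), float(len(predicted))
    weights = [w / total for w in weights]
    estimate = sum(w * x for w, x in zip(weights, predicted))
    # Resample: keep hypotheses in proportion to their probability.
    particles = rng.choices(predicted, weights=weights, k=len(predicted))
    return particles, estimate

def track(measurements, num_particles=500, seed=0):
    """Run the filter over a sequence of position measurements."""
    rng = random.Random(seed)
    particles = [measurements[0] + rng.gauss(0.0, 5.0) for _ in range(num_particles)]
    estimates = []
    for z in measurements:
        particles, estimate = particle_filter_step(particles, z, rng)
        estimates.append(estimate)
    return estimates

# Hypothetical object moving 0.5 units per frame; noise-free measurements for clarity.
estimates = track([0.5 * t for t in range(30)])
```

Because the assumed motion model is a random walk, the estimate lags slightly behind a steadily moving target; a constant-velocity model would reduce that lag.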
Mean-shift methods can also be used. Mean shift is a mode-seeking process that follows the gradient of a density to find a peak. Mean shift searches for regions in an image that have a color histogram similar to a given color histogram. To improve performance, Comaniciu et al. used spatial smoothing, Comaniciu et al., “Kernel-Based Object Tracking,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 25:5, pp. 564-575. In addition, colors that appear outside the object are used to ‘down-weight’ those object colors that also occur in the background.
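A minimal histogram-driven mean-shift loop, in the spirit of the kernel-based tracker cited above, is sketched below. It assumes the image has already been quantized into color-bin indices (a 2-D list), uses a flat square window, and weights each pixel by sqrt(q_u / p_u), where q is the target histogram and p the histogram of the current window. Spatial kernel smoothing and background down-weighting are omitted, and all names are hypothetical.

```python
import math

def window_pixels(image, cy, cx, half):
    """Yield (row, col, bin) for pixels in a square window, clipped to the image."""
    rows, cols = len(image), len(image[0])
    r0, c0 = int(math.floor(cy + 0.5)), int(math.floor(cx + 0.5))
    for r in range(max(0, r0 - half), min(rows, r0 + half + 1)):
        for c in range(max(0, c0 - half), min(cols, c0 + half + 1)):
            yield r, c, image[r][c]

def histogram(pixels, num_bins):
    """Normalized histogram of the bin indices in `pixels`."""
    counts = [0.0] * num_bins
    for _, _, b in pixels:
        counts[b] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def mean_shift(image, target_hist, cy, cx, half, max_iter=20, eps=1e-3):
    """Move the window center toward the mode of the target-color weights."""
    num_bins = len(target_hist)
    for _ in range(max_iter):
        pixels = list(window_pixels(image, cy, cx, half))
        cand = histogram(pixels, num_bins)
        num_r = num_c = den = 0.0
        for r, c, b in pixels:
            # Colors under-represented in the window relative to the target weigh more.
            w = math.sqrt(target_hist[b] / cand[b]) if cand[b] > 0 else 0.0
            num_r, num_c, den = num_r + w * r, num_c + w * c, den + w
        if den == 0.0:
            break  # no target colors inside the window
        new_y, new_x = num_r / den, num_c / den
        shift = math.hypot(new_y - cy, new_x - cx)
        cy, cx = new_y, new_x
        if shift < eps:
            break
    return cy, cx

# Hypothetical 40x40 bin-index image: background bin 0, a 6x6 object of bin 1.
image = [[1 if 20 <= r <= 25 and 20 <= c <= 25 else 0 for c in range(40)]
         for r in range(40)]
print(mean_shift(image, [0.0, 1.0], cy=18.0, cx=18.0, half=5))  # close to (22.5, 22.5)
```

With a flat window the new center is simply the weighted centroid of the window's pixels, which is the mean-shift step for a kernel with a constant profile derivative.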
Simple object tracking finds, in each frame of a video, the region that best matches the object. In machine-learning terms, this is equivalent to nearest-neighbor classification. Such a simple approach ignores the role of the background. Therefore, classifiers that explicitly distinguish the object from its background can be used for object tracking, see Shai Avidan, “Ensemble tracking,” IEEE Conference on Computer Vision and Pattern Recognition, pages 494-501, 2005, and Helmut Grabner and Horst Bischof, “On-line boosting and vision,” IEEE Conference on Computer Vision and Pattern Recognition, pages 260-267, 2006, and U.S. Patent Application 20060165258, “Tracking objects in videos with adaptive classifiers,” filed by Avidan Jan. 24, 2005.
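The nearest-neighbor view of simple tracking can be made concrete with exhaustive template matching: every candidate patch is compared with a single stored exemplar, and the closest one wins. The sketch below assumes grayscale frames stored as 2-D lists and the sum of squared differences (SSD) as the distance; both choices are illustrative. Note that no background information enters the decision.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and one frame patch."""
    return sum((frame[top + i][left + j] - template[i][j]) ** 2
               for i in range(len(template))
               for j in range(len(template[0])))

def match_template(frame, template):
    """Return (top, left) of the patch nearest to the template under SSD."""
    th, tw = len(template), len(template[0])
    candidates = [(top, left)
                  for top in range(len(frame) - th + 1)
                  for left in range(len(frame[0]) - tw + 1)]
    # A 1-nearest-neighbor decision with the template as the only exemplar.
    return min(candidates, key=lambda pos: ssd(frame, template, *pos))

# Hypothetical 8x8 grayscale frame with a 2x2 object pasted at row 3, column 4.
template = [[5, 6], [7, 8]]
frame = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        frame[3 + i][4 + j] = template[i][j]
print(match_template(frame, template))  # (3, 4)
```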
The classifier-based methods ‘train’ a binary classifier to distinguish the object of interest from the background in the scene. Then, the classifier is applied to a sequence of images to locate and track the position of the object. Often, a strong classifier combines a set of weak classifiers. The combination can be linear or non-linear. For example, the well-known AdaBoost process trains each weak classifier in a set on increasingly ‘difficult’ training data, i.e., data reweighted to emphasize the samples that the preceding weak classifiers misclassified. The weak classifiers are then combined to produce a strong classifier that is better than any of the weak classifiers alone, Freund et al., “A decision-theoretic generalization of on-line learning and an application to boosting,” Computational Learning Theory, EuroCOLT '95, pp. 23-37, 1995, incorporated herein by reference.
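The boosting loop can be illustrated with discrete AdaBoost on a single scalar feature, using threshold ‘stumps’ as the weak classifiers. The stump learner, the 1-D feature, the {-1, +1} labels, and the toy data are assumptions chosen for illustration; a tracker would apply the same loop to image features of object and background pixels.

```python
import math

def train_stump(xs, ys, weights):
    """Find the threshold/polarity stump with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost on 1-D features with labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        if err >= 0.5:
            break  # no weak classifier better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Re-weight: misclassified samples gain weight, so the next round is 'harder'.
        weights = [w * math.exp(-alpha * y * (pol if x >= thr else -pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, -1, 1, 1, 1]   # no single threshold separates these
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs] == ys)  # True
```

No single stump classifies this toy set correctly, but the weighted vote of three does, which is the sense in which the strong classifier outperforms each weak one.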
One problem with classifier-based methods is adapting the classifier to scene changes over time while continuing to track the object correctly. Note that the appearance of both the background and the object can change with time. For example, items can appear in or disappear from the scene over time, and in many scenes, particularly outdoor scenes, incident lighting and shadows vary over time and space.
Conventional classifier-based methods update the strong classifier by adding and deleting weak classifiers over time to cope with scene changes. Avidan's ensemble tracking does not try to update the weak classifiers themselves. Instead, ‘old’ weak classifiers are deleted and ‘new’ weak classifiers are added. The on-line boosting method of Grabner et al. models the feature densities by Gaussian distributions and updates their parameters at each frame using Kalman filtering. The method of Grabner et al. also deletes any weak classifier whose error exceeds a predetermined threshold, e.g., 0.5 or 50%. This eliminates the possibility that a deleted weak classifier could be used effectively later.
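A simplified sketch of a Gaussian-density weak classifier with per-frame updates, and of threshold-based deletion, follows. It assumes one scalar feature, one Gaussian per class whose mean and variance are updated with a scalar Kalman-style gain, a log-likelihood comparison as the decision rule, and running error counts. These choices and all constants are illustrative assumptions, not the exact equations of Grabner et al. The hypothetical `prune` helper applies the 50% threshold from the text; once a classifier is dropped from the list, nothing can bring it back if the scene later reverts.

```python
import math

class KalmanGaussian:
    """1-D Gaussian tracked with a scalar Kalman-style gain (assumed constants)."""

    def __init__(self, p=1000.0, r=0.01):
        self.mean, self.var = 0.0, 1.0
        self.p, self.r = p, r   # estimate uncertainty, measurement noise

    def update(self, x):
        k = self.p / (self.p + self.r)                      # Kalman gain
        self.mean = k * x + (1.0 - k) * self.mean
        self.var = k * (x - self.mean) ** 2 + (1.0 - k) * self.var
        self.p = (1.0 - k) * self.p

    def log_pdf(self, x):
        v = max(self.var, 1e-6)                             # avoid zero variance
        return -0.5 * math.log(2.0 * math.pi * v) - (x - self.mean) ** 2 / (2.0 * v)

class OnlineWeakClassifier:
    """Weak classifier on one feature: compares object and background densities."""

    def __init__(self):
        self.pos, self.neg = KalmanGaussian(), KalmanGaussian()
        self.correct, self.wrong = 1.0, 1.0                 # assumed prior counts

    def predict(self, x):
        return 1 if self.pos.log_pdf(x) >= self.neg.log_pdf(x) else -1

    def update(self, x, y, weight=1.0):
        # Score the sample before learning from it, to estimate the error.
        if self.predict(x) == y:
            self.correct += weight
        else:
            self.wrong += weight
        (self.pos if y == 1 else self.neg).update(x)

    def error(self):
        return self.wrong / (self.correct + self.wrong)

def prune(classifiers, threshold=0.5):
    """Delete weak classifiers whose running error exceeds the threshold."""
    return [c for c in classifiers if c.error() <= threshold]

# Hypothetical stream: object feature values near +5, background near -5.
good = OnlineWeakClassifier()
for i in range(50):
    good.update(5.0 + 0.1 * (i % 3), 1)
    good.update(-5.0 - 0.1 * (i % 3), -1)
```

Each class keeps two moving parameters per feature here; with higher-dimensional features the same scheme would need full covariance updates, which is where the cost grows.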
Even if Gaussian distributions are sufficient to model the feature densities, which may not always be the case, the updating mechanism soon becomes complicated and slow if higher-dimensional classifiers are desired.