Object Tracking
Object tracking is used in many computer vision applications, such as surveillance, Stauffer et al., “Learning Patterns of Activity Using Real-Time Tracking,” PAMI, 22(8), pp. 747-757, 2000; driver assistance systems, Avidan, “Support Vector Tracking,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 2004; and human-computer interactions, Bobick et al., “The KidsRoom,” Communications of the ACM, 43(3), 2000. The wide range of objects to be tracked poses a challenge to any object tracking application. Different object representations, such as color histograms, appearance models or key-points, have been used for object tracking.
Simple object tracking finds a region in a sequence of frames of a video that matches an object. In terms of machine learning, this is equivalent to a nearest neighbor classification. The simple approach ignores the role of the background.
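As an illustration of the nearest neighbor view described above, the following minimal sketch searches one frame for the window that is closest to a stored object template. The function names, the use of a sum of squared differences as the distance, and the tiny grayscale frame are assumptions made for this example only; they are not taken from any cited work.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and one frame window."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def track_nearest_neighbor(frame, template):
    """Return (top, left) of the frame window nearest to the template."""
    t_height, t_width = len(template), len(template[0])
    f_height, f_width = len(frame), len(frame[0])
    best_pos, best_cost = None, float("inf")
    for top in range(f_height - t_height + 1):
        for left in range(f_width - t_width + 1):
            cost = ssd(frame, template, top, left)
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos


# Hypothetical 4x5 grayscale frame containing a bright 2x2 object.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
position = track_nearest_neighbor(frame, template)
```

Note that the search compares every window only with the object template. Nothing in it models the background, which is the limitation pointed out above.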
Filtering can be used to assign probabilities to different matches. Unfortunately, filtering methods do not change the description of the object, even though that description could be used to better separate the object from the background. Hence, a better approach would change the object descriptor so that the object can be distinguished from the background.
Classifiers
A strong classifier combines a set of weak classifiers. The combination can be linear or non-linear. For example, the well-known AdaBoost process trains each classifier in a set of weak classifiers on increasingly more difficult training data. The weak classifiers are then combined to produce a strong classifier that is better than any of the weak classifiers, Freund et al., “A decision-theoretic generalization of on-line learning and an application to boosting,” Computational Learning Theory, Eurocolt '95, pp. 23-37, 1995. Freund presents the “boost-by-majority” algorithm that is considerably more efficient. The Freund algorithm works by calling a given weak learning algorithm WeakLearn multiple times, each time presenting it with a different distribution over the domain X, and finally combining all of the generated hypotheses into a single hypothesis. This is the boosting process. The intuitive idea is to alter the distribution over the domain X in a way that increases the probability of the “harder” parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.
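The boosting loop described above can be sketched as follows. This is a minimal discrete AdaBoost with one-dimensional threshold classifiers (decision stumps) standing in for WeakLearn. The stump learner, the function names, and the six-point data set are assumptions chosen for illustration. The sample weights play the role of the distribution over the domain X, and misclassified samples gain weight after each round.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: +polarity at or above the threshold, -polarity below."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Hypothetical WeakLearn: the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n  # initial distribution over the samples
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified ("harder") samples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_predict(ensemble, x):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Labels form a +, -, + pattern that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [strong_predict(ensemble, x) for x in xs]
```

On this data the best single stump misclassifies two of the six samples, while the weighted vote of three stumps labels all six correctly.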
Mean shift is a mode-seeking process that works on the gradient of a distribution to find a peak. Mean shift searches for regions in an image that have a color histogram similar to a given color histogram. To improve performance, Comaniciu et al. used spatial smoothing, Comaniciu et al., “Kernel-Based Object Tracking,” IEEE Trans. on Pattern Analysis and Machine Intelligence, (PAMI), 25:5, pp. 564-575, 2003. Comaniciu provides an approach toward target representation and localization. The feature-histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially smooth similarity functions suitable for gradient-based optimization. Hence, the target localization problem can be formulated using the basin of attraction of the local maxima. Comaniciu employs a metric derived from the Bhattacharyya coefficient as a similarity measure, and uses the mean shift procedure to perform the optimization. Comaniciu states that his distance (similarity) function is smooth. Thus, his procedure can use gradient information, which is provided by the mean shift vector. The mean shift procedure finds a root of the gradient of the distance function as a function of location. Thus, the mean shift procedure finds the local maximum (peak) of a scalar field of correlation coefficients. In addition, colors that appear outside the object are used to ‘down-weight’ colors that appear on the object.
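The localization step described above can be sketched roughly as follows. This is a simplified illustration, not the published procedure: it assumes an image whose pixels are already quantized to histogram bin indices, an Epanechnikov-style kernel profile (1 minus the normalized squared distance) for the candidate histogram, and a hypothetical target model whose mass lies entirely in one bin. All function and variable names are invented for the example.

```python
import math


def kernel_histogram(image, center, radius, n_bins):
    """Histogram of bin indices, weighted by an isotropic kernel at center."""
    cy, cx = center
    hist = [0.0] * n_bins
    for r, row in enumerate(image):
        for c, u in enumerate(row):
            d2 = ((r - cy) ** 2 + (c - cx) ** 2) / radius ** 2
            if d2 < 1.0:
                hist[u] += 1.0 - d2  # pixels near the center count more
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def mean_shift_track(image, target_hist, start, radius, max_iter=20, eps=0.01):
    """Move the window center toward a local peak of histogram similarity."""
    cy, cx = start
    for _ in range(max_iter):
        cand = kernel_histogram(image, (cy, cx), radius, len(target_hist))
        sum_w = sum_y = sum_x = 0.0
        for r, row in enumerate(image):
            for c, u in enumerate(row):
                d2 = ((r - cy) ** 2 + (c - cx) ** 2) / radius ** 2
                if d2 < 1.0 and cand[u] > 0.0:
                    # Bins over-represented in the target attract the window.
                    w = math.sqrt(target_hist[u] / cand[u])
                    sum_w += w
                    sum_y += w * r
                    sum_x += w * c
        if sum_w == 0.0:
            break
        new_cy, new_cx = sum_y / sum_w, sum_x / sum_w
        shift = math.hypot(new_cy - cy, new_cx - cx)
        cy, cx = new_cy, new_cx
        if shift < eps:
            break
    return cy, cx


# Hypothetical 9x9 image: background is bin 0, a 3x3 object is bin 1.
image = [[0] * 9 for _ in range(9)]
for r in range(4, 7):
    for c in range(4, 7):
        image[r][c] = 1
target_hist = [0.0, 1.0]  # assumed target model: only the object color
start = (3.0, 3.0)
end = mean_shift_track(image, target_hist, start, radius=3.0)
start_score = bhattacharyya(kernel_histogram(image, start, 3.0, 2), target_hist)
end_score = bhattacharyya(kernel_histogram(image, end, 3.0, 2), target_hist)
```

Starting from a window that only partly overlaps the object, the center moves to the middle of the object, and the Bhattacharyya coefficient between the candidate and target histograms increases.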
Temporal integration methods, such as particle filtering, properly integrate measurements over time, Isard et al., “CONDENSATION—Conditional Density Propagation for Visual Tracking,” International Journal of Computer Vision, Vol 29(1), pp. 5-28, 1998.
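A minimal one-dimensional particle filter, shown below, illustrates how such a method integrates measurements over time. It is a generic predict, weight, and resample loop under assumed conditions (a known constant velocity, Gaussian process and measurement noise, and a fixed random seed), and it is not the CONDENSATION implementation. All names and parameter values are hypothetical.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         velocity=1.0, process_sigma=0.2, meas_sigma=1.0):
    """One predict / weight / resample cycle for a 1-D position."""
    # Predict: move every particle with the assumed motion model plus noise.
    predicted = [x + velocity + rng.gauss(0.0, process_sigma) for x in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - x) / meas_sigma) ** 2)
               for x in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # fall back to uniform weights
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [rng.uniform(-10.0, 40.0) for _ in range(1000)]
true_position = 0.0
for _ in range(30):
    true_position += 1.0
    measurement = true_position + rng.gauss(0.0, 1.0)  # noisy observation
    particles = particle_filter_step(particles, measurement, rng)
estimate = sum(particles) / len(particles)  # posterior mean estimate
```

Each measurement on its own is noisy, but the particle set carries information forward from frame to frame, so the estimate after many frames is typically closer to the true position than a single measurement would be.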
The ‘WSL’ tracker maintains short-term and long-term object descriptors, Jepson et al., “Robust on-line appearance models for visual tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), pp. 1296-1311, 2003. The descriptors are updated and reweighted continuously using an expectation-maximization (EM) process.
An incremental sub-space method continuously updates an adaptive sub-space to maintain a stable object descriptor, Ho et al., “Visual Tracking Using Learned Linear Subspaces,” IEEE Conf. on Computer Vision and Pattern Recognition, 2004.
Concept drift is also addressed in data mining. There, the goal is to quickly scan a large amount of data and learn a “concept.” As the concept drifts, the classifier is adapted. For example, a “dynamic weighted majority” can track concept drift for data mining applications, Kolter et al., “Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift,” Proceedings of the Third International IEEE Conference on Data Mining, 2003.
Another method adds change detection to concept-drift to detect abrupt changes in the concept, Chu et al., “Fast and Light Boosting for Adaptive Mining of Data Streams,” The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2004. That method is similar to the WSL tracker of Jepson et al.
Feature selection can select from a set of different color spaces and ‘switch’ to the most discriminative color space, Collins et al., “On-Line Selection of Discriminative Tracking Features,” Proceedings of the International Conference on Computer Vision (ICCV '03), 2003. That method uses a fixed discrete histogram of features in a low-dimensional feature space to generate a confidence map for the mean-shift operation.
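One way to turn discrete feature histograms into a confidence map is sketched below. The sketch assumes that the map is the per-bin log-likelihood ratio of an object histogram to a background histogram, with a small floor `delta` to avoid division by zero. That specific form, the function names, and the toy data are assumptions for illustration and are not stated in the text above.

```python
import math


def histogram(values, n_bins):
    """Normalized discrete histogram of bin indices."""
    counts = [0] * n_bins
    for v in values:
        counts[v] += 1
    return [count / len(values) for count in counts]


def log_likelihood_ratio(obj_hist, bg_hist, delta=0.001):
    """Per-bin score: positive for object-like bins, negative for background."""
    return [math.log(max(p, delta) / max(q, delta))
            for p, q in zip(obj_hist, bg_hist)]


def confidence_map(image, ratio):
    """Replace every pixel's bin index with that bin's score."""
    return [[ratio[u] for u in row] for row in image]


# Hypothetical feature bins sampled on the object and on the background.
object_pixels = [1, 1, 1, 2]
background_pixels = [0, 0, 0, 2]
ratio = log_likelihood_ratio(histogram(object_pixels, 3),
                             histogram(background_pixels, 3))
image = [[0, 1], [2, 1]]
conf_map = confidence_map(image, ratio)
```

Bins seen mostly on the object map to positive confidence, bins seen mostly on the background map to negative confidence, and a bin that is equally common in both maps to zero. A mode-seeking step such as mean shift can then be run on the resulting map.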