Object tracking is performed in many computer vision applications, such as surveillance, robotics, human-computer interaction, vehicle tracking and medical imaging. Object tracking locates a region of pixels in a sequence of images that matches a model of the moving object. In object tracking, the camera can be static or moving.
Tracking can be considered as an estimation of a state for a time series state space model. The problem can be formulated in probabilistic terms. Tracking methods have used Kalman filters to provide solutions that are optimal for a linear Gaussian model.
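The state-space view can be illustrated with a minimal sketch of a linear Kalman filter. The constant-velocity model, the noise covariances, and the names `kalman_step` and `track` are assumptions chosen for illustration and are not taken from any of the cited methods.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict the state and its covariance with the motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement z.
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical 1-D constant-velocity model: state = [position, velocity].
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # only the position is measured
Q = 1e-4 * np.eye(2)                     # assumed process noise
R = np.array([[0.25]])                   # assumed measurement noise

def track(measurements):
    """Filter a sequence of position measurements; returns one state per step."""
    x = np.array([measurements[0], 0.0])
    P = np.eye(2)
    estimates = []
    for z in measurements[1:]:
        x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
        estimates.append(x.copy())
    return np.array(estimates)
```

For example, `track([2.0 * k for k in range(31)])` returns one `[position, velocity]` estimate per frame after the first. The filter is optimal only under the linear Gaussian assumptions stated above.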
Mean Shift Tracking
A common object tracking method for images acquired by a static camera uses a mean shift procedure, which relies on a nonparametric density gradient estimator to locate a window of pixels in a current image that is most similar to a color histogram of the object. The mean shift procedure iteratively uses a kernel based search starting at a previous location of the object, see U.S. patent application Ser. No. 11/097,400, filed by Porikli et al., on Mar. 1, 2005, “Tracking objects in low frame rate video.” The success of the mean shift depends heavily on the discriminating power of the histograms that model a probability density function of the object. The mean shift can be extended to track moving objects that change in size, R. Collins, “Mean-shift blob tracking through scale space,” Proc. IEEE Conf. on Comp. Vision Patt. Recog., pages 234-240, 2003. Color histograms are common models of nonparametric density, but histograms disregard the spatial arrangement of the feature values. Moreover, histograms do not scale to higher dimensions because their size grows exponentially with the number of dimensions and they become sparse.
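The iterative search can be sketched as follows. This is a simplified illustration, not the procedure of the cited references: it assumes a precomputed per-pixel weight map (for example, a back-projection of the object's color histogram) and uses a flat kernel over a rectangular window. The function name `mean_shift` and its parameters are hypothetical.

```python
import numpy as np

def mean_shift(weights, center, half_size, max_iter=20, tol=0.5):
    """Move a window toward a local mode of a per-pixel weight map.

    weights   : 2-D array of non-negative weights (assumed input)
    center    : (row, col) start, e.g. the previous object location
    half_size : (half_height, half_width) of the search window
    """
    rows, cols = weights.shape
    cy, cx = float(center[0]), float(center[1])
    hh, hw = half_size
    for _ in range(max_iter):
        # Clip the window to the image bounds.
        r0 = max(int(round(cy)) - hh, 0)
        r1 = min(int(round(cy)) + hh + 1, rows)
        c0 = max(int(round(cx)) - hw, 0)
        c1 = min(int(round(cx)) + hw + 1, cols)
        win = weights[r0:r1, c0:c1]
        total = win.sum()
        if total <= 0:
            break  # no evidence in the window; keep the current location
        # The weighted centroid of the window is the new center.
        ys, xs = np.mgrid[r0:r1, c0:c1]
        ny = (ys * win).sum() / total
        nx = (xs * win).sum() / total
        shift = np.hypot(ny - cy, nx - cx)
        cy, cx = ny, nx
        if shift < tol:
            break
    return cy, cx
```

Because each iteration only moves to the centroid of the current window, the search converges to the nearest mode of the weight map, which is why the starting location matters.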
Particle Filter Tracking
A particle filter estimates a sequence of hidden variables (particles) based on observed samples. The particle filter, also known as the sequential Monte Carlo method, can be used for object tracking. In this case, the hidden variables are the locations of the object, and the observed samples are image pixels. Particle filtering tracks the locations over time, typically by recursively constructing a multi-modal probability density function (pdf) based on the samples, using Monte Carlo integration. When applied to tracking in computer vision applications, particle filtering is known as condensation, M. Isard and A. Blake, “Condensation—conditional density propagation for visual tracking,” Int. J. Computer Vision, 29:5-28, 1998.
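The recursion can be sketched with a minimal bootstrap particle filter. To keep the example short, it assumes a 1-D location, a random-walk motion model, and a Gaussian likelihood on a noisy position measurement in place of image pixels. These choices, the function name `particle_filter`, and its parameters are assumptions for illustration and do not reproduce the condensation method.

```python
import numpy as np

def particle_filter(observations, n_particles=500, motion_std=1.0,
                    obs_std=2.0, seed=0):
    """Bootstrap particle filter for a 1-D location (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize particles around the first observation.
    particles = observations[0] + obs_std * rng.standard_normal(n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate each particle with the random-walk motion model.
        particles = particles + motion_std * rng.standard_normal(n_particles)
        # Weight: Gaussian likelihood of the observation given each particle.
        log_w = -0.5 * ((z - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Estimate: weighted mean of the particles.
        estimates.append(float(np.sum(w * particles)))
        # Resample in proportion to the weights to limit degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates
```

The weighted particle set approximates the posterior pdf at each step, so it can represent several modes at once, unlike a single Gaussian.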
One method applies Rao-Blackwellization to integrate subspace representations in a particle filter framework, Z. Khan, T. Balch, and F. Dellaert, “Rao-Blackwellized particle filter for eigentracking,” Proc. IEEE Conf. on Comp. Vision and Patt. Recog., 2:980-986, 2004. That method is used to track an unmarked honey bee in a hive.
Subspace representations have been used successfully for tracking by finding a minimum distance from the tracked object to the subspace spanned by training data or previous tracking results. The particle filter is based on random sampling, which becomes problematic due to sample degeneracy and impoverishment, especially for higher dimensional representations. Continually adding tracking results to the subspace inevitably updates the subspace with inaccurate tracking results. A particle tracker is therefore prone to fail due to contamination of the model subspace.
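The distance from an observation to a subspace spanned by training data can be sketched as follows. The sketch assumes the subspace is obtained from a singular value decomposition of the mean-centered training samples. The function names `fit_subspace` and `distance_to_subspace` are hypothetical.

```python
import numpy as np

def fit_subspace(samples, k):
    """Fit a k-dimensional subspace to training samples.

    samples : (n_samples, n_features) array, one training vector per row
    Returns the sample mean and an orthonormal basis of shape (k, n_features).
    """
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:k]

def distance_to_subspace(x, mean, basis):
    """Euclidean distance from x to the affine subspace mean + span(basis)."""
    centered = x - mean
    projection = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - projection))
```

A tracker of this kind would select the candidate window with the smallest distance. If inaccurate results are then added to `samples`, the fitted basis drifts, which is the contamination problem described above.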
Classifier Based Tracking
Tracking can also be considered as a classification problem, see United States Patent Application 20060165258, “Tracking objects in videos with adaptive classifiers,” filed by Avidan on Jan. 24, 2005. A classifier can be trained to distinguish a (foreground) object from the background. This is done by constructing a feature vector for every pixel in a reference image and training a classifier to separate pixels that belong to the object from pixels that belong to the background. One obvious drawback of local search methods is that they tend to get stuck in a local optimum.
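The per-pixel training step can be sketched as follows. As a stand-in for the adaptive classifiers of the cited application, the sketch fits a single linear classifier by least squares. The feature vectors are assumed to be the raw channel values of each pixel, and the function names `train_pixel_classifier` and `confidence_map` are hypothetical.

```python
import numpy as np

def train_pixel_classifier(features, labels):
    """Fit a linear classifier by least squares.

    features : (n_pixels, d) array, one feature vector per pixel
    labels   : (n_pixels,) array, +1 for object pixels, -1 for background
    Returns a weight vector of length d + 1 (the last entry is the bias).
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return w

def confidence_map(image, w):
    """Apply the classifier to every pixel of an (H, W, d) image.

    Returns an (H, W) map; positive values indicate the object.
    """
    h, wd, d = image.shape
    X = np.hstack([image.reshape(-1, d), np.ones((h * wd, 1))])
    return (X @ w).reshape(h, wd)
```

The confidence map of a new image can then be searched for its peak, for example with a mean shift procedure, which is a local search and is subject to the local optimum drawback noted above.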
Image Registration
Image registration establishes a common frame of reference for a sequence of images of a scene acquired over time, from different views, or by different cameras. Image registration has a vital role in many computer vision applications, such as video tracking, medical imaging, remote sensing, super-resolution and data fusion, B. Zitova and J. Flusser, “Image registration methods: A survey,” Image and Vision Computing, 21:977-1000, 2003.
In general, image registration methods can be classified into two categories: direct methods and feature-based methods. Direct methods use pixel-to-pixel matching, and minimize a measure of image similarity to find a parametric transformation between two images. Often, hierarchical approaches are adopted to improve convergence properties. Feature-based methods first extract distinctive features from each image. Then, the features are matched between the images to establish the correspondence, and to warp images according to parametric transformations estimated from those correspondences. Unlike direct methods, feature-based registration does not require initialization and can handle large motion and viewpoint changes between the images. However, finding distinctive features in the image that are invariant to illumination, scale and rotation is difficult.
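The last step of a feature-based method, estimating a parametric transformation from matched features, can be sketched as follows. The sketch assumes an affine model and correspondences that are already matched and free of outliers. Feature extraction, matching, and outlier rejection are omitted. The function names `estimate_affine` and `apply_affine` are hypothetical.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points to dst points.

    src, dst : (n, 2) arrays of matched points, n >= 3 and not collinear
    Returns a 2x3 matrix A such that dst is approximately A @ [x, y, 1].
    """
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return solution.T

def apply_affine(A, pts):
    """Map (n, 2) points through the 2x3 affine matrix A."""
    return pts @ A[:, :2].T + A[:, 2]
```

Because the transformation is solved in closed form from the correspondences, no initial guess is needed, in contrast to the iterative minimization of a direct method.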
Another method uses scale invariant features to register images, M. Brown and D. G. Lowe, “Recognising panoramas,” IEEE International Conference on Computer Vision, pages 1218-1225, 2003. That method is insensitive to the ordering, orientation, scale and illumination of the images and removes ‘outlier’ images, which do not have any overlapping area with the other images. Due to the different characteristics of imaging sensors, the relationship between the intensities of corresponding pixels in multi-modality images is usually complex and unknown.
Conventional intensity based feature extraction fails in the case of multi-modality images. Mutual information based registration works for multi-modality images, J. Pluim, J. Maintz, and M. Viergever, “Mutual information based registration of medical images: a survey,” IEEE Trans. on Medical Imaging, 8:986-1004, 2003.
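The similarity measure used by mutual information based registration can be sketched as follows. The sketch estimates mutual information from a joint intensity histogram of two images of equal size. The bin count and the function name `mutual_information` are assumptions, and the search over transformations that a registration method would perform is omitted.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between the intensities of two images.

    a, b : arrays of the same shape, assumed to be already resampled
           onto a common pixel grid
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)    # marginal distribution of b
    nz = pxy > 0                           # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

The measure depends only on the statistical dependence between intensities, not on their values being similar. For example, an image and its intensity-inverted copy yield a high score, whereas two unrelated images yield a score near zero.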
Tracking Moving Objects with Moving Cameras
Tracking of independently moving objects in a sequence of images acquired by a moving camera is inherently more challenging and complex because the motion of the camera induces a motion in all pixels in the image sequence. One method models the scene in terms of a small group of motions, S. Ayer and H. S. Sawhney, “Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding,” International Conference on Computer Vision, pages 777-784, 1995.
Another method estimates the number of motion models automatically, Y. Weiss and E. H. Adelson, “A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models,” Proc. IEEE Conf. on Comp. Vision and Patt. Recog., pages 321-326, 1996. That method incorporates spatial constraints and relies on given assumptions about the expected level of model failure. The tracking result depends heavily on the quality of the registration, and is therefore unreliable when the registration algorithm fails to achieve reasonable results.