In a sequence of frames or video, objects can be tracked by determining correspondences of object features from frame to frame. However, accurately tracking a deforming, non-rigid and fast moving object continues to be a problem.
Tracking can be performed with a mean shift operator, see D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pages 142-149, 2000. In that method, a nonparametric density gradient estimator is used to track the object that is most similar to a given color histogram. That method provides accurate localization. However, it requires that the locations of the object in consecutive frames overlap to some extent, which is generally not the case for fast moving objects.
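The overlap requirement can be illustrated with a minimal one-dimensional sketch. This is not the histogram-based tracker of the cited reference. It is a plain mean shift over scalar samples with a flat kernel, and the names `mean_shift`, `samples`, `start` and `bandwidth` are assumptions made for illustration only.

```python
def mean_shift(samples, start, bandwidth, max_iter=50, tol=1e-6):
    """Move an estimate toward the local mean of the samples inside a flat window."""
    x = start
    for _ in range(max_iter):
        # Only samples that fall inside the current window influence the shift.
        window = [s for s in samples if abs(s - x) <= bandwidth]
        if not window:
            # No overlap with the data: the estimate has nothing to climb toward.
            return x
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical object positions in the current frame, centered at 10.0.
samples = [9.0, 9.5, 10.0, 10.5, 11.0]

# Previous location close enough that the window overlaps the object.
near = mean_shift(samples, start=8.0, bandwidth=2.0)

# Previous location too far away (fast motion): the window is empty.
far = mean_shift(samples, start=2.0, bandwidth=2.0)
```

In this sketch, the estimate started at 8.0 converges to the sample mean, whereas the estimate started at 2.0 never moves because its window contains no samples.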
Tracking can also be considered as an estimation of a state of an object, given all measurements up to a moment in time. This is equivalent to constructing a probability density function (pdf) of the object location. An optimal solution is provided by a recursive Bayesian filter, which solves the problem in successive prediction and update steps.
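The successive prediction and update steps can be sketched on a small discrete grid. The following is an illustrative histogram filter under assumed values, not a method taken from any cited reference. The cyclic five-cell grid, the motion `kernel` and the `likelihood` values are hypothetical.

```python
def predict(belief, kernel):
    """Prediction step: propagate the belief through a motion model on a cyclic grid."""
    n = len(belief)
    out = [0.0] * n
    for i, p in enumerate(belief):
        for shift, w in kernel.items():
            out[(i + shift) % n] += p * w
    return out


def update(belief, likelihood):
    """Update step: weight the predicted belief by the measurement likelihood, then normalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Assumed values: uniform prior, motion of about one cell to the right,
# and a measurement that favors cell 2.
belief = [0.2] * 5
kernel = {0: 0.1, 1: 0.8, 2: 0.1}
likelihood = [0.1, 0.1, 0.7, 0.1, 0.1]

posterior = update(predict(belief, kernel), likelihood)
```

The resulting `posterior` is a discrete pdf of the object location that sums to one and peaks at the cell favored by the measurement.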
When the measurement noise is assumed to be Gaussian-distributed, one solution is provided by a Kalman filter, which is often used for tracking rigid objects, see Y. Boykov and D. Huttenlocher, “Adaptive Bayesian recognition in tracking rigid objects,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pages 697-704, 2000; and R. Rosales and S. Sclaroff, “A framework for heading-guided recognition of human activity,” Computer Vision and Image Understanding, vol. 91, pages 335-367, 2003. However, the Kalman filter is confined to predefined state transition parameters that control a ‘viscosity’ of motion properties.
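A minimal scalar sketch shows where the predefined parameters enter. It assumes a static state model (identity transition), a fixed process noise variance `q` and a fixed measurement noise variance `r`. These names and values are hypothetical and are not taken from the cited references.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter with an identity transition."""
    # Predict: the state is carried over, and uncertainty grows by the process noise.
    p_pred = p + q
    # Update: blend the prediction and the measurement z according to the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x + k * (z - x)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# Assumed values: initial estimate 0.0 with variance 1.0, repeated measurement 5.0.
x, p = 0.0, 1.0
for z in [5.0] * 50:
    x, p = kalman_step(x, p, z, q=0.01, r=1.0)
```

In this sketch, a small `q` relative to `r` makes the estimate respond slowly to new measurements. Because `q` and `r` are fixed in advance, the filter cannot adapt that response to abrupt changes in motion.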
When the state space is discrete and consists of a finite number of states, Markovian filters can be applied for object tracking. The most general class of filters is represented by particle filters, which are based on Monte Carlo integration methods. A current density of a particular state is represented by a set of random samples with associated weights. A new density is then based on the weighted samples.
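The representation of a density by weighted random samples can be sketched as follows. This is a single measurement update with multinomial resampling under assumed values, not a complete tracker. The Gaussian likelihood, the measurement `z`, the spread `sigma` and the particle count are hypothetical.

```python
import math
import random


def gaussian_likelihood(z, x, sigma):
    """Unnormalized likelihood of measurement z given a particle at state x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def particle_update(particles, z, sigma, rng):
    """Weight the particles by the measurement, estimate the state, and resample."""
    weights = [gaussian_likelihood(z, x, sigma) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The weighted samples represent the current density; its mean is the estimate.
    estimate = sum(w * x for w, x in zip(weights, particles))
    # The new sample set is drawn in proportion to the weights.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return weights, estimate, resampled


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
particles = [rng.uniform(0.0, 10.0) for _ in range(2000)]
weights, estimate, resampled = particle_update(particles, z=7.0, sigma=0.5, rng=rng)
```

After resampling, the particles concentrate around the measured location. Most of the original samples receive negligible weight and are discarded.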
Particle filters can be used to recover conditional density propagation for visual tracking and verification. Generally, particle filtering is based on random sampling, which is problematic because of sample degeneracy and impoverishment, especially for high dimensional problems. A kernel-based Bayesian filter can be used for sampling a state space more effectively. A multiple hypothesis filter evaluates a probability that a moving object gave rise to a certain measurement sequence.
One problem is that all of the above filter-based methods can easily get stuck in local optima. Another concern is that most prior art methods lack a competent similarity criterion that expresses both statistical and spatial properties. Most prior art methods depend either only on color distributions or only on structural models.
Many different representations, from aggregated statistics to appearance models, have been used for tracking objects. Histograms are popular because normalized histograms closely resemble a probability density function of the modeled data. However, histograms do not consider spatial arrangement of the feature values. For instance, randomly rearranging pixels in an observation window yields the same histogram. Moreover, constructing higher dimensional histograms with a small number of pixels is a major problem.
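The insensitivity of histograms to spatial arrangement can be checked with a short sketch. The pixel values, the bin count and the helper name `histogram` are assumptions made for illustration.

```python
import random
from collections import Counter


def histogram(pixels, bins=4, max_value=256):
    """Count the pixel intensities that fall into each of `bins` equal-width bins."""
    width = max_value // bins
    return Counter(min(v // width, bins - 1) for v in pixels)


# Hypothetical intensities of an observation window, listed in raster order.
window = [0, 10, 70, 80, 130, 140, 200, 250, 255]

# Randomly rearrange the same pixels within the window.
shuffled = window[:]
random.Random(0).shuffle(shuffled)

# True: the histogram does not change when the pixels are rearranged.
same = histogram(window) == histogram(shuffled)
```

Because the bin counts depend only on which values occur and not on where they occur, any permutation of the window yields an identical histogram.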
Appearance models map image features, such as shape and texture, onto a uniform-sized window of tensors. Because of the exponential complexity, only a relatively small number of features can be used. Thus, each feature must be highly discriminant. The reliability of the features strictly depends on the object type. Appearance models tend to be highly sensitive to scale variations, and are also pose dependent.
Therefore, it is desired to provide a better method for tracking objects in videos.