Tracking is the process of estimating the motion of an object in a sequence of images. Methods for tracking objects generally require that the object first be detected in some initial image. Then, the object can be tracked in subsequent images. Tracking methods can generally be classified as state-space estimator, model alignment, and localized kernel search methods.
State-Space Estimator Method
State-space estimator methods typically use a Markovian process and construct a probability density function (pdf) of the motion parameters. For example, Kalman filtering uses a normal distribution. However, because a normal distribution has a single mode, the Kalman filtering method fails to describe multi-modal distributions.
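As an illustration of the single-mode limitation, the following is a minimal sketch of a scalar Kalman filter, assuming a random-walk motion model. The function name `kalman_1d` and the noise variances `q` and `r` are hypothetical choices made for this example and do not come from any cited method. The belief is always summarized by one mean and one variance, so two competing hypotheses about the object cannot be represented.

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed in Gaussian noise.

    q: process noise variance (assumed), r: measurement noise variance (assumed).
    Returns a list of (mean, variance) pairs, one per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: under a random walk the mean is unchanged and the variance grows.
        p = p + q
        # Update: blend the prediction and the measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


# The belief is a single Gaussian at every step: one mean, one variance.
estimates = kalman_1d([5.0] * 50)
final_mean, final_var = estimates[-1]
```

With a constant measurement the mean settles near the measured value and the variance shrinks, but the estimate never holds more than one mode.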
Monte Carlo integration methods, e.g., particle filters, can track any parametric variation, including the pose of the object. However, those methods depend on random sampling, and the estimated likelihoods tend to degenerate, especially for higher dimensional representations. Moreover, the computational requirements of those methods grow exponentially with the number of state variables, which makes those methods unsuitable for tracking complex pose changes.
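The sampling, weighting, and resampling cycle of a particle filter can be sketched as follows. This is a minimal bootstrap filter for a one-dimensional state, assuming a random-walk motion model and a Gaussian measurement likelihood. The function name, the noise levels, and the use of the effective sample size as a degeneracy indicator are assumptions made for this example only.

```python
import math
import random


def particle_filter_1d(measurements, n_particles=500, motion_std=0.5,
                       meas_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state.

    Returns (estimates, ess): the weighted-mean estimate and the effective
    sample size at each step. A low effective sample size signals that a few
    particles carry most of the weight.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates, ess = [], []
    for z in measurements:
        # Propagate each particle through the (assumed) random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight each particle by the Gaussian likelihood of the measurement.
        weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        ess.append(1.0 / sum(w * w for w in weights))
        # Resample in proportion to the weights to counter degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates, ess


estimates, ess = particle_filter_1d([3.0] * 20)
```

Each additional state variable enlarges the volume that the particles must cover, which is why the number of particles needed grows so quickly with dimension.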
Model Alignment Method
Model alignment methods define a cost function based on a difference between an object model and the object as seen in an image. The motion parameters are obtained by minimizing the cost function. One example is optical flow estimation, where a sum of squared differences between the object model and the image intensities is minimized as an iterative least squares problem. A major difficulty of that method is the computation of the image gradients, the Jacobian, and the Hessian matrices for each iteration, which makes that method slow.
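The iterative least squares idea can be sketched in one dimension. The example below estimates a pure translation by Gauss-Newton minimization of the sum of squared differences, assuming a 1-D signal, linear interpolation, and central-difference gradients. The helper names `interp` and `estimate_shift` are hypothetical. Note that the gradient and the scalar counterpart of the Hessian (`den`) are recomputed in every iteration, which is the cost described above.

```python
import math


def interp(signal, x):
    """Linearly interpolate a 1-D signal at a real-valued position x (clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def estimate_shift(template, image, offset, iterations=20):
    """Find d minimizing sum_k (image(offset + k + d) - template[k])**2."""
    d = 0.0
    for _ in range(iterations):
        num = den = 0.0
        for k, t_val in enumerate(template):
            x = offset + k + d
            residual = interp(image, x) - t_val
            # Image gradient at the warped position, recomputed every iteration.
            grad = 0.5 * (interp(image, x + 1.0) - interp(image, x - 1.0))
            num += grad * residual
            den += grad * grad  # Gauss-Newton approximation of the Hessian
        if den == 0.0:
            break
        d -= num / den
    return d


def bump(x):
    """Smooth test signal: a Gaussian bump centered at 30."""
    return math.exp(-((x - 30.0) ** 2) / (2.0 * 5.0 ** 2))


template = [bump(i) for i in range(10, 51)]    # object model
image = [bump(i - 2.5) for i in range(64)]     # same bump, moved by 2.5 samples
shift = estimate_shift(template, image, offset=10)
```

In two dimensions with richer motion models the gradient becomes a Jacobian matrix and `den` becomes a Hessian matrix, both evaluated over every pixel of the object region per iteration.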
Other model alignment methods overcome that difficulty with alternative formulations of the relation between the motion and the cost function. In some methods, the motion is estimated using a linear function of the image gradient, which is learned in an off-line process. That idea is extended to learn a non-linear mapping from images to the motions using a relevance vector machine.
However, those methods estimate additive updates to the motion parameters via linearization. Thus, those methods cannot track non-linear motions.
Localized Kernel Searches
In contrast, kernel-based methods represent an object as an image region, and search for the same region using the previous location as a prior probability. That search is posed as an exhaustive matching process or as an iterative density gradient estimation. Kernel methods often require an object to have overlapping areas between consecutive frames. Due to the primitive object representations, e.g., histograms and templates, the kernel-based methods cannot discriminate pose variations, and are confined to translational motion.
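The exhaustive matching variant can be sketched as a template search in a small window around the previous location. The example assumes grayscale frames stored as nested lists and a sum-of-squared-differences score. The function name `search_translation` and the `radius` parameter are hypothetical. The search varies only the row and column of the region, which is why this kind of method recovers translation and nothing else.

```python
def search_translation(frame, template, prev_top_left, radius=3):
    """Return the (row, col) within `radius` of the previous location whose
    region best matches the template under a sum-of-squared-differences score."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    r0, c0 = prev_top_left
    best_score, best_pos = None, prev_top_left
    for r in range(max(0, r0 - radius), min(fh - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(fw - tw, c0 + radius) + 1):
            score = sum((frame[r + i][c + j] - template[i][j]) ** 2
                        for i in range(th) for j in range(tw))
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Toy frame: a 3x3 patch placed at row 5, column 6 of an empty 12x12 image.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
frame = [[0] * 12 for _ in range(12)]
for i in range(3):
    for j in range(3):
        frame[5 + i][6 + j] = patch[i][j]

found = search_translation(frame, patch, prev_top_left=(4, 4), radius=3)
```

If the object moves farther than `radius` between frames, the true location lies outside the search window, which corresponds to the overlap requirement noted above.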
Lie Group Theory for Motion Estimation
Lie algebra can be used to find modes of a distribution having a Euclidean motion group structure, for rigid motion estimation using a mean shift operation. It is known that the mean shift operation can fail when the motion is large. A vector addition operation is defined on the Lie algebra to integrate a series of affine motions for tracking an affine ‘snake’.
Additive updates are performed on the Lie algebra for template tracking. However, that approach fails to account for the non-commutativity of matrix multiplication, and the estimates are only valid near the initial transformation of the object.
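The effect of non-commutativity can be checked numerically. The sketch below uses 2x2 linear maps as a stand-in for affine motions, with a rotation generator and an anisotropic scaling generator chosen purely for illustration. The helper names and the truncated Taylor series for the matrix exponential are assumptions of this example, not part of any cited method. Adding the two generators on the Lie algebra and then exponentiating does not give the same transformation as composing the two motions.

```python
import math


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, terms=30):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_abs_diff(a, b):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


theta, s = 0.5, 0.3
rot = [[0.0, -theta], [theta, 0.0]]   # generator of a rotation by theta
scale = [[s, 0.0], [0.0, -s]]         # generator of an anisotropic scaling
summed = [[rot[i][j] + scale[i][j] for j in range(2)] for i in range(2)]

additive = expm(summed)                       # additive update on the algebra
composed = matmul(expm(rot), expm(scale))     # composition on the group
gap = max_abs_diff(additive, composed)
```

The gap shrinks as `theta` and `s` approach zero, which is consistent with additive estimates being valid only near the initial transformation.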
One tracking method based on Lie algebra minimizes a first order approximation to a geodesic error and reports very satisfactory pose tracking results, especially when the object motion is not large; see U.S. patent application Ser. No. 11/862,554 filed by Porikli et al. for “Method and System for Detecting and Tracking Objects in Images,” incorporated herein by reference.
It is desired to track an object in a sequence of images using particle filters even for complex pose changes.