Tracking is the process of estimating a motion of an object in a sequence of images. Object tracking methods generally require that first the object is detected in some initial image. Then, the object can be tracked in subsequent images. The variety of object detecting methods is too large to enumerate. Tracking methods can be classified as state-space estimator methods or model alignment methods.
State-Space Estimator Method
State-space estimator methods typically use a Markovian process and construct a probability density function (pdf) of motion parameters. For example, Kalman filtering uses a normal distribution. However, the Kalman filtering method fails to describe multi-modal distributions.
Monte Carlo integration methods, e.g., particle filters, can track any parametric variation including object pose. However, the methods dependency on random sampling tends to degenerate estimated likelihoods, especially for higher dimensional representations. Moreover, the methods computational requirements grow exponentially by the number of state variables, which makes the methods unsuitable for tracking complex pose changes.
Model Alignment Method
Model alignment methods define a cost function based on a difference between an object model and an object as seen in an image. The cost function is solved by minimizing motion parameters. One example is optical flow estimation, where a sum of squared differences between the object model and the image intensities are minimized as an iterative least squares problem. A major difficulty of the method is that the method requires computation of the image gradients, the Jacobian and the Hessian matrices, for each iterations, which makes the method slow.
Other model alignment methods overcome the difficulty by alternative formulations of the motion and the cost function relation. In some methods, the motion is estimated using a linear function of the image gradient, which is learned in an off-line process. That idea is extended to learn a non-linear mapping from images to the motions using relevance vector machine.
But, those methods estimate the additive updates to the motion parameters via linearization. Thus, those methods cannot track non-linear motions.
Lie Group Theory for Motion Estimation
Lie algebra can be used to find modes of a distribution having Euclidean motion group structure, for rigid motion estimation using a mean shift operation. It is known that the mean shift can fail when the motion is large. A vector addition operation is defined on the Lie algebra to integrate series of affine motions for tracking an affine ‘snake’.
Additive updates are performed on the Lie algebra for template tracking. However, that approach fails to account for the non-commutativity of the matrix multiplications and the estimations are only valid near the initial transformation of the object.
It is desired to track an object in a sequence of images while the object moves non-linearly. It is also desired to detect the object in an initial image. Furthermore, it would be advantageous if the methodology that underlies the detecting and tracking could be the same.