1. Technical Field
The invention relates to a system for tracking objects, and in particular, to a system and method for real-time probabilistic mode-based multi-hypothesis tracking using parametric causal contour models.
2. Related Art
Accurate tracking of objects such as the human head and face is an important application of object tracking. For example, the ability to track moving people greatly increases the utility of video surveillance and video conferencing systems. Unfortunately, robust and efficient tracking of human heads and faces in complex environments is a problem that existing tracking schemes have not adequately addressed.
In general, the basic objective of most conventional tracking schemes is to compute, accurately and efficiently, the posterior probability of the tracking state of one or more target objects with respect to an image observation. For heads and faces, the tracking state typically represents information such as the location and orientation of the head or face. Given this objective, there are three general approaches to estimating a probability distribution: pure parametric, pure non-parametric, and semi-parametric.
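As a point of reference, this posterior is usually computed recursively with the standard Bayesian filtering equation shown below. The notation is assumed for illustration and is not defined elsewhere in this document: x_t is the tracking state at frame t, z_t is the image observation at frame t, and z_{1:t} denotes all observations up to and including frame t.

```latex
p(x_t \mid z_{1:t}) \;\propto\; p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}
```

Here p(z_t | x_t) is the observation likelihood and p(x_t | x_{t-1}) is the dynamic model. The three approaches differ mainly in how they represent the posterior: as a single parametric density, as a weighted set of samples, or as a mixture of parametric components.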
The well-known Kalman filter is a good example of the pure parametric approach, in which the distribution is assumed to be Gaussian. Unfortunately, because of this unimodal assumption, Kalman filters have achieved only limited success in real-world tracking applications. To overcome this difficulty, one conventional scheme uses a non-parametric approach in which the probability distribution of the tracked object is represented and estimated by a set of properly positioned and weighted “particles.” The scheme works with both multi-mode distributions and non-linear dynamic systems. However, as with most if not all non-parametric algorithms, this scheme requires a large number of particles, and the required number grows exponentially with the dimensionality of the state space. As the number of particles increases, so do the computational complexity and cost of solving the tracking problem.
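The predict, weight, and resample cycle behind particle-based tracking can be sketched as follows. This is a minimal bootstrap particle filter under illustrative assumptions that the document does not state: a 1-D state, random-walk dynamics, and a Gaussian observation likelihood. All function and parameter names (particle_filter_step, motion_std, obs_std, grid_particle_count) are hypothetical. grid_particle_count is only a crude coverage argument for the exponential growth: if k samples per axis are needed, a d-dimensional state needs k**d of them.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the 1-D Gaussian N(mean, std**2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, observation, motion_std=1.0, obs_std=1.0, rng=random):
    """One predict-weight-resample step of a bootstrap particle filter on a 1-D state.

    Assumed (illustrative) model: random-walk dynamics x_t = x_{t-1} + noise and
    a Gaussian likelihood p(z_t | x_t) = N(z_t; x_t, obs_std**2).
    """
    # Predict: push every particle through the assumed random-walk dynamics.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: score each predicted particle by how well it explains the observation.
    weights = [gaussian_pdf(observation, p, obs_std) for p in predicted]
    if sum(weights) == 0.0:
        # Every weight underflowed; fall back to uniform weights rather than fail.
        weights = [1.0] * len(predicted)
    # Resample: draw an equally weighted set in proportion to the weights
    # (random.choices normalizes the weights itself).
    return rng.choices(predicted, weights=weights, k=len(predicted))


def grid_particle_count(per_axis, dimensions):
    """Crude coverage estimate: k samples per axis in d dimensions needs k**d particles."""
    return per_axis ** dimensions


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
    for z in (2.0, 2.5, 3.1):
        particles = particle_filter_step(particles, z, rng=rng)
    print("posterior mean estimate:", sum(particles) / len(particles))
    print("grid counts for d = 1, 2, 4, 6:", [grid_particle_count(10, d) for d in (1, 2, 4, 6)])
```

Under this coverage argument, a 6-dimensional state (for example, 3-D position plus 3-D orientation) with 10 samples per axis already calls for one million particles.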
Other conventional schemes have attempted to reduce the number of particles needed for tracking by making each particle more effective. For example, one such scheme uses an annealed particle filter to track an articulated human figure. This scheme is based on probabilistic pruning and concentrates its particles in a neighborhood around the global peaks of the weighting function. Although this greatly reduces the number of particles needed, it does so at the cost of robustness in a Bayesian framework. In particular, because it discards inferior peaks in the weighting function, this scheme can lose the true state of the tracked object when large distractions or discontinuities occur in the observation data.
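The pruning behavior described above can be illustrated with a minimal sketch of the annealing layers, assuming a hypothetical 1-D weighting function with a weaker peak at -3 and a stronger one at +3. Each layer raises the weights to an increasing exponent beta, resamples, and adds shrinking Gaussian diffusion. The beta schedule, diffusion widths, and all names are arbitrary choices for illustration, and the sketch omits the frame-to-frame dynamics of a full tracker.

```python
import math
import random


def weighting_function(x):
    """Hypothetical bimodal weighting function: weaker peak at -3, stronger peak at +3."""
    return 0.6 * math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)


def annealed_layers(particles, betas, diffusion_stds, rng):
    """Run annealing layers in order: weight by w(x)**beta, resample, then diffuse.

    Increasing beta sharpens the weighting function, so each layer pulls the
    particles harder toward the global peak and starves the weaker one.
    """
    for beta, std in zip(betas, diffusion_stds):
        weights = [weighting_function(p) ** beta for p in particles]
        particles = rng.choices(particles, weights=weights, k=len(particles))
        particles = [p + rng.gauss(0.0, std) for p in particles]
    return particles


def fraction_near(particles, center, radius=1.5):
    """Fraction of particles lying within `radius` of `center`."""
    return sum(abs(p - center) <= radius for p in particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(1)
    start = [rng.uniform(-6.0, 6.0) for _ in range(1000)]
    end = annealed_layers(
        start,
        betas=[0.2, 0.5, 1.0, 2.0, 4.0, 8.0],
        diffusion_stds=[0.8, 0.6, 0.4, 0.3, 0.2, 0.1],
        rng=rng,
    )
    print("near weaker peak before:", fraction_near(start, -3.0))
    print("near weaker peak after: ", fraction_near(end, -3.0))
```

The share of particles near the weaker peak falls from roughly a quarter to nearly zero. If that weaker peak had corresponded to the true state, the filter would have no particles left to recover it.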
Still other conventional schemes have attempted to address the need for large numbers of particles by using a semi-parametric approach, in which the probability distribution to be estimated is modeled as a mixture of parametric distributions. These semi-parametric approaches retain the ability to represent multi-mode distributions, but with far fewer samples or particles. One of the most successful semi-parametric schemes used in object tracking is known as multi-hypothesis tracking (MHT).
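A minimal sketch of the semi-parametric idea follows, assuming a 1-D state and Gaussian components (the document fixes neither). Six numbers describe a two-mode density that a single Gaussian, as in a Kalman filter, cannot represent. The names Component, mixture_pdf, and local_maxima are hypothetical, and the modes are located by a simple grid scan rather than any particular published method.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float  # mixing weight; the weights of all components should sum to 1
    mean: float
    std: float


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, components: List[Component]) -> float:
    """Density of a 1-D Gaussian mixture: sum over k of w_k * N(x; mu_k, sigma_k**2)."""
    return sum(c.weight * gaussian_pdf(x, c.mean, c.std) for c in components)


def local_maxima(components: List[Component], lo: float, hi: float, steps: int = 2000) -> List[float]:
    """Grid points whose density exceeds both neighbours, i.e. approximate modes."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, components) for x in xs]
    return [xs[i] for i in range(1, steps) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


if __name__ == "__main__":
    # Two hypotheses described by six numbers in total.
    mixture = [Component(0.4, -2.0, 0.7), Component(0.6, 3.0, 1.0)]
    print("approximate modes:", local_maxima(mixture, -6.0, 6.0))
```

Because the two components are well separated, the scan reports one mode near each component mean, whereas a single Gaussian fitted to the same density would place its only peak between them.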
MHT was first developed for radar-tracking systems. However, one conventional scheme has successfully applied MHT to articulated human body tracking. MHT works in a parametric state space: each hypothesis is a particular configuration of parameters in the state space, and the overall state is represented by a mixture of multiple hypotheses. One limitation of classic MHT, as used in radar tracking, is that it assumes a set of discrete hypotheses is available at every time step. This assumption is valid in radar tracking, where the goal is to associate multiple detected targets with multiple airplanes, missiles, spacecraft, etc. In visual tracking, however, this assumption cannot easily be met. For human head tracking, for example, it would be extremely difficult to develop a single high-level “feature detector” that detects a set of discrete hypotheses of the head position and pose in every frame. On the other hand, using low-level features such as image edges in this scheme quickly leads to an intractable number of hypotheses.
Another conventional scheme addresses this difficulty by first using an appearance-based local gradient search to generate a set of hypotheses (local maxima), and then constructing a likelihood function as a piecewise Gaussian by combining the multiple hypotheses. While this approach has demonstrated the effectiveness of the MHT paradigm in visual tracking, it has three major difficulties. First, appearance- or template-based approaches to visual tracking work only with relatively rigid objects whose orientation and intensity rarely change. In head tracking, however, head orientation and environmental lighting can change from frame to frame, causing the appearance of the head to change dramatically. Second, this scheme uses an iterative Gauss-Newton method to generate hypotheses, which is computationally expensive and unsuitable for real-time tracking. Finally, and most importantly, while this scheme produces maximum likelihood estimates, it does not compute the posterior probability of the tracking state with respect to the image observation. As a result, its tracking performance can be significantly degraded.
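The third difficulty, maximum likelihood versus posterior, can be made concrete with a small sketch. The piecewise Gaussian here is an assumed construction (at each point, take the largest of the Gaussian bumps centred on the hypotheses), and the original scheme's exact combination rule may differ. The hypotheses, the Gaussian prior standing in for a prediction from the previous frame, and all function names are hypothetical; the peak of the unnormalized posterior is used only as a one-number summary of it.

```python
import math
from typing import Callable, List, Tuple

# Each hypothesis is (location, peak height, spread): an illustrative 1-D stand-in
# for one local maximum found by a local search.
Hypothesis = Tuple[float, float, float]


def piecewise_gaussian_likelihood(x: float, hypotheses: List[Hypothesis]) -> float:
    """Assumed rule: at each x, keep the largest Gaussian bump among the hypotheses."""
    return max(h * math.exp(-0.5 * ((x - mu) / s) ** 2) for mu, h, s in hypotheses)


def gaussian_prior(x: float, mean: float, std: float) -> float:
    """Unnormalized Gaussian prior, e.g. a prediction carried over from the previous frame."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def argmax_on_grid(f: Callable[[float], float], lo: float, hi: float, steps: int = 4000) -> float:
    """Return the grid point in [lo, hi] where f is largest."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=f)


if __name__ == "__main__":
    hyps = [(-2.0, 0.7, 0.5), (3.0, 1.0, 0.5)]  # weaker hypothesis at -2, stronger at +3

    def likelihood(x: float) -> float:
        return piecewise_gaussian_likelihood(x, hyps)

    def posterior(x: float) -> float:
        # Unnormalized posterior: likelihood times a prior centred near the weaker hypothesis.
        return likelihood(x) * gaussian_prior(x, -1.5, 1.0)

    print("maximum likelihood estimate:", argmax_on_grid(likelihood, -6.0, 6.0))
    print("posterior peak:", argmax_on_grid(posterior, -6.0, 6.0))
```

With the prior centred near the weaker hypothesis, the maximum likelihood estimate stays at the stronger peak (x = 3) while the posterior peaks near x = -1.9, so a tracker that ignores the prior can settle on a different hypothesis than one that uses it.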
Therefore, what is needed is a system and method for tracking objects such as heads and faces that is both robust in complex environments and computationally efficient. Further, this system and method should be capable of tracking objects whose appearance may change from one image frame to the next. In addition, it should be capable of using multi-hypothesis tracking while also computing the posterior probability of the tracking state with respect to image observations.