In the field of visual perception, many applications require separating a target object or image of interest from a background. In particular, motion video applications often require an object of interest to be tracked against a static or time-varying background.
The visual tracking problem can be formulated as continuous or discrete-time state estimation based on a “latent model.” In such a model, observations or observed data encode information from captured images, and unobserved states represent the actual locations or motion parameters of the target objects. The model infers the unobserved states from the observed data over time.
At each time step, a dynamic model predicts several possible locations (e.g., hypotheses) of the target at the next time step based on prior and current knowledge. The prior knowledge includes previous observations and estimated state transitions. As each new observation is received, an observation model estimates the target's actual position. The observation model determines the most likely location of the target object by validating the various dynamic model hypotheses. Thus, the overall performance of such a tracking algorithm is limited by the accuracy of the observation model.
One conventional approach builds static observation models before tracking begins. Such models assume that factors such as illumination, viewing angle, and shape deformation do not change significantly over time. To account for all possible variations in such factors, a large set of training examples is required. However, the appearance of an object varies significantly as such factors change. It is therefore daunting, if not impossible, to obtain a training set that accommodates all possible scenarios of a visually dynamic environment.
Another conventional approach combines multiple tracking algorithms that each track different features or parts of the target object. Each tracking algorithm includes a static observation model. Although each tracking algorithm may fail under certain circumstances, it is unlikely that all will fail simultaneously. This approach adaptively selects the tracking algorithms that are currently robust. Although this improves overall robustness, each static observation model must be trained, i.e., initialized, before tracking begins. This severely restricts the application domain and precludes application to previously unseen targets.
Thus, there is a need for improved observation accuracy to provide improved tracking accuracy, and to robustly accommodate appearance variation of target objects in real time, without the need for training.