Real-time tracking of moving objects has many practical applications, including use in surveillance and monitoring, air traffic control, sporting events, human-computer interaction, smart rooms, and video compression. However, current tracking algorithms continue to have difficulty with efficient and robust real-time tracking of objects in complex environments. Challenges in real-world target tracking include tracking single and multiple targets in complex environments with multiple objects and clutter; tracking agile targets with unpredictable directions and speeds; and environmental influences such as illumination and view changes, occlusions, low image quality, and motion blur.
Current observation models for visual tracking can be separated into two classes; tracking likelihood models (TLMs) or verification likelihood models (VLMs). TLMs generally classify objects using simple image features, such as contours, color histograms, or image templates. As a result, TLMs are simple and efficient, but cannot handle complex changes in the appearance of the target. VLMs tend to use classifiers that differentiate the true target from false positives, and therefore need to extract and store features such as invariants of the target or variations of the target's appearance. VLMs are computationally demanding and difficult to model, but capable of more accurately recognizing the target. In addition, supervised learning is often required to adapt a VLM to the variabilities of a particular target.
Techniques used for target tracking include artificial neural networks, Bayesian methods, and mean-shift tracking. Artificial neural networks are interconnected groups of artificial neurons; the connections and weights between neurons in an artificial neural network determine the outputs, given a set of inputs. Artificial neural networks can be trained to identify the features of a target and track it in a sequence of images.
Bayesian methods use evidence or observations to update or newly infer the probability that a hypothesis is true. Hypotheses with a high degree of belief, or probability, are accepted as true, while hypotheses with a low degree of belief are rejected as false. Bayesian methods can be used to identify a target by extracting information about the surroundings as well as properties of the target in previous frames.
Mean-shift tracking involves minimizing the statistical distance between two distributions. The target is initially characterized with a probability distribution related to an attribute, such as color, texture, or image gradient. In subsequent frames, the target is tracked by minimizing the statistical distance between the characterized probability distribution and the distribution found in the current frame. In a mean-shift iteration, the center of the target is translated by the mean shift vector, which is an estimate of the normalized density gradient. The statistical distance is computed after the translation, and a new mean shift vector is applied, until the statistical distance is minimized or the centers of the distributions are separated by less than a minimum physical distance.