Visual object tracking refers to the task of estimating the location, and optionally the scale, of an object of interest in image data, typically in a video. Frequently, the location of the object is specified by a user in a first frame of the image data, for example by means of a bounding rectangle. Visual object tracking is a key component of numerous applications of video processing such as surveillance, robotics, man-machine interaction, post-production and video editing.
Traditional approaches to object tracking rely on matching an appearance model of the object of interest from frame to frame. Various choices of appearance models and associated matching schemes have been proposed in the literature, including color histograms, feature points, patch-based features, and the image contents of the bounding box around the object of interest. Recently, discriminative approaches known as “tracking by detection” have been proposed, which compute and adaptively update a classifier in order to optimally discriminate the object of interest from its immediate background. The image patch in a frame that yields the highest “object” classification score provides the object location estimate for this frame. This approach is described in more detail in, for example, [1]. As a variant, [2] proposes to learn a compatibility function online between the appearance of the object and the deformation induced by its motion. Maximizing this compatibility function yields the sought object motion from one frame to the next. This object motion, estimated from one frame to the next, forms the “state” of the tracker. Often, the object is assumed to undergo a 2D translation, and the state is made up of the horizontal and vertical components of the corresponding 2D translation vector. More complex transformations of the appearance of the object may be considered, including changes of apparent size and shape. In this case, the state vector is extended with additional variables that need to be estimated from frame to frame.
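As an illustration only, the frame-to-frame matching with a 2D translation state can be sketched as follows. The sketch assumes the simplest appearance model mentioned above, namely the image contents of the bounding box, and a sum-of-squared-differences matching cost evaluated over a small search window. The function names (`crop`, `ssd`, `track_translation`) and the `radius` parameter are hypothetical and are not taken from [1] or [2].

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Return the w-by-h patch whose top-left corner is at column x, row y."""
    return [row[x:x + w] for row in image[y:y + h]]


def ssd(a: Image, b: Image) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def track_translation(
    frame: Image,
    template: Image,
    prev_xy: Tuple[int, int],
    radius: int = 2,  # hypothetical search range, in pixels
) -> Tuple[int, int]:
    """Estimate the tracker state (dx, dy) for one frame.

    The state is the 2D translation, relative to the previous top-left
    corner prev_xy, that minimizes the matching cost against the template.
    """
    h, w = len(template), len(template[0])
    rows, cols = len(frame), len(frame[0])
    px, py = prev_xy
    best_state, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = px + dx, py + dy
            # Skip candidate boxes that fall partly outside the frame.
            if x < 0 or y < 0 or x + w > cols or y + h > rows:
                continue
            cost = ssd(crop(frame, x, y, w, h), template)
            if cost < best_cost:
                best_state, best_cost = (dx, dy), cost
    return best_state
```

In a tracking-by-detection variant, the cost minimized here would be replaced by a classifier score to be maximized over the candidate patches. Enriching the state with scale would add a further loop over candidate box sizes.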
Visual object tracking must cope with changes in the appearance of the object over time. These are primarily caused by variations in object pose, camera viewpoint and lighting conditions. These changes call for an online adaptation of the object appearance model, based on the current estimate of the object position.
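One common way to realize such an online adaptation, shown here purely as a sketch, is to blend the patch observed at the current position estimate into the stored appearance model with a fixed forgetting factor. The running-average update, the function name `update_appearance` and the `rate` parameter are assumptions made for illustration and are not prescribed by the cited works.

```python
from typing import List

Patch = List[List[float]]  # grayscale patch stored as rows of pixel intensities


def update_appearance(model: Patch, observed: Patch, rate: float = 0.1) -> Patch:
    """Blend the patch observed at the current position estimate into the model.

    rate is a hypothetical adaptation rate: 0 keeps the model unchanged,
    1 replaces it entirely with the latest observation.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [
        [(1.0 - rate) * m + rate * o for m, o in zip(model_row, observed_row)]
        for model_row, observed_row in zip(model, observed)
    ]
```

A small rate makes the model robust to a momentarily inaccurate position estimate but slow to follow pose or lighting changes, whereas a large rate adapts quickly at the risk of drifting onto the background.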
A comprehensive survey and evaluation of visual tracking methods has been compiled in [3].