Object tracking, and face tracking in particular, has been a widely investigated problem, in part because of its potential for use in many real-world applications. At the same time, tracking faces in unconstrained environments is a challenging task because of real-time performance requirements and the need for robustness to changes in the object's appearance. A variety of tracking approaches have been tried that can be coarsely divided into three categories: point tracking, silhouette tracking, and template-based tracking.
In point tracking, the object is represented as a collection of points. Mechanisms such as position and motion estimation are used to predict each point's location in the next frame. Examples of point tracking algorithms are the Kalman filter approach and the particle filter approach.
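As an illustration of how position and motion estimation can predict a point's location in the next frame, the following is a minimal sketch of a Kalman filter for one coordinate of one tracked point. It assumes a constant-velocity motion model in which only position is measured. The class name, the initial covariance, and the noise values `q` and `r` are illustrative assumptions and are not taken from any of the approaches described here.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate of a tracked point (illustrative).

    State is (position, velocity); only the position is measured.
    """
    pos: float = 0.0
    vel: float = 0.0
    # 2x2 state covariance, stored entry by entry (assumed large at start).
    p00: float = 100.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 100.0
    q: float = 1e-3  # process noise added to the diagonal (assumed value)
    r: float = 1.0   # measurement noise variance (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Advance the state by one frame and return the predicted position."""
        self.pos += dt * self.vel
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I.
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, measured_pos: float) -> None:
        """Correct the predicted state with a measured position."""
        residual = measured_pos - self.pos
        s = self.p00 + self.r  # innovation variance, with H = [1, 0]
        k0 = self.p00 / s      # Kalman gain for position
        k1 = self.p10 / s      # Kalman gain for velocity
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
```

In use, `predict()` would be called once per frame to obtain the expected location, followed by `update()` with the location actually observed in that frame. A two-dimensional point could be handled under the same assumptions with one such filter per coordinate.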
In silhouette tracking algorithms, the goal is to model the object and use the model in order to locate the target in the current frame. This can be achieved using either a shape matching algorithm or a contour tracking approach.
In template-based tracking, a template representing the object is used to estimate the position of the object in the current frame. More specifically, a region containing the object is selected in the first frame, either manually or automatically, and appropriate features are extracted. In subsequent frames, every image is searched in order to identify the location that maximizes a similarity score between the template comprising the extracted features and the particular image region. The key issues related to template-based tracking are the types of features that are extracted and the similarity score that is employed.
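The search described above can be sketched as an exhaustive scan over candidate locations. The example below is only an illustration: it assumes grayscale images stored as lists of rows, uses raw pixel intensities as the features, and uses the negative sum of squared differences as the similarity score. The function names are hypothetical.

```python
def ssd_at(template, image, top, left):
    """Sum of squared differences between the template and one image region."""
    total = 0.0
    for r, template_row in enumerate(template):
        for c, t in enumerate(template_row):
            d = image[top + r][left + c] - t
            total += d * d
    return total


def locate_template(template, image):
    """Return ((top, left), score) for the region that best matches the template.

    The score is the negative SSD, so a larger score means a better match.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = -ssd_at(template, image, top, left)
            if best_score is None or score > best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score
```

A practical tracker would typically restrict the scan to a neighborhood of the previous location rather than the whole frame. That restriction is omitted here to keep the sketch short.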
One of the most popular features used in object tracking is color. The object is represented by its appearance in a colorspace such as RGB, HSV, or L*a*b*. One prior approach proposed a tracking algorithm that used color histograms as features, which were tracked using the mean-shift algorithm. Despite its success, this algorithm is highly sensitive to illumination changes, which may cause the tracker to fail.
Another type of feature that has been used for tracking is edges. Edges are less sensitive to illumination changes compared to color. Nevertheless, no generic edge detection method performs well in every scenario, and in most cases the extracted edges are application specific.
Optical flow is another popular feature used in tracking. Optical flow is a set of vectors indicating the translation of each pixel in a region. When computing optical flow, there is an assumption that corresponding pixels in consecutive frames have the same brightness, called the “brightness constancy” assumption. Tracking algorithms using optical flow are usually based on the Lucas-Kanade method.
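To illustrate how the brightness constancy assumption leads to a flow estimate, the following sketch computes a single translation for one image window in the style of the Lucas-Kanade method. It assumes a small displacement, grayscale frames stored as lists of rows, central-difference spatial gradients, and one shared flow vector for the whole window. The function name is hypothetical, and practical implementations usually add image pyramids and iteration, which are omitted here.

```python
def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) translation between two small grayscale windows.

    Linearizing brightness constancy gives Ix*u + Iy*v + It = 0 per pixel;
    the window's pixels are combined in a least-squares sense.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # horizontal gradient
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # vertical gradient
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

The determinant check reflects a known limitation of this formulation: in a window with uniform intensity, or with gradients in only one direction, the two unknowns cannot both be recovered.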
One of the more recent feature extraction methods is based on the extraction of interest points. The Scale Invariant Feature Transform (SIFT) uses differences of Gaussian functions in scale space to identify interest points, and uses their location, orientation, and scale to describe them. SIFT features have been combined with color histogram mean shift for object tracking.
Recently, there has been increased interest in applying online learning to feature selection in the context of tracking. In one such approach, the region around the object in the previous frame was used as a positive example and the background regions surrounding it as negative examples for training a boosting classifier. The algorithm does not require a priori knowledge of the object's model and is able to adapt to variations in its appearance. Nevertheless, it introduces additional computational complexity due to training and is susceptible to drift and occlusion since no explicit model of the object is maintained.
In addition to the type of features that are extracted, the metric used for comparing candidate regions is very important. Widely used similarity and distance measures for template-based tracking include cross-correlation, the Bhattacharyya coefficient, the Kullback-Leibler divergence, and the sum of squared differences.
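For reference, the four measures named above can be written compactly as follows. This is a minimal sketch using their standard definitions. It assumes features are flat sequences of numbers, and that histograms are normalized to sum to one before the Bhattacharyya coefficient or the Kullback-Leibler divergence is computed. The function names and the small `eps` guard are illustrative choices.

```python
import math


def normalize(hist):
    """Scale a histogram with a positive total so that its bins sum to one."""
    total = float(sum(hist))
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms: 1 if identical, 0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def sum_of_squared_differences(a, b):
    """Distance between two feature vectors; 0 means identical."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation, in the range [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # at least one input is constant; correlation is undefined
    return sum(x * y for x, y in zip(da, db)) / denom
```

Cross-correlation and the Bhattacharyya coefficient increase with similarity, whereas the Kullback-Leibler divergence and the sum of squared differences decrease, so a tracker maximizes the former and minimizes the latter.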
Off-line trained models have also been suggested for tracking, such as models that use a view-based eigenbasis representation and a robust error norm. In one off-line trained model, a probabilistic approach for tracking contours was used. A major drawback of these methods is that once the model is created, it is not updated, and as a result tracking may fail due to changes in illumination not accounted for during training. In addition, they require training with all possible poses, which is time consuming.
The assumption that the appearance of the object remains the same throughout the tracking is unrealistic in real-world videos. Recently, various algorithms have been presented that update the object's template in order to match the current appearance.
Other prior approaches have suggested modeling the object using three components: the wandering, the stable, and the lost. Although this approach is capable of handling variations in the object's appearance, no drift correction is applied, and tracking may fail as the model slowly adapts to non-object regions. In addition, the computational cost of maintaining and updating the model can prevent real-time performance.
In another prior approach, a template update technique was proposed in conjunction with the Lucas-Kanade tracker. Spatiotemporal motion regions were used for constraining the search region and the alignment space, while a cost function was employed in the case of occlusion.
In yet another approach, an online learning method for template update was presented where the object is represented in a low-dimensional space using principal components analysis (PCA). Although this proposed method succeeds in utilizing an updated object model, it is vulnerable to drift because no global constraints are used to confine the subspace representation of the model.
A known problem of many prior adaptive template based tracking algorithms is drift. Drift occurs when small misalignments cause the tracker to gradually adapt to non-target background and fail.
A template update method was proposed to correct for drift. With this method, the template was generated as a linear combination of the initial template (obtained in the first frame) and the aligned object's appearance in the current frame. As a result, both the initial model and the current appearance were used for the template. Even though the algorithm is robust against drift, it tolerates little deviation in the object's appearance. In the case of face tracking, for example, a linear combination of a frontal face and a 45-degree profile face will not reliably represent the face in every frame.
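A linear combination of this kind can be sketched as a per-pixel weighted average. The weight `alpha` and its default value are assumptions made for illustration, since the weighting used by the method is not specified here. The function name is hypothetical, and both inputs are assumed to be aligned grayscale patches of the same size.

```python
def blend_templates(initial, current, alpha=0.5):
    """Return alpha * initial + (1 - alpha) * current, pixel by pixel.

    alpha is an assumed mixing weight: 1.0 keeps only the first-frame
    template, 0.0 keeps only the current appearance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * i + (1.0 - alpha) * c for i, c in zip(initial_row, current_row)]
        for initial_row, current_row in zip(initial, current)
    ]
```

Because the first-frame template always contributes, the blend cannot wander arbitrarily far from the initial appearance, which is consistent with the drift robustness noted above. The same property explains why an average of two very different poses represents neither of them well.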
The Eigentracking approach has been extended by incrementally updating the eigenbasis of the object's appearance. Robustness against drift was later introduced by applying visual constraints of the object's appearance in the context of particle filter tracking. Two types of constraints were introduced: generative for pose and discriminative for alignment. For the pose constraints, the authors constructed a set of pose subspaces and used the distance among the subspaces as a constraint on the possible pose. For alignment, a support vector classifier was employed. Although this approach tackles the problem of drift, it relies on training, which limits its applicability to already learned cases.
Another proposal was an online selection of features for increased robustness in tracking. Feature selection was based on discrimination between the object and the background, and the features were adaptively updated during tracking. The problem of drift was tackled by pooling pixel samples from previously tracked frames while keeping the model consistent with the initial appearance. The algorithm can adapt to small variations in the object's appearance, but it fails when the initial appearance is no longer consistent with the current one, such as under large pose variations.
A template update mechanism has also been suggested. A comparison was made between the error that would occur if the template was updated and the error if the template was not updated. The comparison was based on a criterion that considered the estimated covariance matrix of the template matching error. Like other methods, this method assumes that the appearance of the object will not change significantly during tracking.
Another proposal was an SMAT object tracking algorithm that combined modeling of the object's appearance with tracking. SMAT uses a set of templates (exemplars) sampled from previous frames and the mutual information metric in order to select the closest matching feature templates. In addition, it continually updates a shape model of the features. A major benefit of SMAT is that it becomes more robust as tracking progresses. Although the SMAT algorithm correctly updates the templates, its robustness is not discussed for the case of occlusion, when the tracked features are no longer visible.
In another approach, a local generative model was introduced to constrain the feature selection process in online feature selection tracking algorithms. Non-negative matrix factorization was employed in order to identify basis functions that describe the object appearance. The method was able to identify occlusions and select only appropriate features. However, it is based on an off-line training stage that may limit its flexibility.