1. Field of the Invention
This invention relates to object tracking in video.
2. Related Art
Web video services, such as the YouTube™ service provided by Google Inc. of Mountain View, Calif., have greatly increased the amount of available digital video. It is often desirable to track an object, such as a human face, across a sequence of frames in a video. However, object tracking can be challenging because of occlusions and variations in the illumination, position, and appearance of the object.
Once an object is tracked in the video, an object recognition algorithm may be used to identify the object. For example, a face recognition algorithm can use the position of the face in each frame to determine the face's identity. Numerous approaches to face tracking and recognition have been proposed.
One approach to object tracking, called Eigentracking, is described in Black et al., “Eigentracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation,” 1996, ECCV. Eigentracking uses a predefined model of the object being tracked, such as a face. The model encompasses a range of variations of the object. For example, when a face is being tracked, the model may be trained with different images of the face. This approach has two main drawbacks. First, the model may not encompass all possible variations of the object; for example, it may not include every way the face may appear in the video. Second, Eigentracking often fails when the tracked object is occluded, because occluded views are typically not among the variations included in the model.
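To make the idea of a predefined, view-based model concrete, the following Python sketch learns an eigenspace (a mean plus principal components) from training patches and scores candidate patches by how well that eigenspace reconstructs them. It is a simplified illustration under stated assumptions: patches are already cropped and flattened into vectors, matching uses a plain least-squares reconstruction error, and the robust estimation and motion parameters of the published method are omitted. The function names are hypothetical. The final lines show how an occluded patch, a view the model was never trained on, yields a large reconstruction error.

```python
import numpy as np


def build_eigenspace(training_patches, num_components):
    """Learn a mean and an orthonormal basis from flattened training patches.

    training_patches: array of shape (num_images, num_pixels).
    (Hypothetical helper for illustration.)
    """
    mean = training_patches.mean(axis=0)
    centered = training_patches - mean
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components].T  # shape (num_pixels, num_components)
    return mean, basis


def reconstruction_error(patch, mean, basis):
    """Squared distance between a patch and its projection onto the eigenspace."""
    centered = patch - mean
    coeffs = basis.T @ centered
    residual = centered - basis @ coeffs
    return float(residual @ residual)


def best_match(candidates, mean, basis):
    """Return the index of the candidate patch that the model explains best."""
    errors = [reconstruction_error(c, mean, basis) for c in candidates]
    return int(np.argmin(errors))


# Synthetic "training images": 20 patches of 64 pixels lying in a 2-D subspace.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(2, 64))
training = rng.normal(size=(20, 2)) @ true_basis + 5.0
mean, basis = build_eigenspace(training, num_components=2)

in_model = np.array([1.0, -0.5]) @ true_basis + 5.0  # a view the model covers
outside = rng.normal(size=64) + 5.0                   # an unrelated patch
occluded = in_model.copy()
occluded[:16] = 0.0                                   # part of the view is blocked

print(best_match([outside, in_model], mean, basis))  # → 1
print(reconstruction_error(occluded, mean, basis) > 1.0)  # → True
```

Because the model only spans the variations seen during training, any view outside that span, including a partially occluded one, is poorly reconstructed, which mirrors the two drawbacks noted above.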
In contrast to Eigentracking, incremental visual tracking (IVT) can track an object, such as a face, without a predefined model. IVT is described in Ross et al., “Incremental Learning for Robust Visual Tracking,” 2007, IJCV. IVT starts with an initial location of the object and builds its model as the object is tracked across more frames. While IVT avoids Eigentracking's problem of an incomplete predefined model, it has a drawback of its own. As IVT tracks an object, alignment errors may arise, and because the model is built from the tracker's own estimates, these errors may compound as more frames are processed. As alignment errors compound, IVT may drift away from the tracked object.
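The compounding of alignment errors can be illustrated with a toy one-dimensional simulation. This Python sketch is not IVT itself (it omits the incremental subspace learning entirely); it assumes a constant alignment bias per frame and hypothetical function names, and simply contrasts a tracker that localizes each frame against a fixed model with one that localizes relative to its own previous estimate.

```python
def track_with_fixed_model(true_positions, per_frame_error):
    """Each frame is localized independently against a fixed model,
    so the error made in one frame does not carry over to the next."""
    return [p + per_frame_error for p in true_positions]


def track_with_self_updating_model(true_positions, per_frame_error):
    """Each frame is localized relative to the previous estimate (as if the
    model were re-learned from that possibly misaligned patch), so the
    per-frame errors accumulate."""
    estimates = [true_positions[0]]  # the initial location is given exactly
    for prev_true, curr_true in zip(true_positions, true_positions[1:]):
        motion = curr_true - prev_true
        estimates.append(estimates[-1] + motion + per_frame_error)
    return estimates


# The object moves one unit per frame over 11 frames.
true_positions = [float(x) for x in range(11)]
fixed = track_with_fixed_model(true_positions, per_frame_error=0.5)
self_updating = track_with_self_updating_model(true_positions, per_frame_error=0.5)

print(fixed[-1] - true_positions[-1])          # → 0.5
print(self_updating[-1] - true_positions[-1])  # → 5.0
```

In this toy setting the fixed-model error stays bounded at the per-frame error, while the self-updating tracker's error grows linearly with the number of frames, which is the drift behavior described above.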
Once a face is tracked, the position of the face in each frame can be used by a face recognition algorithm to determine an identity. One approach to face recognition is described in Liu and Chen, “Video-based Face Recognition Using Adaptive Hidden Markov Models,” 2001, CVPR. While this approach has advantages, it may have accuracy problems.
Systems and methods are needed that accurately track and recognize faces in video.