Visual tracking is a fundamental problem in computer vision, with many applications involved in various fields, such as motion analysis, human computer interaction, robot perception and traffic monitoring, etc. While much progress has been made in recent years, there still exist many challenges in designing a robust visual tracking algorithm, which are mainly caused by varying illumination, camera motion, occlusions, pose variation and rotation.
One of the issues in visual tracking is to match the target's visual appearances over consecutive video frames. Similar to many other computer vision problems such as image retrieval and object recognition, the matching process is largely stipulated by the coaction of the appearance model of the target that describes the target with unpredictable variations due to complex and dynamic scenes and the distance metric that determines the matching relations, e.g., the candidates with the target in visual tracking, or the whole image set with the query image in image retrieval domain.
Distinct from those retrieval or recognition tasks, a tracking problem holds a uniqueness that the appearance model and distance metric are both required to be adaptive to the changes of target as well as background over video frames.
The appearance model of the target plays a big role in visual tracking. In other words, if a strong appearance model can be constructed which is invariant to illumination changes and local deformation, discriminative from the background, and capable of handling occlusions, even simple and fixed distance metric like the Euclidean distance results in favorable matching performance. However, respectable difficulties arise to obtain a good appearance model which is able to handle the difficulties mentioned above. Although many excellent models are designed to describe the target in recent years, few algorithms can deal with various challenges at once. They usually only focus on solving certain tracking issues. When strong appearance model cannot be easily accessed, the choice of distance metrics for visual tracking becomes particularly critical. Many recent researches explored single distance metric learning in tracking problem and obtained pleasant results.
Recently, more and more tracking algorithms in collaborative framework achieve satisfactory performances. These approaches are all based on a simple motivation that single description of the object has its limitation in object representation and may lose helpful discriminative information for matching.