The presently disclosed subject matter relates to techniques for tracking objects in video.
The rise of unstructured multimedia content on sites such as YouTube has fostered interest in techniques to recognize objects and activities in less structured, unconstrained, and more realistic domains. Along the same lines, there is interest in automated video surveillance systems that can detect categories of activity in the field of view of a surveillance camera.
Conventional tracking systems typically first detect the location of an object in a video and then track that location over time. Methods for detecting the object include having a human user click on it or applying one of a variety of automatic detection algorithms. Once the initial location is determined, it can be passed to the tracker along with appearance information (e.g., color, size, and shape), and the tracker then follows the object through successive frames.
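To make the detect-then-track pipeline concrete, the following is a minimal Python sketch under simplifying assumptions: the initial bounding box is supplied directly (as if a user had clicked on the object), the appearance information is reduced to the mean color inside the box, and the tracker exhaustively searches a small neighborhood around the previous location in each new frame. The function names `appearance` and `track` and the `search_radius` parameter are hypothetical illustrations, not part of any particular system.

```python
import numpy as np


def appearance(frame, box):
    """Hypothetical appearance model: mean color of the pixels inside box.

    box is (row, col, height, width); frame is an H x W x C array.
    """
    r, c, h, w = box
    return frame[r:r + h, c:c + w].reshape(-1, frame.shape[2]).mean(axis=0)


def track(frames, init_box, search_radius=4):
    """Follow init_box through frames by matching the first frame's mean color.

    The appearance model is fixed at the first frame (no model update);
    search_radius is an assumed parameter bounding per-frame motion.
    """
    model = appearance(frames[0], init_box)  # appearance info passed with the initial location
    r, c, h, w = init_box
    boxes = [init_box]
    for frame in frames[1:]:
        height, width = frame.shape[:2]
        best, best_err = (r, c), np.inf
        # Try every candidate position within the search window that fits in the frame.
        for dr in range(-search_radius, search_radius + 1):
            for dc in range(-search_radius, search_radius + 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr <= height - h and 0 <= nc <= width - w:
                    err = np.sum((appearance(frame, (nr, nc, h, w)) - model) ** 2)
                    if err < best_err:
                        best, best_err = (nr, nc), err
        r, c = best  # the best match becomes the starting point for the next frame
        boxes.append((r, c, h, w))
    return boxes


# Usage: a 6x6 red square moving 2 pixels to the right per frame on a black background.
frames = []
for t in range(5):
    f = np.zeros((32, 32, 3))
    f[10:16, 5 + 2 * t:11 + 2 * t] = [1.0, 0.0, 0.0]
    frames.append(f)
print(track(frames, (10, 5, 6, 6)))
```

Because each frame's search starts from the previous estimate, any error is carried forward; a tracker that also updates its appearance model from its own, possibly slightly wrong, predictions can accumulate such errors frame by frame, which is one source of drift.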
However, in certain object tracking systems, particularly conventional appearance-based trackers, the lack of structure and the low quality of the data can result in drift, i.e., an error in the tracker's prediction of the object's location that grows over time. Moreover, tracking performance can be especially important in applications such as surveillance.
Accordingly, there is a need for techniques that improve performance when tracking objects in a video input.