Proliferation of high-powered computing systems, the availability of high-quality and inexpensive video-capturing devices, and an increased need for automated video analysis have led to immense advancements in the field of visual object tracking. Visual object tracking may be pertinent to various tasks, such as automated surveillance, motion-based object recognition, vehicle navigation, video indexing, human-computer interaction, and/or traffic monitoring.
Visual object tracking may utilize various visual object tracking algorithms (hereafter referred to as "trackers") to estimate a trajectory of a target object as the target object moves in an image plane of a scene recorded by a video-capturing device, such as a video camera. A tracker may assign consistent labels to the target object that may be visible in different video frames that correspond to the captured scene. The tracker may be initialized with a template image of the target object in an initial video frame. The tracker may learn an appearance of the target object based on the template image. Based on the learned appearance, the tracker may search for the target object in subsequent video frames. The tracker may utilize multiple image-processing algorithms and/or computer-vision algorithms, which may be based on various parameters. Examples of such parameters may include feature representations, search strategies, motion models, number of free parameters, and/or the like.
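The initialize-then-search loop described above can be sketched as follows. This is a minimal illustrative example, not an implementation from the disclosure: the function name, the brute-force sum-of-squared-differences search, and the synthetic frame are all assumptions chosen for clarity.

```python
import numpy as np

def track_template(frame, template):
    """Locate the template in a frame by minimizing the sum of squared
    differences (SSD) over all candidate windows.

    Sketches the basic tracking idea: the tracker is initialized with a
    template image of the target in an initial frame, then searches each
    subsequent frame for the window that best matches the learned
    appearance.  (Illustrative brute-force search, not a disclosed method.)
    """
    th, tw = template.shape
    fh, fw = frame.shape
    best_score, best_pos = float("inf"), (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            window = frame[y:y + th, x:x + tw]
            score = np.sum((window - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

# Synthetic example: a bright 2x2 target embedded in a 6x6 frame.
frame = np.zeros((6, 6))
frame[3:5, 2:4] = 1.0          # target placed at row 3, column 2
template = np.ones((2, 2))     # template taken from the initial frame
print(track_template(frame, template))  # -> (3, 2)
```

A practical tracker would restrict the search to a neighborhood of the previous location (a motion model) and use a more robust similarity measure, which is where the feature representations and search strategies mentioned above come in.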
In most realistic tracking situations, the initialization template does not contain enough information for the tracker to recognize all possible appearances of the object of interest. As a result, the tracker may drift away from the correct location due to a sudden change in the appearance of the tracked object. In such scenarios, the tracker may lose the target object, as it may not adapt quickly enough to handle variations in the target object's appearance over time. In certain other scenarios, trackers that adapt quickly to the target object may demand substantial computational resources and pre-learning of the various possible appearances of the target object throughout the captured scene.
In other scenarios, a combination of multiple trackers may be utilized to track the target object. However, existing methods that combine multiple trackers may work only with specific types of tracking algorithms, such as Bayesian trackers. Further, human intervention may be required when a drift is detected in one or more of the multiple trackers used to track the target object. Furthermore, existing methods require additional information, such as confidence maps, from each individual tracker in order to combine the outputs of the multiple trackers.
Furthermore, current methods that combine multiple trackers may use a passive-fusion approach in which the trackers do not interact with each other. Such a passive method may rely on a consensus of the multiple trackers, and/or other sampling strategies, to reconcile the outputs of the multiple trackers. Also, some of the multiple trackers may be specialized to handle certain scenes that other trackers cannot. Therefore, a flexible and generic method that may be applied to at least two combined trackers with complementary properties may be desired, so that the combined trackers yield an overall improved quality of tracked output for the video stream.
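A passive, consensus-style fusion of this kind can be sketched in a few lines. This is a generic illustration of the approach described above, under the assumption that each tracker reports an (x, y, w, h) bounding box; the function name and the coordinate-wise median rule are assumptions, not a method from the disclosure.

```python
import statistics

def fuse_by_consensus(boxes):
    """Passively fuse the outputs of multiple trackers by taking the
    coordinate-wise median of their (x, y, w, h) bounding boxes.

    The trackers do not interact: each runs independently, and the
    median reconciles their outputs, remaining robust to a single
    tracker that has drifted.  (Illustrative consensus rule only.)
    """
    return tuple(statistics.median(coord) for coord in zip(*boxes))

# Three independent trackers: two agree, one has drifted off-target.
boxes = [(10, 12, 8, 8), (11, 12, 8, 8), (40, 5, 8, 8)]
print(fuse_by_consensus(boxes))  # -> (11, 12, 8, 8)
```

Note that the drifting tracker's output is simply outvoted; no tracker is corrected or re-initialized, which illustrates why purely passive fusion cannot exploit complementary strengths of the combined trackers.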
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art through comparison of the described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.