A digital camera is a component often included in commercial electronic media device platforms. Digital cameras are now available in wearable form factors (e.g., video capture earpieces, video capture headsets, and video capture eyeglasses), as well as embedded within smartphones, tablet computers, notebook computers, and similar devices. The introduction of streaming video from mobile digital cameras has ushered in an era with unprecedented volumes of video data.
The video stream generated by any camera will include various objects moving in and out of the camera's field of view. Visual object tracking is the process of locating an arbitrary object of interest over time in a sequence of images captured by a camera. Adaptive tracking-by-detection methods are widely used in computer vision for tracking arbitrary objects. The definition of an “object” can vary from a single instance to a whole class of objects. One objective of tracking is to associate objects in consecutive images, based on the detection or tracking results from previous image frames. Real-time visual object tracking entails processing the video data stream at the camera frame rate to automatically determine, in each frame, either a bounding box of a given object or that the object is not visible.
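The per-frame output described above (a bounding box, or an indication that the object is not visible) and the association of an object across consecutive frames can be illustrated with a minimal sketch. The sketch assumes that candidate detections are already available for each frame and uses intersection-over-union (IoU) overlap as the association criterion; the IoU criterion, the 0.3 threshold, and the names `iou`, `associate`, and `track` are illustrative assumptions rather than part of any particular tracker described here.

```python
from typing import List, Optional, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union overlap of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(previous: Box, detections: List[Box],
              min_iou: float = 0.3) -> Optional[Box]:
    """Pick the detection that best overlaps the previous box.

    Returns None when no detection overlaps enough, which this sketch
    treats as "object not visible in this frame".
    """
    if not detections:
        return None
    best = max(detections, key=lambda d: iou(previous, d))
    return best if iou(previous, best) >= min_iou else None


def track(initial: Box, detections_per_frame: List[List[Box]],
          min_iou: float = 0.3) -> List[Optional[Box]]:
    """Produce one result per frame: a bounding box or None."""
    last_known = initial
    results: List[Optional[Box]] = []
    for detections in detections_per_frame:
        match = associate(last_known, detections, min_iou)
        results.append(match)
        if match is not None:
            # Only advance the reference box when the object was found.
            last_known = match
    return results
```

In this sketch a frame with no sufficiently overlapping detection yields `None`, and the last known box is retained so that the object can be re-associated if it reappears nearby.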
Challenges of object tracking include background clutter and changes in an object's appearance that may make the object's appearance in an initial frame irrelevant. Changes in scale, partial occlusion, changes in shape, and changes in illumination are all events that may alter an object's appearance over a number of consecutive frames.
There has been considerable research on fast and automated methods for object tracking. One tracking framework, referred to as tracking-learning-detection (TLD), decomposes the tracking task into three sub-tasks of tracking, learning, and detection, which can operate concurrently. The sub-task of online learning has proven particularly challenging. Online learning entails updating target models during run-time in an effort to make an object tracker robust to changes in object shape, view, and illumination. It is difficult to update and manage the models in real time when frequent tracking misses may occur, particularly with the limited hardware resources (e.g., processors and memory) of low-power mobile devices. For example, learning tasks that employ support vector machines (SVMs) often require complicated data structures and rely on regression approaches with a high computational cost, necessitating powerful hardware to process high-resolution images (e.g., full HD) in real time (e.g., at 30+ frames per second). Because much video is captured by mobile devices, a powerful CPU and/or GPU is not always available. Hence, many of the platforms responsible for generating the vast majority of a user's archival image data have thus far been ill-equipped to perform sophisticated object tracking.
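The idea of updating a target model during run-time can be illustrated with a deliberately simple sketch. It is not the TLD learning component and not an SVM-based learner; it assumes the target is represented by a fixed-length feature vector, blends new observations into a stored template with an exponential moving average, and skips the update when the observation is too dissimilar (for example, after a tracking miss). The class name `OnlineAppearanceModel`, the cosine-similarity gate, and the default parameter values are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


class OnlineAppearanceModel:
    """Toy target model updated online from per-frame feature vectors."""

    def __init__(self, initial_features: Sequence[float],
                 learning_rate: float = 0.1,
                 min_similarity: float = 0.5) -> None:
        self.template: List[float] = list(initial_features)
        self.learning_rate = learning_rate
        self.min_similarity = min_similarity

    def similarity(self, features: Sequence[float]) -> float:
        """Cosine similarity between the template and an observation."""
        dot = sum(t * f for t, f in zip(self.template, features))
        norm_t = sum(t * t for t in self.template) ** 0.5
        norm_f = sum(f * f for f in features) ** 0.5
        if norm_t == 0.0 or norm_f == 0.0:
            return 0.0
        return dot / (norm_t * norm_f)

    def update(self, features: Sequence[float]) -> bool:
        """Blend an observation into the template if it is similar enough.

        Returns True when the template changed, False when the observation
        was rejected as a likely tracking miss.
        """
        if self.similarity(features) < self.min_similarity:
            return False
        lr = self.learning_rate
        self.template = [(1.0 - lr) * t + lr * f
                         for t, f in zip(self.template, features)]
        return True
```

Each update in this sketch costs time linear in the feature length and needs no auxiliary data structures, which is the kind of property that matters on hardware with limited processing and memory resources.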
Automated visual object tracking that can be implemented in real time by ultra-light, low-power mobile platforms on a video stream captured at potentially high frame rates (e.g., 30 frames/second or more) is therefore highly advantageous.
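To give a sense of scale for the real-time requirement, the following sketch computes the time available per frame and the raw pixel throughput at a given resolution and frame rate. It assumes full HD means 1920 x 1080 pixels; the function names are illustrative only.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to process one frame at the given frame rate."""
    return 1000.0 / fps


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Number of pixels that must be handled per second."""
    return width * height * fps


# Full HD (assumed 1920 x 1080) at 30 frames per second.
budget = frame_budget_ms(30)          # about 33.3 ms per frame
rate = pixel_rate(1920, 1080, 30)     # 62,208,000 pixels per second
```

All tracking, learning, and detection work for a frame has to fit within that per-frame budget for the tracker to keep pace with the camera.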