Unmanned aerial vehicles (UAVs) have great potential for wide use in both research and commercial applications, many of which require target object tracking, such as motion-based recognition for human identification, automated surveillance for detecting suspicious activities, and human-robot interaction based on hand and face tracking. Target object tracking may be defined as the problem of estimating the trajectory of an object in the image plane as the object moves around in a scene. In addition, a tracker is expected to assign a consistent label to the tracked object over time across a sequence of video frames, and to provide object-centric information, depending on the tracking domain.
Target object tracking is often divided into two subtasks: building a model of the target of interest, and predicting the target's information in the current frame based on the target's information in the previous frames. These two subtasks are performed repeatedly to keep the model of the target of interest up to date. However, various factors may make the tracking task very challenging for a UAV equipped with a single-lens camera, such as loss of information caused by projecting the 3D real world onto 2D image frames, image noise, partial and full object occlusions, real-time processing requirements, and abrupt scene changes caused by the UAV's movement.
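For illustration only, the alternation between the two subtasks may be sketched as follows. This is a minimal sketch under stated assumptions and is not the disclosed method: the names `TargetModel`, `predict`, `update`, and `track`, the constant-velocity motion assumption, and the smoothing factor `alpha` are all hypothetical. A measurement of `None` stands in for a frame in which the target is fully occluded.

```python
from dataclasses import dataclass


@dataclass
class TargetModel:
    """Hypothetical target model: image-plane position and per-frame velocity."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def predict(model):
    """Predict the target's position in the current frame from the previous
    frames, assuming constant velocity (an illustrative assumption)."""
    return (model.x + model.vx, model.y + model.vy)


def update(model, observed, alpha=0.5):
    """Rebuild the target model from the position observed in the current frame.
    The velocity is an exponentially smoothed frame-to-frame displacement."""
    ox, oy = observed
    vx = alpha * (ox - model.x) + (1.0 - alpha) * model.vx
    vy = alpha * (oy - model.y) + (1.0 - alpha) * model.vy
    return TargetModel(ox, oy, vx, vy)


def track(initial, measurements):
    """Alternate prediction and model update over a sequence of frames.
    A measurement of None models full occlusion: the prediction is used instead."""
    model = TargetModel(*initial)
    trajectory = [initial]
    for measured in measurements:
        predicted = predict(model)
        observed = measured if measured is not None else predicted
        model = update(model, observed)
        trajectory.append((model.x, model.y))
    return trajectory


if __name__ == "__main__":
    # Target moves right by one pixel per frame, then is occluded for one frame.
    print(track((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0), None]))
```

In this sketch the occluded frame is bridged by the prediction alone, which shows why the quality of the target model directly affects how well a tracker tolerates occlusions and abrupt scene changes.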
Conventional tracking techniques operate under imposed constraints, and the corresponding algorithms mainly fall within two domains: Tracking-by-Detection and Filtering-based Visual Object Tracking. However, these techniques have their own limitations under different environments. For example, robust tracking is a critical component for an advanced UAV to interact with the dynamic real world in a natural way, which brings additional challenges to the conventional tracking techniques.
The disclosed system and method are directed to solving one or more of the problems set forth above, as well as other problems.