Existing computer systems are capable of performing object detection in a video image, i.e., identifying an object in an image (e.g., a frame of a video sequence). These systems are also capable of tracking such objects including, for example, tracking the movement of a human face (i.e., the object or feature) in a sequence of frames of video that a computer system camera is recording or has recorded.
In video editing, object tracking refers to interactive motion analysis in which a user-specified feature (i.e., an image patch) is tracked over time and the estimated motion (e.g., affine or projective) is then applied to some target image or visual effect so that it “match moves” to the tracked feature. Match moving is considered a cinematic technique that allows the insertion of computer graphics into live-action footage with correct position, scale, orientation, and motion relative to the objects in the frames of the video sequence.
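By way of illustration only, the following minimal Python sketch shows how an estimated affine motion might be applied to the corners of a target graphic so that the graphic follows the tracked feature. The 2x3 matrix representation, the function name apply_affine, and the overlay coordinates are assumptions made for this example and are not taken from any particular tracking system.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Affine motion as a 2x3 matrix [[a, b, tx], [c, d, ty]] (hypothetical representation).
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def apply_affine(motion: Affine, points: List[Point]) -> List[Point]:
    """Map each (x, y) point through the affine motion estimated by a tracker."""
    (a, b, tx), (c, d, ty) = motion
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Corners of a hypothetical overlay graphic in the reference frame.
overlay = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]

# Example motion: uniform scale by 2 plus a translation of (10, 20).
motion = ((2.0, 0.0, 10.0), (0.0, 2.0, 20.0))

# Overlay corners after the match move for the current frame.
moved = apply_affine(motion, overlay)
```

A projective motion would be handled similarly with a 3x3 matrix and a division by the homogeneous coordinate.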
Many motion analysis techniques have been proposed, such as template matching according to the Lucas-Kanade method (identified below). Of these techniques, template-based (or pixel-based) tracking algorithms are the most widely used for tracking objects in frames of a video sequence. However, since the computational complexity of the template tracker is roughly proportional to the area of the feature, the template tracker slows significantly as it tracks large features, i.e., larger objects with more pixels.
On the other hand, corner tracking methods estimate motion from sparse point correspondences, so the complexity of these algorithms can be capped by the maximum number of corners to track, N_max (e.g., 256). However, corner tracking methods fail to reliably track small or textureless features, where they cannot detect a sufficient number of corners.
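The contrast between the two cost behaviors described above can be sketched with a toy cost model in Python. The "work units" here are an assumption for illustration (one unit per pixel for the template tracker, one unit per tracked corner for the corner tracker); actual costs depend on the implementation. The function names are hypothetical.

```python
N_MAX = 256  # cap on tracked corners, using the example value from the text


def template_cost(width: int, height: int) -> int:
    """Toy cost of a template tracker: roughly one work unit per feature pixel."""
    return width * height


def corner_cost(detected_corners: int, n_max: int = N_MAX) -> int:
    """Toy cost of a corner tracker: one work unit per corner, capped at n_max."""
    return min(detected_corners, n_max)


# Doubling the side length of the feature quadruples the template cost ...
small = template_cost(64, 64)
large = template_cost(128, 128)

# ... while the corner cost stays bounded no matter how many corners are found.
bounded = corner_cost(10_000)
```

Under this model, a feature with only a handful of detectable corners is cheap for the corner tracker but may not supply enough correspondences for a reliable motion estimate, which is the failure case noted above.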
Hybrid tracking systems have been developed that use both corner based and template based tracking algorithms to improve accuracy and stability of the tracked object in the video sequence. For example, one existing hybrid tracking design is disclosed in “Hybrid Feature and Template Based Tracking for Augmented Reality Application” to Kusuma et al.; Asian Conference on Computer Vision (ACCV) (2014) (hereinafter “Kusuma”). In Kusuma, a hybrid tracking algorithm is disclosed for performing corner tracking followed by template tracking to maximize tracking accuracy and stability. However, Kusuma's system consumes significant computing resources by performing two separate tracking processes in sequence. Moreover, in “A Real-Time Tracking System Combining Template-Based and Feature-Based Approaches” to Ladikos et al., International Conference on Computer Vision Theory and Applications (2007) (hereinafter “Ladikos”), a hybrid tracking design is proposed that switches between the tracking algorithms, but makes the decision to switch only after having performed one of the tracking algorithms, which is identified by some means. Each of the designs in Kusuma and Ladikos is overly complicated and consumes significant computing resources, leading to slow tracking speeds.
In particular, to implement the tracking processes, these hybrid tracking systems require a finite state machine to identify the current tracking state as well as some appropriate switching algorithm to switch between the tracking algorithms. For example, the tracking system in Ladikos uses the current state of the finite state machine to check if its base tracker (i.e., template based tracking) is accurately tracking an object based on complicated criteria. If the ongoing template tracking is not accurately tracking an object, Ladikos's algorithm switches to corner based tracking. Although it is possible that Ladikos's algorithm may perform fast tracking if the finite state machine is very accurate, providing such a finite state machine will not only complicate the design of a hybrid tracking system, but also consume computing resources. Moreover, even with an accurate finite state machine, the tracking speed of Ladikos's algorithm will still slow down significantly when tracking a complicated video sequence containing large motion, for example, where the tracker must switch frequently from template tracking to corner tracking and back to template tracking and the like. These types of tracking scenarios, which are not uncommon, increase the switching overhead and slow down the overall tracking.
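The switching overhead described above can be illustrated with a toy two-state machine in Python. This is a simplified sketch and not the state machine or switching criteria of Ladikos: it assumes that a hypothetical per-frame flag indicates whether template tracking is accurate, and it simply counts how often the active tracker changes.

```python
from typing import Iterable


def count_switches(template_ok_per_frame: Iterable[bool]) -> int:
    """Count tracker switches for a toy two-state machine.

    The machine starts in the "template" state, moves to "corner" whenever the
    (hypothetical) accuracy check fails, and moves back when it passes again.
    """
    state = "template"
    switches = 0
    for ok in template_ok_per_frame:
        wanted = "template" if ok else "corner"
        if wanted != state:
            state = wanted
            switches += 1
    return switches


# A steady sequence never leaves the base tracker.
steady = count_switches([True] * 6)

# A sequence with large motion, where the check alternates every frame,
# forces a switch on almost every frame.
erratic = count_switches([True, False, True, False, True, False])
```

In this toy model, each switch stands for the additional work of evaluating the switching criteria and reinitializing the other tracker, so a sequence that alternates frequently accumulates overhead even when each individual decision is correct.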
Accordingly, what is needed is a tracking system and method for match moving that increases tracking speed while minimizing consumption of computing resources.