Visual tracking of specified objects (i.e., target objects) is an area of computer vision that has many useful applications. For example, visual tracking may be used in video surveillance, human-computer interfaces, digital video editing, and the like. In general, a visual tracking technique follows a target object throughout a given observation, such as a video sequence. Unfortunately, visual tracking techniques have difficulty tracking target objects in several situations.
For example, visual tracking techniques have difficulty when the target object experiences sudden motion, such as from an unexpected dynamic change of the target object itself or from an abrupt motion of the camera. They also have difficulty when a similar-looking object is in close proximity to the target object, because it is hard to distinguish which of the two objects is the target object. Visual tracking is also difficult when occlusion occurs and the target object is partially or completely hidden by another object.
In overview, most tracking techniques use recursive estimation to determine the location of a target object at a current time t based on the observations up to time t. In a Bayesian framework, the tracking problem is commonly formulated as a recursive estimation of a time-evolving posterior distribution P(x_t | y_{1:t}) of the state x_t given all the observations y_{1:t}, such that:
$$
P(x_{t+1} \mid y_{1:t+1}) \propto P(y_{t+1} \mid x_{t+1}) \int dx_t \, P(x_{t+1} \mid x_t) \, P(x_t \mid y_{1:t}). \tag{1}
$$

Recursive estimation has two major advantages: 1) efficient computation; and 2) natural fit with real-time or on-line tracking applications.
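As an illustration only, the recursion in equation (1) can be sketched on a small finite state space, where the integral over the previous state becomes a sum. The function name `bayes_filter_step`, the three-cell state space, and all probability values below are hypothetical and chosen for the example; they are not taken from any particular tracking technique.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One step of the recursion in equation (1) on a finite state space.

    prior[i]         = P(x_t = i | y_{1:t})
    transition[i][j] = P(x_{t+1} = j | x_t = i)
    likelihood[j]    = P(y_{t+1} | x_{t+1} = j)
    Returns posterior[j] = P(x_{t+1} = j | y_{1:t+1}).
    """
    n = len(prior)
    # Prediction: the integral over x_t becomes a sum over discrete states.
    predicted = [
        sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Update: weight the prediction by the observation likelihood.
    unnormalized = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalizing turns the proportionality in equation (1) into an equality.
    return [u / total for u in unnormalized]


# Hypothetical example: a target that either stays in its cell or moves right.
prior = [1.0, 0.0, 0.0]
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
likelihood = [0.1, 0.8, 0.1]  # assumed observation model favoring cell 1
posterior = bayes_filter_step(prior, transition, likelihood)
```

Each step uses only the previous posterior and the newest observation, which is why the recursive form is computationally efficient and suits on-line use.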
However, many real-world applications fit in the category of offline tracking, such as event statistics in video surveillance, object-based video compression, home video editing, video annotation, visual motion capture, and the like. Therefore, the recursive approach has also been applied to offline visual tracking. When this is done, the long input video sequence is typically first decomposed into short sequences by specifying one or more keyframes. The specified keyframes can be any of the frames within the video sequence. Each keyframe contains an object template that designates the object to be tracked (i.e., the target object). Visual tracking using these decomposed short sequences is commonly referred to as keyframe-based tracking. The recursive approach is then applied to each of the short sequences in either the forward or backward direction. This approach, however, typically fails somewhere in the middle of a short sequence. When this occurs, another keyframe is added at the failed location.
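The decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `split_at_keyframes` is hypothetical, frames are identified by zero-based indices, and the choice to track backward before the first keyframe and forward everywhere else is one possible convention, not a requirement of keyframe-based tracking.

```python
def split_at_keyframes(num_frames, keyframes):
    """Split frames 0..num_frames-1 into short sequences bounded by keyframes.

    Returns (start, end) pairs. Tracking for a pair begins at `start`, which
    is always a keyframe, and proceeds toward `end`; start > end therefore
    denotes backward tracking.
    """
    keys = sorted(set(keyframes))
    if not keys:
        raise ValueError("at least one keyframe is required")
    if keys[0] < 0 or keys[-1] >= num_frames:
        raise ValueError("keyframe index out of range")

    segments = []
    if keys[0] > 0:
        # Frames before the first keyframe are tracked backward from it.
        segments.append((keys[0], 0))
    for a, b in zip(keys, keys[1:]):
        # Between two keyframes, track forward here (backward also works).
        segments.append((a, b))
    if keys[-1] < num_frames - 1:
        # Frames after the last keyframe are tracked forward from it.
        segments.append((keys[-1], num_frames - 1))
    return segments


# Hypothetical 100-frame sequence with keyframes at frames 30 and 70.
segments = split_at_keyframes(100, [30, 70])
```

When tracking fails partway through a segment, adding the failed frame to `keyframes` and calling the function again yields the finer decomposition.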
While adding new keyframes improves the outcome of the visual tracking, adding them in a trial-and-error manner is prohibitively time-consuming. Thus, there is a continual need to improve upon the tracking techniques used in offline applications.