1. Field of the Invention
The present invention relates to image processing. More particularly, the present invention relates to a method and apparatus for tracking an object using a plurality of images that are spaced apart by an interval of time (“temporally-spaced images”).
2. Description of Related Art
Tracking a movable object from an aerial or other type of platform generally requires data association over long intervals of time. However, when a movable object moves, the movable object may not remain continuously in a field of view while being tracked. The movable object can leave the field of view for a number of reasons, including occlusions and inaccuracies in platform pointing directions. When another movable object appears in the field of view, the tracking of the movable object becomes uncertain because it is unclear whether the latter-observed movable object is the same as the formerly-observed movable object.
Visual object recognition of the movable objects is an important component of tracking a movable object, and prior research and development has provided a number of standard mechanisms for performing the visual object recognition. However, these standard mechanisms, such as frame-to-frame data association for multiple images, are not usable when the multiple images are separated by an interval of time (e.g., the multiple images are not contiguous).
Despite prior research and development regarding such object recognition, real-time, near-real-time and/or other contemporaneous object recognition or “fingerprinting” remains a challenging problem. First, only a limited amount of information for carrying out training can be acquired from a first (or a first set) of a plurality of the temporally-spaced images (“first learning image”). That is, only a limited amount of data can be garnered from and about the first learning image of a given moving object to develop a learning sequence for recognizing such given object.
Second, the prior research and development lacks mechanisms for providing reliable and invariant representations of features of the given object (“object features”) that are captured in the plurality of temporally-spaced images. For instance, the prior research and development lacks mechanisms for providing reliable and invariant object features to overcome drastic pose changes, aspect changes, appearance changes and various occlusions of the given object between the learning sequence and a query sequence. Consequently, the prior research and development lacks trustworthy mechanisms for correlating among one or more of the object features captured in the plurality of temporally-spaced images.
Third, the prior research and development lacks mechanisms to manage differences in the temporally-spaced images that result from capturing the temporally-spaced images using one or more differing platforms and/or resolutions. Fourth, the prior research and development lacks mechanisms to accurately segment, mask or otherwise discern the given object from the background. Fifth, the prior research and development lacks mechanisms to differentiate between the given object and one or more other objects that are similar to, but not the same as, the given object.
Therefore, there is a need in the art for a method and apparatus for tracking and/or monitoring an object using a plurality of temporally-spaced images.