One approach to video object tracking is to use an extraction process, which extracts object locations from each video frame, and an object tracking process, which associates those locations with one another across successive video frames, and thus over time.
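A minimal sketch of the association step, assuming the extraction process has already reduced each frame to a list of `(x, y)` locations. The function name `associate`, the greedy nearest-neighbour matching, and the `max_dist` gate are illustrative assumptions rather than part of the approach described here; practical trackers often use a global assignment instead of a greedy one.

```python
import math


def associate(tracks, detections, max_dist):
    """Greedily extend each track with its nearest unclaimed detection.

    Hypothetical helper for illustration.
    tracks: dict mapping track id -> list of (x, y) locations over time.
    detections: list of (x, y) locations extracted from the current frame.
    Detections left unmatched start new tracks.
    """
    unclaimed = list(detections)
    for history in tracks.values():
        if not unclaimed:
            break
        last = history[-1]
        nearest = min(unclaimed, key=lambda d: math.dist(d, last))
        if math.dist(nearest, last) <= max_dist:
            history.append(nearest)
            unclaimed.remove(nearest)
    next_id = max(tracks, default=-1) + 1
    for det in unclaimed:
        tracks[next_id] = [det]
        next_id += 1
    return tracks


# Three frames of already-extracted locations (made-up values).
frames = [
    [(10.0, 10.0), (50.0, 50.0)],
    [(12.0, 11.0), (51.0, 53.0)],
    [(14.0, 12.0), (90.0, 90.0)],
]
tracks = {}
for detections in frames:
    associate(tracks, detections, max_dist=5.0)
print(tracks)
```

In the third frame the location `(90.0, 90.0)` is too far from every existing track, so it starts a new track, while track 1 simply receives no update.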
The extraction process can introduce errors, as the measurements of object locations and object characteristics may be inaccurate. For example, several locations may be extracted for a single real-world object, or the detected width of an object may be smaller than the actual width of the object. The errors introduced in extraction depend on the algorithm used and the complexity of the scene shown in the video frame that is being processed. The errors include, but are not limited to: detection failure, partial detection failure, multiple detections in place of one detection, one detection in place of multiple detections, over-detection, and entirely false detections. These errors can occur contemporaneously within a single frame of an image sequence.
The extraction process may also produce errors when the correct measurements are not present in the data from which they are to be extracted. This can happen, for example, when a real-world object is placed against a background of similar brightness and hue, when the object is otherwise not fully visible, or when it overlaps or is close to another active object.
Even with correct object-location data, tracking can be difficult and complex. A particularly difficult case for the tracking task occurs when the object whose visual location is to be extracted is partially or fully occluded.
A particular problem caused by errors introduced in the extraction process is the case where multiple detections are made in place of a single detection. When multiple detections occur erroneously, a tracker may fail to continue the original track and/or may inappropriately create new tracks for the multiple detections. One approach to this problem is to treat all detections within a certain distance of each other as being the same object. A disadvantage of this approach is that it frequently merges objects which are coincidentally close but which should not be merged. This over-merging in turn causes additional detection failures. Over-merging can also create objects which are not recognizably part of any track, which again leads to the inappropriate creation of new tracks.
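The distance-based merging approach, and its over-merging failure, can be illustrated with a small sketch. It assumes detections are axis-aligned boxes `(x0, y0, x1, y1)`, merges any two boxes whose centres lie within a hypothetical threshold `max_gap` (transitively, via union-find), and replaces each merged group with its union bounding box. In the example, two fragments of one object sit beside a second, unrelated object; a threshold large enough to rejoin the fragments also swallows the neighbour.

```python
import math


def merge_close_detections(boxes, max_gap):
    """Merge boxes whose centres lie within max_gap of each other.

    Hypothetical helper for illustration. Merging is transitive (single
    linkage), and each group is replaced by its union bounding box.
    """
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if math.dist(centres[i], centres[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]


# Two fragments of one object (top and bottom halves) plus a neighbour.
boxes = [(0, 0, 10, 20), (0, 20, 10, 40), (14, 0, 24, 40)]
print(merge_close_detections(boxes, max_gap=21))  # [(0, 0, 24, 40)]
```

With `max_gap=15`, by contrast, nothing is merged and the fragmented object stays split into two detections, so no single threshold in this example both repairs the fragmentation and keeps the neighbour separate.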
The tracking stage of the processing creates tracks. Tracks usually have a stochastic basis, e.g., a Kalman Filter or an Alpha Beta Filter. The Kalman Filter equations or Alpha Beta Filter equations can be used to produce an expected spatial representation, also known as an expectation. An expected spatial representation is a predicted future location of a tracked object to within predetermined measurement noise limits for that particular application. Another known tracking method is to expect that a future measurement should be near the most recent measurement.
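As a rough illustration of how a filter produces an expectation, the sketch below implements a textbook alpha-beta filter for one coordinate axis (apply it per axis for 2-D positions). The class name, the gain values, and the `gate` method, which checks a measurement against a fixed noise limit, are assumptions chosen for illustration; a Kalman Filter would instead derive its gains from modelled noise covariances.

```python
from dataclasses import dataclass


@dataclass
class AlphaBetaTrack:
    """Alpha-beta filter for one coordinate axis of a tracked object.

    Hypothetical class; the default gains are illustrative, not tuned.
    """
    position: float
    velocity: float = 0.0
    alpha: float = 0.5  # gain applied to the position residual
    beta: float = 0.1   # gain applied to the velocity residual

    def predict(self, dt: float = 1.0) -> float:
        """Return the expectation: the predicted position after dt."""
        return self.position + self.velocity * dt

    def gate(self, measurement: float, limit: float, dt: float = 1.0) -> bool:
        """True if the measurement lies within `limit` of the expectation."""
        return abs(measurement - self.predict(dt)) <= limit

    def update(self, measurement: float, dt: float = 1.0) -> None:
        """Correct position and velocity using the measurement residual."""
        expected = self.predict(dt)
        residual = measurement - expected
        self.position = expected + self.alpha * residual
        self.velocity += (self.beta / dt) * residual


track = AlphaBetaTrack(position=0.0, velocity=2.0)
track.update(3.0)             # expected 2.0, residual 1.0
print(track.predict())        # next expectation, about 4.6
print(track.gate(4.0, 1.0))   # True: 4.0 is within 1.0 of the expectation
```

The fixed `limit` in `gate` stands in for the application-specific measurement noise limits mentioned above.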
In some tracking approaches, the expectation is expanded to allow for error, and all detections that are smaller than the expectation and fall within the expanded area are treated as partially detected components of the track being estimated. These approaches, however, can only correct detections which are smaller than the expectation. For example, the approach fails when an object moves towards the camera and appears to become larger. Other heuristics exist, but they still require trade-offs, for example between computational complexity and the optimality of the solution.
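The expanded-expectation heuristic, and its blind spot for objects that grow, might be sketched as follows. Boxes are assumed to be `(x0, y0, x1, y1)` tuples; the helper names, the fixed `margin`, the use of area as the measure of "smaller", and full containment as the test for "falling within" are all assumptions made for this illustration.

```python
def area(box):
    """Area of an (x0, y0, x1, y1) box, zero if degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def expand(box, margin):
    """Grow a box by `margin` on every side to allow for error."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def contains(outer, inner):
    """True if `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def absorb_partial_detections(expectation, detections, margin):
    """Hypothetical helper: merge small detections inside the expanded
    expectation into one box; return (merged box or None, the rest)."""
    gate = expand(expectation, margin)
    parts = [d for d in detections
             if area(d) < area(expectation) and contains(gate, d)]
    others = [d for d in detections if d not in parts]
    if not parts:
        return None, others
    merged = (min(d[0] for d in parts), min(d[1] for d in parts),
              max(d[2] for d in parts), max(d[3] for d in parts))
    return merged, others


expectation = (10, 10, 30, 50)
fragments = [(10, 10, 30, 28), (11, 30, 29, 50)]
print(absorb_partial_detections(expectation, fragments, margin=5))
# ((10, 10, 30, 50), [])

approaching = [(5, 5, 40, 60)]
print(absorb_partial_detections(expectation, approaching, margin=5))
# (None, [(5, 5, 40, 60)])
```

The second call absorbs nothing: the detection of the approaching object is rejected because it is larger than the expectation, even though it plausibly belongs to the same track.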
Another approach is to segment the object into clusters and examine the average motion of each cluster. If the motion vector for a block in a detection is similar to the motion of a nearby cluster, the block may be considered part of that cluster, and thus part of the object. However, this method is computationally expensive, because it requires motion vectors to be calculated for individual blocks in addition to the motion of the object and of the clusters that constitute it.
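A sketch of the per-block decision in the cluster-motion approach, assuming block motion vectors and cluster membership are already available (computing them, for example by block matching, is the expensive part and is not shown). The data layout, the `max_distance` test for a "nearby" cluster, and the Euclidean `tolerance` on motion difference are assumptions for illustration.

```python
import math


def mean_motion(vectors):
    """Average (dx, dy) of a non-empty list of motion vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def assign_block(block_pos, block_motion, clusters, max_distance, tolerance):
    """Hypothetical helper: return the id of the nearby cluster whose mean
    motion is most similar to the block's, or None if none is within
    `tolerance`.

    clusters: dict id -> {"centre": (x, y), "motions": [(dx, dy), ...]}
    """
    best_id, best_diff = None, tolerance
    for cid, cluster in clusters.items():
        if math.dist(block_pos, cluster["centre"]) > max_distance:
            continue  # cluster is not spatially near the block
        diff = math.dist(block_motion, mean_motion(cluster["motions"]))
        if diff <= best_diff:
            best_id, best_diff = cid, diff
    return best_id


clusters = {
    "head": {"centre": (20, 10), "motions": [(2, 0), (2, 1), (2, -1)]},
    "background": {"centre": (24, 12), "motions": [(0, 0), (0, 0)]},
}
print(assign_block((22, 14), (1.8, 0.2), clusters, max_distance=8, tolerance=0.5))  # head
print(assign_block((22, 14), (5.0, 5.0), clusters, max_distance=8, tolerance=0.5))  # None
```

Even in this reduced form, every block needs its own motion vector and every nearby cluster needs its mean motion recomputed, which reflects the computational cost noted above.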
Thus, a need exists to provide an improved method and system for tracking objects in an image sequence.