This invention relates to object tracking within a sequence of image frames, and more particularly to methods and apparatus for improving robustness of edge-based object tracking processes.
When tracking an object among multiple frames of a video sequence, the object boundary is identified in each frame. The object is the area within the boundary. The challenge in identifying the object boundary in a given frame increases as the constraints on a trackable object are relaxed to allow tracking an object which translates, rotates or deforms. Once the object is identified in one frame, template matching may be used in a subsequent frame to detect translation of the object. The template typically is the object as identified in the prior frame. Deformable models are used to detect objects which translate, rotate or deform. Various methods using deformable models are described below.
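By way of illustration only, the template matching step described above can be sketched as an exhaustive search for the translation that best aligns the prior-frame object with the current frame. The sketch assumes grayscale frames stored as lists of rows and uses a sum of squared differences as the similarity measure. The function name `match_template` and the choice of measure are assumptions made for this example and are not taken from the description above.

```python
def match_template(frame, template):
    """Return ((row, col), score) for the offset in `frame` with the lowest
    sum of squared differences (SSD) against `template`.

    Hypothetical helper for illustration; SSD is an assumed similarity measure.
    """
    frame_h, frame_w = len(frame), len(frame[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_offset, best_score = None, None
    # Try every placement of the template that fits entirely inside the frame.
    for r in range(frame_h - tmpl_h + 1):
        for c in range(frame_w - tmpl_w + 1):
            score = sum(
                (frame[r + i][c + j] - template[i][j]) ** 2
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_offset, best_score = (r, c), score
    return best_offset, best_score


# Example: the object identified in the prior frame serves as the template.
template = [[9, 7],
            [5, 8]]
# In the current frame the same object appears with its top-left corner at (2, 3).
current_frame = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        current_frame[2 + i][3 + j] = template[i][j]

offset, score = match_template(current_frame, template)
```

Such a search recovers only translation. An object that rotates or deforms between frames no longer matches the template exactly, which is why deformable models are considered next.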
Edge-based segmentation algorithms for object tracking, such as an active contour model, have been used for adjusting to image features, in particular image object boundaries. In concept, active contour models involve overlaying an elastic curve onto an image. The curve (i.e., snake) deforms itself from an initial shape to adjust to the image features. An energy minimizing function is used which adapts the curve to image features such as lines and edges. The function is guided by external constraint forces and image forces. The best fit is achieved by minimizing a total energy computation of the curve. In effect, continuity and smoothness constraints are imposed to control deformation of the model. The model is the object from a prior frame. A shortcoming of the active contour model is that small changes in object position or shape from one frame to the next may cause the boundary identification to fail. In particular, rather than following the object, the estimated boundary may instead latch onto strong false edges in the background, distorting the object contour.
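By way of illustration only, the total energy of a closed contour may be sketched in a commonly used discrete form: a continuity term that penalizes stretching, a curvature term that penalizes bending, and an image term that rewards contour points lying on strong edges. The weights `alpha`, `beta` and `gamma`, the function name `snake_energy`, and the precomputed `edge_strength` map are assumptions made for this example. The description above does not specify a particular energy function.

```python
def snake_energy(points, edge_strength, alpha=1.0, beta=1.0, gamma=1.0):
    """Total energy of a closed contour given as a list of (row, col) points.

    Assumed discrete form, for illustration only:
      continuity = squared distance to the previous point
      curvature  = squared second difference across neighboring points
      image      = negative edge strength at the point
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        r0, c0 = points[i - 1]          # previous point (wraps around)
        r1, c1 = points[i]              # current point
        r2, c2 = points[(i + 1) % n]    # next point (wraps around)
        continuity = (r1 - r0) ** 2 + (c1 - c0) ** 2
        curvature = (r0 - 2 * r1 + r2) ** 2 + (c0 - 2 * c1 + c2) ** 2
        image = -edge_strength[r1][c1]
        total += alpha * continuity + beta * curvature + gamma * image
    return total


# Example: a 5x5 edge map with strong responses at four object corners.
edge_strength = [[0] * 5 for _ in range(5)]
for r, c in [(1, 1), (1, 3), (3, 3), (3, 1)]:
    edge_strength[r][c] = 10

on_object = [(1, 1), (1, 3), (3, 3), (3, 1)]    # contour lying on the edges
off_object = [(0, 0), (0, 2), (2, 2), (2, 0)]   # same shape, shifted off the edges

energy_on = snake_energy(on_object, edge_strength)
energy_off = snake_energy(off_object, edge_strength)
```

Both contours have the same shape, so the continuity and curvature terms are equal and the difference in energy comes entirely from the image term. The same mechanism explains the shortcoming noted above: a strong background edge within the search range lowers the energy just as a true object edge does.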
Yuille et al. in “Feature Extraction from Faces Using Deformable Templates,” International Journal of Computer Vision, Vol. 8, 1992, disclose a process in which eyes and mouths in an image are identified using a model with a few parameters. For example, an eye is modeled using two parabolas and a circle radius. By changing the shape of the parabolas and the circle radius, eyes can be identified. Yuille et al. and other deformation models typically have encompassed only highly constrained deformations. In particular, the object has a generally known shape which may deform in some generally known manner. Processes such as an active contour model have relaxed constraints, but are only effective over a very narrow spatial range of motion. Processes like that disclosed by Yuille are effective for a wider spatial range of motion, but track a very constrained type of motion. Accordingly, there is a need for a more flexible and effective object tracker, which can track more active deformations over a wider spatial range.
According to the invention, a morphological process is performed as part of an object tracking and segmentation sequence. With an object tracked and segmented for a given frame, the morphological process smooths the resulting contour and removes erroneous edge points.
Edge-based object trackers have difficulty handling objects which move or change too rapidly. The limited search area of the tracker makes it difficult to follow the edge reliably. Consequently the tracker may identify a region of low contrast or lock onto an edge in the background when trying to identify the object. Such errors are reduced by the morphological process. In addition, edge-based object trackers have difficulty handling occlusions, because they latch onto the strong edge of the occluding object rather than the true object boundary. Such errors also are reduced by the morphological process.
Edge-based object trackers accumulate errors rapidly because the template for one frame is the detected object from the prior frame. As a result the edge-based object tracker may be unable to recover from an error. By performing the morphological process on a frame by frame basis, errors in the object tracker do not accumulate, and instead are filtered out.
According to one advantage of the invention, false edge points are more reliably removed when performing the morphological process after the object tracking and segmentation processes. By performing this postprocessing on a frame by frame basis, errors are eliminated early and do not accumulate from frame to frame. Thus, the object boundary is more reliably identified and tracked from frame to frame.
According to an aspect of this invention, the morphological process receives an input corresponding to an object mask and a set of control points for the object. At one step the mask is filtered to eliminate extraneous control points. The purpose of this step is to eliminate image pixels which the object tracker added around a control point due to sharp contrast (e.g., sharp background contrasts). At another step the mask is expanded to recover certain control points eliminated during the first step. After the first two steps one or more islands may remain as the mask. At another step, one of the islands is selected as the mask. At another step any control points not on the mask boundary are removed. The final mask and set of control points then are output (i) for use in displaying the tracked object for the current image frame, and (ii) for use in identifying and tracking the object in the next image frame.
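By way of illustration only, the sequence of steps above may be sketched on a binary mask stored as a list of rows. Several choices here are assumptions made for this example and are not specified in the description above: the filtering step is modeled as an erosion and the expanding step as a dilation, each with a 3x3 window; the island selected is the largest one; and a control point is treated as lying on the mask boundary when it is a mask pixel with at least one 4-connected neighbor outside the mask. All function names are hypothetical.

```python
from collections import deque

WINDOW_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
STEPS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _inside(mask, r, c):
    """True if (r, c) is within the image and set in the mask."""
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c] == 1


def erode(mask):
    """Keep a pixel only if its whole 3x3 window lies in the mask (assumed filter step)."""
    return [[1 if all(_inside(mask, r + dr, c + dc) for dr, dc in WINDOW_3X3) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is in the mask (assumed expand step)."""
    return [[1 if any(_inside(mask, r + dr, c + dc) for dr, dc in WINDOW_3X3) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def largest_island(mask):
    """Keep only the largest 4-connected island (the selection rule is an assumption)."""
    rows, cols = len(mask), len(mask[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            island, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill of one island
                cr, cc = queue.popleft()
                island.add((cr, cc))
                for dr, dc in STEPS_4:
                    nxt = (cr + dr, cc + dc)
                    if _inside(mask, *nxt) and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            if len(island) > len(best):
                best = island
    return [[1 if (r, c) in best else 0 for c in range(cols)] for r in range(rows)]


def on_boundary(mask, r, c):
    """True if (r, c) is a mask pixel with a 4-connected neighbor outside the mask."""
    return _inside(mask, r, c) and any(
        not _inside(mask, r + dr, c + dc) for dr, dc in STEPS_4)


def postprocess(mask, control_points):
    """Filter, expand, select one island, then drop control points off its boundary."""
    final = largest_island(dilate(erode(mask)))
    kept = [(r, c) for r, c in control_points if on_boundary(final, r, c)]
    return final, kept


# Example: a 4x4 object, a one-pixel-wide spur toward a background edge,
# and a smaller 3x3 region attached to the end of the spur.
mask = [[0] * 10 for _ in range(7)]
for r in range(2, 6):
    for c in range(1, 5):
        mask[r][c] = 1
for c in range(5, 9):
    mask[3][c] = 1
for r in range(0, 3):
    for c in range(7, 10):
        mask[r][c] = 1

control_points = [(2, 1), (3, 8), (3, 3), (5, 4), (0, 9)]
final_mask, kept_points = postprocess(mask, control_points)
```

In this example the erosion removes the spur, the dilation restores the 4x4 object and the 3x3 region as two separate islands, and the island selection keeps the 4x4 object. Of the five control points, only the two lying on the boundary of the final mask are retained.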