This invention relates to tracking and segmenting an object within a sequence of image frames, and more particularly to methods and apparatus for segmenting and tracking a video object which may move or deform.
When tracking an object among multiple frames of a video sequence, an enclosed boundary of the object is identified in each frame. The object is the area within the boundary. The challenge in identifying the object boundary in a given frame increases as the constraints on a trackable object are relaxed to allow tracking an object which translates, rotates or deforms. Once the object is identified in one frame, template matching may be used in a subsequent frame to detect translation of the object. The template typically is the object as identified in the prior frame. Deformable models are used to detect objects which translate, rotate or deform. Various methods using deformable models are described below.
Yuille et al. in “Feature Extraction from Faces Using Deformable Templates,” International Journal of Computer Vision, Vol. 8, 1992, disclose a process in which eyes and mouths in an image are identified using a model with a few parameters. For example, an eye is modeled using two parabolas and a circle radius. By changing the shape of the parabolas and the circle radius, eyes can be identified. The model of Yuille et al. and other deformable models typically have encompassed only highly constrained deformations. In particular, the object has a generally known shape which may deform in some generally known manner. Processes such as an active contour model have relaxed constraints, but are only effective over a very narrow spatial range of motion. Processes like that disclosed by Yuille et al. are effective for a wider spatial range of motion, but track a very constrained type of motion. Accordingly, there is a need for a more flexible and effective object tracker, which can track more active deformations over a wider spatial range.
Active contour models, also known as snakes, have been used for adjusting to image features, in particular image object boundaries. In concept, active contour models involve overlaying an elastic curve onto an image. The curve (i.e., snake) deforms itself from an initial shape to adjust to the image features. An energy minimizing function is used which adapts the curve to image features such as lines and edges. The function is guided by internal constraint forces and external image forces. The best fit is achieved by minimizing a total energy computation of the curve. In effect, continuity and smoothness constraints are imposed to control deformation of the model. The model is the object from a prior frame. A shortcoming of the conventional active contour model is that small changes in object position or shape from one frame to the next may cause the boundary identification to fail. In particular, rather than following the object, the estimated boundary instead may latch onto strong false edges in the background, distorting the object contour. Accordingly, there is a need for an improved method for segmenting and tracking a video object.
According to the invention, object segmentation and tracking is improved by identifying a local portion of an object and detecting local deformation of such portion. An advantage of this technique is that object segmentation and tracking is significantly improved for instances where there is significant local deformation.
According to one aspect, a local affine along a coarsely estimated object boundary is identified by analyzing edge energy of a current image frame. The edge energy for each point along the coarsely estimated object boundary is compared to the edge energy for the corresponding point in a previous frame. A sequence of contour points which have edge energy change ratios exceeding a threshold value is identified as a local affine. A refined estimate of the object boundary then is determined for the local affine.
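The selection step can be illustrated with a minimal sketch. The summary does not give a formula for the edge energy change ratio, so the relative difference used here is an assumption, as are the names `edge_energy_change_ratio`, `find_local_affines` and the `min_length` parameter. The sketch treats the contour as an open list of points and does not handle a run that wraps around a closed boundary.

```python
def edge_energy_change_ratio(prev_energy, cur_energy, eps=1e-9):
    """Relative change in edge energy at one contour point.

    Assumed definition: |current - previous| / |previous|. The small eps
    avoids division by zero where the previous energy is zero.
    """
    return abs(cur_energy - prev_energy) / (abs(prev_energy) + eps)


def find_local_affines(prev_energies, cur_energies, threshold, min_length=2):
    """Return inclusive (start, end) index pairs, one per run of consecutive
    contour points whose change ratio exceeds `threshold`.

    `min_length` (hypothetical) discards runs too short to form a curve.
    """
    if len(prev_energies) != len(cur_energies):
        raise ValueError("energy lists must have the same length")

    runs = []
    start = None  # index where the current run began, or None
    for i, (prev, cur) in enumerate(zip(prev_energies, cur_energies)):
        if edge_energy_change_ratio(prev, cur) > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None

    # Close a run that extends to the last contour point.
    if start is not None and len(prev_energies) - start >= min_length:
        runs.append((start, len(prev_energies) - 1))
    return runs
```

For example, with previous energies of 10 at every point and current energies `[10, 10, 2, 3, 1, 10]`, a threshold of 0.5 flags contour points 2 through 4 as a single local affine.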
According to another aspect, a local segmentation process based on a key contour point search strategy is implemented to refine the object boundary at the local affine. The local affine can be characterized in two equations having six unknown parameters which describe the shape of the local affine. Solving for these six unknown parameters requires six independent equations. Knowledge of the actual parameter values would provide an indication of the actual local affine location. An improved estimate of the affine location is derived by reducing the set of equations. A value for each parameter is obtained by making an assumption and by selecting a key point. The assumption is that the front end point and back end point for the local affine move comparably with the main portion of the object. Thus, these points are taken as being the points from the coarsely estimated object boundary. A key point along the coarsely estimated object boundary is selected based upon a distance function. The front point, back point and key point define a curve shape. By selecting a better key point, a better curve may be estimated for the local affine.
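As a hedged illustration of how three points can fix six parameters, the sketch below assumes the two equations take the standard two-dimensional affine form x' = a·x + b·y + c and y' = d·x + e·y + f. The summary does not state the equations, so this form, the parameter names and the function names `solve_affine` and `apply_affine` are assumptions. Each of the three point correspondences (front point, back point, key point) contributes two equations, giving the six needed.

```python
def solve_affine(src_points, dst_points):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from three point pairs.

    src_points, dst_points: sequences of three (x, y) tuples.
    Returns (a, b, c, d, e, f). Uses Cramer's rule on the 3x3 system
    whose rows are [x, y, 1].
    """
    (x1, y1), (x2, y2), (x3, y3) = src_points
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear; the affine is not unique")

    def solve_row(t1, t2, t3):
        # Solve [x y 1] . [p q r]^T = t for one output coordinate.
        p = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        q = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        r = (x1 * (y2 * t3 - y3 * t2)
             - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve_row(*(pt[0] for pt in dst_points))
    d, e, f = solve_row(*(pt[1] for pt in dst_points))
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Map one (x, y) point through the affine given by `params`."""
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

In this sketch the front and back points are held fixed, following the stated assumption that they move with the main portion of the object, and only the key point is displaced. For instance, `solve_affine([(0, 0), (4, 0), (2, 1)], [(0, 0), (4, 0), (2, 3)])` yields parameters that leave x unchanged and scale y by three.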
Candidate key points are selected from a search area. The candidate points are identified within the search area as being any image pixel point that has an edge energy which is larger than a prescribed percentage of the edge energy for the key point found in the previous frame. A corresponding set of parameters then is derived for each candidate key point based on the candidate key point, the front point and the back point.
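The candidate test can be sketched as a simple filter over the search area. The rectangular search area, the row-major grid of edge energies, the default `fraction` of 0.8 and the function name `candidate_key_points` are all assumptions made for illustration; the summary specifies only that a candidate's edge energy must exceed a prescribed percentage of the previous frame's key point edge energy.

```python
def candidate_key_points(edge_energy, search_area, prev_key_energy, fraction=0.8):
    """Return (row, col) pixels in the search area whose edge energy exceeds
    `fraction` times the edge energy of the previous frame's key point.

    edge_energy: 2D list indexed as edge_energy[row][col].
    search_area: (row_min, row_max, col_min, col_max), inclusive bounds
                 (a rectangular area is assumed here).
    """
    n_rows = len(edge_energy)
    n_cols = len(edge_energy[0]) if n_rows else 0
    row_min, row_max, col_min, col_max = search_area

    # Clamp the search area to the image so out-of-range bounds are harmless.
    row_min, row_max = max(row_min, 0), min(row_max, n_rows - 1)
    col_min, col_max = max(col_min, 0), min(col_max, n_cols - 1)

    cutoff = fraction * prev_key_energy
    return [
        (r, c)
        for r in range(row_min, row_max + 1)
        for c in range(col_min, col_max + 1)
        if edge_energy[r][c] > cutoff
    ]
```

Each returned pixel would then be paired with the front point and back point to derive one candidate set of parameters.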
The coarsely estimated boundary of the local affine then is warped using each set of parameters to derive a candidate curve corresponding to each candidate key point. An average edge energy change ratio for each given curve is derived, and the curve having the minimum value of the average edge energy change ratio values is selected as the optimal curve to estimate the boundary of the local affine. A set of edge points for the selected optimal curve is output as the set of edge points for the local affine portion of the object.
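The warping and selection steps might be sketched as follows. The affine parameter layout (a, b, c, d, e, f), the relative-difference form of the change ratio, the `cur_energy_at` lookup callable and all function names are assumptions for illustration rather than the form given in the detailed description.

```python
def warp_curve(points, params):
    """Apply x' = a*x + b*y + c, y' = d*x + e*y + f to each (x, y) point."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


def average_change_ratio(curve, cur_energy_at, prev_energies, eps=1e-9):
    """Mean relative edge energy change over a candidate curve.

    cur_energy_at: callable (x, y) -> edge energy in the current frame
                   (hypothetical lookup supplied by the caller).
    prev_energies: previous-frame edge energy for each contour point.
    """
    if not curve or len(curve) != len(prev_energies):
        raise ValueError("curve and prev_energies must be non-empty and equal in length")
    ratios = [
        abs(cur_energy_at(x, y) - prev) / (abs(prev) + eps)
        for (x, y), prev in zip(curve, prev_energies)
    ]
    return sum(ratios) / len(ratios)


def select_optimal_curve(coarse_curve, param_sets, cur_energy_at, prev_energies):
    """Warp the coarse boundary with every parameter set and return the
    (curve, score) pair with the minimum average change ratio."""
    best_curve, best_score = None, float("inf")
    for params in param_sets:
        curve = warp_curve(coarse_curve, params)
        score = average_change_ratio(curve, cur_energy_at, prev_energies)
        if score < best_score:
            best_curve, best_score = curve, score
    return best_curve, best_score
```

The points of the returned curve correspond to the set of edge points output for the local affine portion of the object.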
Note that the process for selecting a local affine is independent of the process for improving the boundary estimate for the local affine. The process for selecting the local affine may be used with any process for improving the estimated shape and location of the identified local affine. Similarly, the process for improving the boundary of a local affine can be used for any local affine, regardless of how it was identified.