This invention relates to object tracking within a sequence of image frames, and more particularly to methods and apparatus for tracking an object using deformable templates.
When tracking an object among multiple frames of a video sequence, the object boundary is identified in each frame. The object is the area within the boundary. The challenge in identifying the object boundary in a given frame increases as the constraints on a trackable object are relaxed to allow tracking an object which translates, rotates or deforms. Once the object is identified in one frame, template matching may be used in a subsequent frame to detect translation of the object. The template typically is the object as identified in the prior frame. Deformable models are used to detect objects which translate, rotate or deform. Various methods using deformable models are described below.
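Translation-only template matching of the kind mentioned above can be sketched as an exhaustive search. This is a minimal illustration, not the source's method. It assumes grey-level frames stored as nested Python lists and a sum-of-squared-differences cost, and the name `ssd_match` is hypothetical.

```python
def ssd_match(frame, template):
    """Return the (dx, dy) offset where `template` best matches `frame`.

    Exhaustive sum-of-squared-differences search over every position at
    which the template fits entirely inside the frame. Both arguments are
    2-D lists of grey levels indexed as [row][column].
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_offset, best_cost = None, float("inf")
    for dy in range(fh - th + 1):
        for dx in range(fw - tw + 1):
            cost = sum(
                (frame[dy + r][dx + c] - template[r][c]) ** 2
                for r in range(th)
                for c in range(tw)
            )
            if cost < best_cost:
                best_offset, best_cost = (dx, dy), cost
    return best_offset


# A 2x2 bright patch cut from the prior frame is found again after it has
# moved; the returned offset is the patch's new top-left corner.
template = [[9, 9], [9, 9]]
frame2 = [[0] * 6 for _ in range(5)]
for r in (2, 3):
    for c in (3, 4):
        frame2[r][c] = 9
print(ssd_match(frame2, template))  # (3, 2)
```

Because the template is only shifted, this kind of search breaks down as soon as the object rotates or deforms, which is the case deformable models are meant to handle.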
Active contour models, also known as snakes, have been used to locate image features, in particular image object boundaries. Conceptually, an active contour model overlays an elastic curve onto an image. The curve (i.e., the snake) deforms from an initial shape to conform to the image features. An energy-minimizing function adapts the curve to image features such as lines and edges; the function is guided by external constraint forces and image forces. The best fit is achieved by minimizing the total energy of the curve. In effect, continuity and smoothness constraints are imposed to control deformation of the model. In tracking, the model is the object as identified in a prior frame. A shortcoming of the active contour model is that small changes in object position or shape from one frame to the next may cause the boundary identification to fail. In particular, rather than following the object, the estimated boundary instead latches onto strong false edges in the background, distorting the object contour.
Yuille et al., in “Feature Extraction from Faces Using Deformable Templates,” International Journal of Computer Vision, Vol. 8, 1992, disclose a process in which eyes and mouths in an image are identified using a model with a few parameters. For example, an eye is modeled using two parabolas and a circle radius; by changing the shape of the parabolas and the circle radius, eyes can be identified. The method of Yuille et al. and other deformable-model methods typically have encompassed only highly constrained deformations: the object has a generally known shape which may deform in some generally known manner. In summary, processes such as an active contour model have relaxed constraints but are effective only over a very narrow spatial range of motion, while processes like that disclosed by Yuille et al. are effective over a wider spatial range of motion but track only a very constrained type of motion.
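For reference, the energy minimized by the classic active contour of Kass, Witkin and Terzopoulos is usually written as follows. The source gives no formula, so the notation (a parametric curve v(s), weights α and β, image I) is the conventional one from the snakes literature rather than the patent's own, and only the edge term of the image energy is kept:

```latex
E^{*}_{\text{snake}} = \int_{0}^{1} \Big[ E_{\text{int}}\big(v(s)\big) + E_{\text{image}}\big(v(s)\big) + E_{\text{con}}\big(v(s)\big) \Big]\, ds,
\qquad
E_{\text{int}} = \tfrac{1}{2}\Big( \alpha(s)\,\lvert v_{s}(s)\rvert^{2} + \beta(s)\,\lvert v_{ss}(s)\rvert^{2} \Big),
\qquad
E_{\text{image}} = -\,\lvert \nabla I(x, y) \rvert^{2}.
```

The α term penalizes stretching (continuity), the β term penalizes bending (smoothness), and E_con carries the external constraint forces. Because the edge term rewards any strong gradient, a contour initialized a few pixels away from the true boundary can be pulled onto a nearby background edge instead, which is the failure mode described above.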
Accordingly, there is a need for a more flexible and effective object tracker, which can track more active deformations over a wider spatial range.
According to the invention, a hierarchy of deformation operations is implemented to deform a template and match the deformed template to an object in a video frame. The hierarchical deformation and matching are performed in multiple frames to track the object among those frames. At each successive level of the hierarchical processing, the constraints on the template deformations are relaxed, while the spatial range of the object boundary search is more confined.
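One way to make this trade-off concrete is to count the free parameters of each level's deformation alongside a search radius. The sketch below is illustrative only: the class name `Level`, the choice of deformation family per level, and the radii are assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    dof: int            # free parameters of the allowed deformation
    search_radius: int  # how far (pixels) the boundary may be searched


def build_hierarchy(num_boundary_points):
    """Hypothetical three-level schedule: freedom grows, range shrinks.

    The radii are illustrative placeholders, not values from the source.
    """
    return [
        # translation (2) + rotation (1) + uniform scale (1)
        Level("coarse: translate/rotate/scale", 4, 16),
        # a 2-D affine map has 6 parameters (2x2 matrix + translation)
        Level("middle: affine", 6, 4),
        # every boundary point may move independently in x and y
        Level("fine: local segmentation", 2 * num_boundary_points, 1),
    ]


levels = build_hierarchy(num_boundary_points=40)
for lv in levels:
    print(f"{lv.name:32s} dof={lv.dof:3d} radius={lv.search_radius}")
```

Reading down the list, the degrees of freedom increase (constraints are relaxed) while the search radius decreases (the spatial range is confined).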
For a given image frame, an edge energy analysis is performed to derive an edge energy representation of the image frame. This representation includes the energy representation of the object boundary along with energy representations of other edges present in the image frame. When a frame is searched to locate the object, it is this energy representation that is searched using the hierarchy of deformation operations.
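The source does not say which operator produces the edge energy representation. A common choice, assumed here purely for illustration, is the squared gradient magnitude; the sketch below computes it with central differences on a grey-level image stored as nested lists. The function name `edge_energy` is hypothetical.

```python
def edge_energy(image):
    """Squared gradient magnitude of a grey-level image (nested lists).

    Central differences are used in the interior; border pixels are set
    to 0 for simplicity. This is one common edge-energy choice and is an
    assumption here, not the source's specified operator.
    """
    h, w = len(image), len(image[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            energy[y][x] = gx * gx + gy * gy
    return energy


# A vertical step edge between columns 2 and 3 shows up as high energy in
# columns 2 and 3 and zero energy in the flat regions on either side.
img = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
e = edge_energy(img)
print(e[1])  # [0.0, 0.0, 25.0, 25.0, 0.0, 0.0]
```

Every edge in the frame produces energy, not only the object boundary, which is why the search over this map has to be constrained.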
In a preferred embodiment, three levels of deformation and tracking are implemented. At the highest level, an initial template used for a current image frame is translated and rotated to coarsely locate the object boundary within the energy representation of the image frame, and thereby locate the object within the given image frame. In some embodiments, scaling also is performed at the highest level.
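The highest level can be sketched as an exhaustive search over a translation window and a small set of rotation angles, scoring each placement by the mean edge energy under the template boundary. This is a minimal sketch under stated assumptions: the template is a list of boundary points relative to its own centre, the score is the mean sampled energy, and the helpers `score`, `place`, `coarse_search` and `ring`, the window size and the angle set are all hypothetical. A scale loop could be added in the same way for embodiments that also scale at this level.

```python
import math


def score(energy, points):
    """Mean edge energy sampled at the rounded template points (0 off-image)."""
    h, w = len(energy), len(energy[0])
    total = 0.0
    for x, y in points:
        xi, yi = round(x), round(y)
        if 0 <= xi < w and 0 <= yi < h:
            total += energy[yi][xi]
    return total / len(points)


def place(template, dx, dy, theta):
    """Rotate template points by theta about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in template]


def coarse_search(energy, template, center, radius, angles):
    """Try every translation within `radius` of `center` and every angle;
    return (best_score, dx, dy, theta). Earlier candidates win ties."""
    cx, cy = center
    best = None
    for theta in angles:
        for dy in range(cy - radius, cy + radius + 1):
            for dx in range(cx - radius, cx + radius + 1):
                s = score(energy, place(template, dx, dy, theta))
                if best is None or s > best[0]:
                    best = (s, dx, dy, theta)
    return best


def ring(cx, cy, r):
    """Integer points on the border of a square of half-size r at (cx, cy)."""
    return [(x, y) for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)
            if max(abs(x - cx), abs(y - cy)) == r]


# Edge energy of a square object that has moved to (12, 9); the search
# starts from the previous centre (10, 10).
energy = [[0.0] * 20 for _ in range(20)]
for x, y in ring(12, 9, 3):
    energy[y][x] = 1.0
template = ring(0, 0, 3)
best = coarse_search(energy, template, center=(10, 10), radius=3,
                     angles=[0.0, math.pi / 4])
print(best)  # (1.0, 12, 9, 0.0)
```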
According to an aspect of this invention, at a middle level, an affine transformation is implemented to deform the template. In the affine transformation, lines of the template border are rotated or expanded. For example, parallel lines are rotated or expanded to vary the distances between points on the lines, while the lines remain parallel. In one embodiment, a global affine deformation operation is applied to the template.
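A global affine deformation maps every template point through the same 2x2 linear part plus a translation. Since a direction vector is transformed by the linear part alone (the translation cancels), parallel border lines stay parallel even though their lengths and spacing change. The sketch below checks this numerically; the parameter values are arbitrary and the helper names are hypothetical.

```python
def affine(points, a, b, c, d, tx, ty):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty to every point."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def direction(p, q):
    """Direction vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2-D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


# Two parallel horizontal edges of a template border.
top = [(0.0, 0.0), (4.0, 0.0)]
bottom = [(0.0, 2.0), (4.0, 2.0)]
# Shear plus anisotropic stretch; values chosen to be exact in binary.
params = dict(a=1.5, b=0.5, c=0.25, d=0.75, tx=3.0, ty=-1.0)
t2, b2 = affine(top, **params), affine(bottom, **params)
# Both edges now point along (6, 1): still parallel, but longer, tilted,
# and at a different spacing.
print(direction(*t2), direction(*b2), cross(direction(*t2), direction(*b2)))
# (6.0, 1.0) (6.0, 1.0) 0.0
```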
According to another aspect of the invention, in addition or alternatively, the middle level includes a local affine transformation process in which an affine transformation is applied to a local sub-portion of the template. The sub-portions are selected or preselected by an operator. For example, when an object is selected to be tracked, the operator selects the sub-portions along with the object. The sub-portions, in effect, are articulating portions of the object. For example, when the object is a body, the operator may select one or more appendages as the articulations to be tracked; the appendages are then the selected sub-portions of the template. In another example, the object is a car and the articulating sub-portion is a door. In yet another example, the object is a tree and the articulating sub-portion is a branch of the tree.
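A local affine transformation can be sketched by applying the affine map only to the point indices of the operator-selected sub-portion, written here as a linear map about a pivot such as a joint. Representing the sub-portion as a list of indices and using a pivot are assumptions made for illustration, and `local_affine` is a hypothetical name.

```python
def local_affine(points, indices, matrix, pivot):
    """Apply a 2x2 linear map about `pivot` to the points listed in `indices`.

    `indices` stands for the operator-selected sub-portion (e.g. an
    appendage); all other template points are returned unchanged.
    """
    (a, b), (c, d) = matrix
    px, py = pivot
    selected = set(indices)
    out = []
    for i, (x, y) in enumerate(points):
        if i in selected:
            rx, ry = x - px, y - py          # position relative to the joint
            out.append((px + a * rx + b * ry, py + c * rx + d * ry))
        else:
            out.append((x, y))               # rest of the template is fixed
    return out


# Hypothetical body outline: two torso points, then a three-point arm.
body = [(0, 0), (0, 4), (1, 4), (2, 4), (3, 4)]
arm = [2, 3, 4]                      # operator-selected sub-portion
quarter_turn = ((0, -1), (1, 0))     # 90-degree rotation matrix
print(local_affine(body, arm, quarter_turn, pivot=(0, 4)))
# [(0, 0), (0, 4), (0, 5), (0, 6), (0, 7)]
```

The arm swings about the shoulder point while the torso points are untouched, which is the articulated motion a single global transform cannot express.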
This middle level of deformation and tracking adjusts the translated, rotated and/or scaled template to allow for moving articulations within the object. Specifically, the constraints on trackable object motions are relaxed to encompass articulated motion of appendages or other sub-portions of an object. The middle level refines the template so that the template boundary comes close to the actual object boundary in the given image frame. As in the high-level process, the deformed template is compared to the energy representation of the image frame to improve the estimate of the object boundary location.
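The source does not specify how the middle-level affine parameters are estimated. One simple possibility, swapped in here purely for illustration, is coordinate-wise hill climbing: nudge one of the six affine parameters at a time and keep the change only if the mean edge energy under the deformed template increases. The step sizes, iteration cap, identity starting point and all function names are assumptions.

```python
def score(energy, points):
    """Mean edge energy sampled at the rounded template points (0 off-image)."""
    h, w = len(energy), len(energy[0])
    total = 0.0
    for x, y in points:
        xi, yi = round(x), round(y)
        if 0 <= xi < w and 0 <= yi < h:
            total += energy[yi][xi]
    return total / len(points)


def apply_affine(points, p):
    """p = (a, b, c, d, tx, ty): x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, c, d, tx, ty = p
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def refine_affine(energy, template, steps=(0.05, 0.05, 0.05, 0.05, 1.0, 1.0),
                  iters=50):
    """Coordinate-wise hill climbing on the six affine parameters.

    Starts from the identity map and keeps a +/- step on one parameter only
    if it strictly raises the score, so the result never scores worse than
    the starting template.
    """
    p = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    best = score(energy, apply_affine(template, p))
    for _ in range(iters):
        improved = False
        for k, step in enumerate(steps):
            for delta in (step, -step):
                trial = p[:]
                trial[k] += delta
                s = score(energy, apply_affine(template, trial))
                if s > best:
                    p, best, improved = trial, s, True
        if not improved:
            break
    return p, best


def ring(cx, cy, r):
    """Integer points on the border of a square of half-size r at (cx, cy)."""
    return [(x, y) for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)
            if max(abs(x - cx), abs(y - cy)) == r]


energy = [[0.0] * 20 for _ in range(20)]
for x, y in ring(11, 10, 3):
    energy[y][x] = 1.0
template = ring(10, 10, 3)  # coarse-level estimate, one pixel off
start = score(energy, template)
p, s = refine_affine(energy, template)
print(start, s, p)
```

Hill climbing only finds a local optimum; that is tolerable in this setting because the coarse level has already placed the template near the object.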
At the lowest level, in a preferred embodiment, a local segmentation algorithm is applied to deform the now-close boundary so that it finely matches the object boundary. At the lowest level, the allowed deformations of the object are the least constrained. However, such motions may occur over a more limited spatial range than at the middle level or the high level.
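As one concrete local segmentation algorithm, the sketch below uses a simplified version of the greedy active-contour update of Williams and Shah, substituted here for illustration rather than taken from the source. Each boundary point may move at most one pixel per pass, reflecting the small spatial range of the lowest level. The weights, the unnormalised energy terms and the name `greedy_snake_step` are assumptions.

```python
import math

# Current position first, so ties keep a point where it is.
OFFSETS = [(0, 0)] + [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]


def greedy_snake_step(points, energy, alpha=1.0, beta=1.0, gamma=1.0):
    """One greedy pass over a closed contour of integer (x, y) points.

    Each point moves to the pixel in its 3x3 neighbourhood minimising
        alpha * continuity + beta * curvature - gamma * edge_energy
    (unnormalised; a simplification of the Williams-Shah greedy snake).
    Returns the updated points and how many of them moved.
    """
    n = len(points)
    h, w = len(energy), len(energy[0])
    pts = list(points)
    mean_d = sum(math.dist(pts[i], pts[i - 1]) for i in range(n)) / n
    moved = 0
    for i in range(n):
        prev, nxt = pts[i - 1], pts[(i + 1) % n]
        best, best_e = pts[i], None
        for dx, dy in OFFSETS:
            x, y = pts[i][0] + dx, pts[i][1] + dy
            if not (0 <= x < w and 0 <= y < h):
                continue
            cont = (mean_d - math.dist((x, y), prev)) ** 2  # even spacing
            curv = (prev[0] - 2 * x + nxt[0]) ** 2 + (prev[1] - 2 * y + nxt[1]) ** 2
            e = alpha * cont + beta * curv - gamma * energy[y][x]
            if best_e is None or e < best_e:
                best, best_e = (x, y), e
        if best != pts[i]:
            moved += 1
        pts[i] = best
    return pts, moved


energy = [[0.0] * 20 for _ in range(20)]
energy[5][6] = 1000.0  # a strong edge pixel next to the first point
square = [(5, 5), (9, 5), (9, 9), (5, 9)]
pts, moved = greedy_snake_step(square, energy, alpha=0.0, beta=0.0, gamma=1.0)
print(pts, moved)  # [(6, 5), (9, 5), (9, 9), (5, 9)] 1
```

With only the edge term active, the first point steps onto the strong edge pixel beside it and the others stay put; raising `beta` instead pulls points toward a smoother contour.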
In various embodiments, the middle level deformation process(es) are performed alone, or with the high level deformation and/or low level deformation processes. Further, various low level deformation processes may be included, such as an active contour model or another local segmentation algorithm.
According to an advantage of this invention, an accurate boundary of an object is tracked for objects which deform or include rapidly moving sub-portions. The ability to track a wide variety of object shapes and differing object deformation patterns is particularly beneficial for use with MPEG-4 image processing systems.
These and other aspects and advantages of the invention will be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings.