In a wide variety of image sequence processing and analysis tasks, including object-based video manipulation, there is a great need for an accurate method for tracking the boundary, motion, and intensity of a video object throughout an image sequence. The video object may be only partially visible in each frame of the image sequence because of self-occlusion or occlusion by another video object.
Tracking the boundary of an object has been discussed in M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models", International Journal of Computer Vision, volume 1, no. 4, pp. 321-331, 1988; F. Leymarie and M. Levine, "Tracking Deformable Objects in The Plane Using An Active Contour Model", IEEE Transactions Pattern Analysis and Machine Intelligence, volume 15, pp. 617-634, June 1993; K. Fujimura, N. Yokoya, and K. Yamamoto, "Motion Tracking of Deformable Objects By Active Contour Models Using Multi-scale Dynamic Programming", Journal of Visual Communication and Image Representation, vol. 4, pp. 382-391, December 1993; B. Bascle, et al., "Tracking Complex Primitives in An Image Sequence", in IEEE International Conference Pattern Recognition, pp. 426-431, October 1994, Israel; F. G. Meyer and P. Bouthemy, "Region-Based Tracking Using Affine Motion Models in Long Image Sequences", CVGIP: Image Understanding, volume 60, pp. 119-140, September 1994. The methods disclosed therein, however, do not address the tracking of the local deformations within the boundary of the object.
Methods for tracking local deformations of an entire frame using a 2-D mesh structure are disclosed in J. Niewglowski, T. Campbell, and P. Haavisto, "A Novel Video Coding Scheme Based on Temporal Prediction Using Digital Image Warping", IEEE Transactions Consumer Electronics, volume 39, pp. 141-150, August 1993; Y. Nakaya and H. Harashima, "Motion Compensation Based on Spatial Transformations", IEEE Transactions Circuits and System Video Technology, volume 4, pp. 339-357, June 1994; M. Dudon, O. Avaro, and G. Eud, "Object-Oriented Motion Estimation", in Picture Coding Symposium, pp. 284-287, September 1994, CA; C.-L. Huang and C.-Y. Hsu, "A New Motion Compensation Method for Image Sequence Coding Using Hierarchical Grid Interpolation", IEEE Transactions Circuits and System Video Technology, volume 4, pp. 42-52, February 1994; Y. Altunbasak and A. M. Tekalp, "Closed-form connectivity-preserving solutions for motion compensation using 2-D meshes," IEEE Transactions on Image Processing, volume 6, no. 9, September 1997. However, these methods always treat the whole frame as the object of interest. They do not address the problem of tracking an individual object within the frame.
U.S. Pat. No. 5,280,530, which is herein incorporated by reference, discusses a method for tracking an object within a frame. This method employs a single spatial transformation (in this case, an affine transformation) to represent the motion of an object. It forms a template of the object, divides the template into sub-templates, and estimates the individual displacement of each sub-template. The parameters of the affine transformation are found from the displacement information of the sub-templates. Although this method employs local displacement information, it does so only to find a global affine transformation for representing the motion of the entire object. Therefore, while it tracks the global motion of an entire object, it cannot track any local deformations that occur within the object.
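To illustrate the general idea of deriving one global affine transformation from several local displacements, the following minimal sketch fits six affine parameters to sub-template displacements by ordinary least squares. It is not taken from the cited patent: the least-squares formulation, the function names `solve3` and `fit_affine`, and the sample coordinates are all assumptions made for illustration.

```python
# Illustrative sketch only: least-squares fit of a single affine map
#   x' = a1*x + a2*y + a3,   y' = a4*x + a5*y + a6
# to per-sub-template displacements. Names and data are hypothetical.

def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("centers are collinear; the affine fit is underdetermined")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_affine(centers, displacements):
    """Return ((a1, a2, a3), (a4, a5, a6)) minimizing the squared position error.

    centers:        list of (x, y) sub-template centers in the reference frame
    displacements:  list of (dx, dy) estimated for each sub-template
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A, rows of A are (x, y, 1)
    atbx = [0.0] * 3                     # A^T times the target x coordinates
    atby = [0.0] * 3                     # A^T times the target y coordinates
    for (x, y), (dx, dy) in zip(centers, displacements):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atbx[i] += row[i] * (x + dx)
            atby[i] += row[i] * (y + dy)
    return tuple(solve3(ata, atbx)), tuple(solve3(ata, atby))


if __name__ == "__main__":
    # Four hypothetical sub-template centers and their measured displacements.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    displacements = [(3.0, -2.0), (4.0, -3.0), (5.0, -3.0), (6.0, -4.0)]
    params_x, params_y = fit_affine(centers, displacements)
    print("x' parameters:", params_x)
    print("y' parameters:", params_y)
```

Because all sub-templates are forced to share the same six parameters, any displacement that the fitted map does not reproduce is left as a residual. This is the limitation noted above: a single global affine transformation has no means of representing deformations that are local to one part of the object.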
The method disclosed in commonly assigned U.S. patent application Ser. No. 08/636,622 and in the article entitled "Tracking Motion and Intensity Variations Using Hierarchical 2-D Mesh Modeling," Graphical Models and Image Processing, volume 58, no. 6, pp. 553-573, November 1997, both of which are herein incorporated by reference, tracks the boundary, local motion, and intensity variations of a video object throughout an image sequence. It also describes how 2-D mesh-based object tracking can be used for object-based video manipulation such as synthetic object transfiguration and augmented reality. However, this method assumes that the object being tracked is never occluded at any point in the image sequence.
A method for occlusion-adaptive tracking of video objects using 2-D meshes is discussed in Y. Altunbasak and A. M. Tekalp, "Very low bit rate video coding using object-based mesh design and tracking," SPIE/IS&T Symposium on Electronic Imaging Science & Technology, San Jose, Calif., February 1996; Y. Altunbasak and A. M. Tekalp, "Occlusion-adaptive content-based 2-D mesh design and tracking for object-based coding," IEEE Transactions on Image Processing, volume 6, no. 9, September 1997. However, this method was developed for video-object compression and can track only the frame-to-frame motion of the video object. It does not provide a complete representation of the motion and intensity variations of the video object throughout the image sequence. Nor does it disclose how to link parts of the tracked object that become disjoint because of occlusion by other video objects in the scene.
Mosaicing is used as a means for obtaining an efficient and complete representation of video sequences in M. Irani et al., "Efficient representations of video sequences and their applications," Signal Processing: Image Communication, volume 8, pp. 327-351, 1996. However, in this paper, mosaic construction is described for global transformations only and cannot be used for manipulation of video objects with local motion.
Although the presently known and utilized methods are satisfactory, they are not without drawbacks. Consequently, a need exists for an improved method for mosaicing and tracking an object that undergoes local motion and intensity variations in an image sequence in the presence of occlusion.