This invention relates to image processing, and more particularly to object tracking and contour prediction in a video sequence.
Steadily increasing computing power allows far more complex calculations to be performed on data. Video sequences can be analyzed or operated upon by fast yet inexpensive processors. The many still-image frames that make up a video sequence can be compressed with motion vectors under the well-known Moving Picture Experts Group (MPEG) compression standards.
Computational algorithms can be used to detect foreground objects and follow these foreground objects around in the video sequence. Knowledge of the locations of such foreground objects, even imperfect guesses, can improve compression since more resources can be allocated to the foreground objects than to the background.
For example, a still image or a video sequence captured by a hand-held device such as a smart cell phone may be operated upon by a cheap yet powerful processor in the phone to compress the image, reducing the bandwidth required to wirelessly transmit the video. With sufficient computational power, more complex operations may be performed on the image, such as detecting foreground objects. Then the video compression can be improved by allocating more bandwidth for transmission of the foreground object while reducing bandwidth allocated to transmit the background.
Video surveillance applications may use processors to detect moving objects in video frames captured by a surveillance camera. The processors may follow these moving objects, perhaps drawing a contour or bounding box around the object in each frame and then allocating additional memory storage for the object, essentially allowing for a higher resolution of the moving object than for the background. When the object is a person or a car, the higher resolution may allow for the person's face or the car's license plate to be extracted from the video sequence.
Video archives can be processed in a similar manner by software that detects foreground or moving objects, and draws bounding boxes or contours around the object in each frame of the video sequence. Cataloging software could then list which frames the object is in, and which frames the object is absent from.
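As an illustration only, the cataloging described above might be sketched as follows. The sketch assumes that some earlier detection step has already produced one boolean object mask per frame (an empty mask meaning the object was not found). The names `bounding_box` and `catalog` are hypothetical and are not taken from any particular cataloging software.

```python
import numpy as np

def bounding_box(mask):
    """Return (top, left, bottom, right) of the True pixels, or None if empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # object not detected in this frame
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

def catalog(masks):
    """List a bounding box per frame plus the frames with and without the object."""
    boxes = [bounding_box(m) for m in masks]
    present = [i for i, b in enumerate(boxes) if b is not None]
    absent = [i for i, b in enumerate(boxes) if b is None]
    return boxes, present, absent

# Four hypothetical frames; the object is missing from frame 2.
masks = [np.zeros((6, 8), dtype=bool) for _ in range(4)]
masks[0][1:3, 2:5] = True
masks[1][1:3, 3:6] = True
masks[3][2:4, 4:7] = True

boxes, present, absent = catalog(masks)
print("object present in frames:", present)
print("object absent from frames:", absent)
```

The bounding box here is simply the smallest axis-aligned rectangle enclosing the mask; a contour could be stored instead without changing the cataloging step.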
FIGS. 1A–B show a video sequence with tracking of the contour of a foreground object. In FIG. 1A, foreground object 10 is moving slowly to the right in frames T to T+3. In this example foreground object 10 is a fish that may be obscured by other objects such as bubbles or other fish.
Various algorithms exist that allow a computer or processor to extract the location of object 10 in frame T. For example, segmentation or watershed analysis can determine the contour or boundary of object 10 by the rapid change in color at the perimeter of object 10, which might be a yellow fish while the background is blue water. Contour 11 of object 10 can be extracted as points along a line having a maximum gradient or change in color between the fish and the water. Similar contour extractions could be performed for subsequent frames T+1, T+2, and T+3 to generate contours 11′, 11″, and 11′″ of FIG. 1B that track object 10 in these frames.
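A minimal sketch of gradient-based contour extraction is given below, assuming an RGB frame held in a NumPy array. It is a simple stand-in for the segmentation or watershed analysis mentioned above, not an implementation of either. The function name `extract_contour` and the `rel_threshold` parameter are hypothetical choices made for this example.

```python
import numpy as np

def extract_contour(frame, rel_threshold=0.5):
    """Return a boolean map marking pixels where the color gradient is strongest."""
    frame = frame.astype(np.float64)
    # Per-channel gradients along rows (axis 0) and columns (axis 1).
    gy, gx = np.gradient(frame, axis=(0, 1))
    # Combine all color channels into one gradient magnitude per pixel.
    magnitude = np.sqrt((gx ** 2 + gy ** 2).sum(axis=2))
    if magnitude.max() == 0:
        return np.zeros(magnitude.shape, dtype=bool)  # uniform frame, no contour
    # Keep pixels whose gradient is close to the maximum (assumed threshold).
    return magnitude >= rel_threshold * magnitude.max()

# Synthetic frame: a yellow rectangle (the "fish") on blue "water".
frame = np.zeros((20, 30, 3), dtype=np.uint8)
frame[:, :] = (0, 0, 255)
frame[5:15, 8:22] = (255, 255, 0)

contour = extract_contour(frame)
rows, cols = np.nonzero(contour)
print("contour pixels found:", rows.size)
```

Because central differences are used, the detected contour in this sketch is about two pixels wide, straddling the color boundary. A practical method would typically thin it to a single-pixel line and would need to cope with noise and with obscuring objects such as bubbles.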
Contours 11, 11′, 11″, and 11′″ can be line segments along the object perimeter, or pixels along the perimeter, or can be defined in other ways. For example, the area within the contour may be stored as an object mask, either including the perimeter or excluding the perimeter, or all pixels within the object's predicted contour can be stored.
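The distinction between a mask that includes the perimeter and one that excludes it can be sketched as follows. This is an illustrative assumption about one possible representation: the object is a boolean NumPy array, and the perimeter is taken to be those mask pixels having at least one 4-connected neighbor outside the mask. The helper name `perimeter_of` is hypothetical.

```python
import numpy as np

def perimeter_of(mask):
    """Mask pixels with at least one 4-connected neighbor outside the mask."""
    # Pad with False so pixels on the image border count as touching "outside".
    padded = np.pad(mask, 1, constant_values=False)
    interior = (mask
                & padded[:-2, 1:-1]   # neighbor above
                & padded[2:, 1:-1]    # neighbor below
                & padded[1:-1, :-2]   # neighbor to the left
                & padded[1:-1, 2:])   # neighbor to the right
    return mask & ~interior

# A filled 4 x 5 region standing in for the area within a contour.
mask_with_perimeter = np.zeros((8, 10), dtype=bool)
mask_with_perimeter[2:6, 3:8] = True

perimeter = perimeter_of(mask_with_perimeter)
mask_without_perimeter = mask_with_perimeter & ~perimeter

print("pixels including perimeter:", int(mask_with_perimeter.sum()))
print("perimeter pixels:", int(perimeter.sum()))
print("pixels excluding perimeter:", int(mask_without_perimeter.sum()))
```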
Other contour-prediction or object-tracking methods have been proposed, such as a “snakes” method and a mesh-based method, which track points along the object boundary in subsequent video frames. However, these methods generally require large and complex computations that may prevent real-time processing, since the computations can take more time on a processor than the video takes to capture, view, or transmit. Errors may occur when processing a frame takes longer than the time available before the next frame arrives.
While such object-tracking methods are effective in various situations, most are computationally expensive. What is desired is a less computationally expensive method of object tracking.