This invention relates to the detection and tracking of moving objects in a scene for object-based image sequence (video) compression applications.
A video consists of a sequence of consecutive frames. Each frame is itself a still image; the impression of motion arises from the differences between consecutive frames. Automatically detecting and tracking the motion of objects in these frames has several advantages. Automatic target tracking, robot navigation, and obstacle avoidance are immediate examples. The analysis of moving objects in recent object-based video coding functionalities aims mainly at such goals as selective coding of objects, improved coding efficiency, selective error robustness, and content-based visual database query/indexing.
When objects are detected automatically, each detected object should have some semantic meaning in everyday life. The method should also be able to track the objects in addition to detecting them. It should be noted that several distinct objects may move independently, or a single object may have articulated motion. In such situations, an object should be allowed to split into several objects and to merge into one again when needed. Furthermore, an object may be covered by another one for a while and then emerge again in the scene. Finally, for the sake of high coding performance, the boundaries obtained as the result of the segmentation process should be as close as possible to the true boundaries of the object.
Moving objects in image sequences have so far been addressed by various methods. The majority of these existing methods deal only with the tracking of objects that are pre-defined with marked locations. For example, U.S. Pat. Nos. 5,323,470, 5,280,530, and 5,430,809 deal only with the tracking of the human face. Examples of other patents, which can track without looking into the content of the object, are U.S. Pat. Nos. 5,537,119, 5,473,369, 5,552,823, 5,379,236, 5,430,809, 5,473,364 and Turkish Patent No. 89/00601. In general, these methods achieve tracking by finding the counterpart of the object in the subsequent frame, given the object in the current frame. Since automatic detection of objects is absent, all these methods are semi-automatic and can work only with user interaction. Besides, they do not consider complicated situations, such as splitting of the tracked object or the existence of different motion characteristics in different parts of the object. Moreover, they do not attempt to achieve accurate object boundaries. In contrast, the proposed method is designed to perform fully automatic object detection. Unlike the above-mentioned methods, it is not limited to a particular object class such as the human face; there are no restrictions, at any stage of the method, based on the inputs. Another difference from the existing state of the art is that the new method includes a tool that can find the accurate boundaries of an object in addition to detecting it.
Some other existing methods aim only at detecting the moving objects using consecutive frames. For example, U.S. Pat. Nos. 5,258,836, 5,394,196, 5,479,218, 5,557,684, and European Patent No. EP0691789A2 are related to this issue. In general, these methods define objects by clustering the estimated motion vectors. The boundaries of the objects found by these methods may be inadequate in some applications, either due to the use of block-based motion estimates or as a consequence of occlusions that are not taken into account. Usually, these motion vectors are estimated from the given image sequence, but the resultant vectors may be erroneous. If segmentation is based on clustering of these possibly erroneous motion vectors, incorrect object boundaries are inevitable. The proposed method, on the other hand, achieves the detection of objects by utilizing not only the motion vectors but also the color information and the previous segmentation result. Another difference of the described method compared to those prior methods is its ability to achieve the detection and the tracking of objects simultaneously.
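The motion-vector clustering used by the prior methods discussed above can be illustrated with a minimal sketch. It is not the proposed method and is not taken from any of the cited patents: it assumes plain k-means as the clustering technique, and the function name `cluster_motion_vectors`, its parameters, and the sample vectors are hypothetical.

```python
import math


def cluster_motion_vectors(vectors, k=2, iterations=20):
    """Group 2-D motion vectors (dx, dy) into k clusters with plain k-means.

    Illustrative assumption only: one vector per image block, k known in advance.
    Returns one cluster label per input vector.
    """
    # Deterministic initialization: pick k vectors spread evenly over the input list.
    step = max(1, len(vectors) // k)
    centers = [vectors[min(i * step, len(vectors) - 1)] for i in range(k)]
    labels = [0] * len(vectors)

    for _ in range(iterations):
        # Assignment step: each vector joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center moves to the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Hypothetical per-block motion estimates: a nearly static background
# and an object moving about 5 pixels to the right.
clean = [(0.1, 0.0), (0.0, -0.1), (5.0, 0.2), (4.8, 0.0), (0.0, 0.1), (5.1, -0.1)]
clean_labels = cluster_motion_vectors(clean)

# The same data plus one background block whose motion estimate is erroneous.
noisy = clean + [(4.0, 0.0)]
noisy_labels = cluster_motion_vectors(noisy)
```

In this sketch the last block of `noisy` belongs to the background, yet its erroneous estimate places it in the same cluster as the moving object. This is the kind of boundary error that can arise when segmentation relies on motion vectors alone.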
One of the achievements of this invention is to detect the objects in a given image sequence (video) recording and to segment them automatically while keeping their original forms as encountered in everyday life. The boundaries of these objects can be found accurately using the proposed method. The motion of an object may cause some portions of the background to be covered by the object or, conversely, may cause some covered portions of the background to be uncovered. The performance of the segmentation obtained by this invention does not degrade even in these cases. Additionally, this method can individually track distinct objects moving independently of each other, including objects that come to a stop. Moreover, some complicated situations, such as the splitting or merging of moving objects (e.g., due to articulated motion), can be handled.