The present invention is directed to a method for detecting and tracking moving objects in a digital image sequence having a stationary background.
In various applications of machine vision (scene analysis, autonomous vehicle control, monitoring tasks) it is important to be able to detect moving objects by interpreting a temporal sequence of digital images, to identify their shape and position, and to track their motion. This is generally achieved by segmenting the images of the sequence and grouping the segments to form objects within each image. Objects in different images are then identified with one another, and the corresponding segment groups are combined to form trajectories. The resulting sequences of segment images and object trajectories can then be made available for further scene analysis by either a person or an intelligent system.
The following problems must therefore be addressed for recognizing and tracking objects:
(1) Separating the moving image regions from the stationary background;
(2) Separating the objects from one another, i.e., segmenting the moving image region so that every moving object can have a group of segments allocated to it; and
(3) Correctly allocating the segment groups of the individual images to the sequence of object trajectories (correspondence problem).
In addition to the object motions, changes in illumination and various noise sources also contribute to a temporal change of brightness. A practical system for object tracking must be able to distinguish object motions from other dynamic processes. Estimating the motion therefore has a central role in object tracking. Knowledge of the motion parameters of detected objects is also an important prerequisite for a correct combination of object mask segments into objects and for solving the correspondence problem.
Prior art methods for tracking general, independently moving objects can be divided into the following two classes:
(a) Change Detection with Difference Images of Consecutive Images
The methods belonging to this class (P. Spoer, "Moving Object Detection by Temporal Frame Difference Accumulation", in Digital Signal Processing 84, V. Cappellini and A. G. Constantinides, editors, Florence 1984, pp. 900-907 and J. Wiklund, G. H. Granlund, "Image Sequence Analysis for Object Tracking", Proceedings of the 5th Scandinavian Conference on Image Analysis, Stockholm, June 1987, pp. 641-648) are based on the evaluation of difference images formed from consecutive images of the temporal sequence. These difference images are subjected to a threshold evaluation, which produces a binary image corresponding to the threshold decision. Typically, this binary image still contains residual noise (noisy pixels) that can be eliminated by a suitable filter operation (median filter, low-pass filter, or suppression of all segments whose size lies below a threshold).
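By way of illustration only, the change detection chain described above (difference image, threshold evaluation, suppression of small segments) can be sketched as follows. The function names, the use of 4-connectivity, and the threshold and size values are assumptions chosen for this example and are not taken from the cited references.

```python
from collections import deque


def difference_mask(prev, curr, threshold):
    """Binary image: 1 where the absolute brightness change exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def remove_small_segments(mask, min_size):
    """Clear every 4-connected segment that has fewer than min_size pixels."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] == 0 or seen[y][x]:
                continue
            # Collect one connected segment by breadth-first search.
            seen[y][x] = True
            queue, segment = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                segment.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(segment) < min_size:
                for sy, sx in segment:
                    out[sy][sx] = 0
    return out


def make_frame(left, noise=None, size=6, bg=10, obj=200):
    """Hypothetical test frame: a 2x2 bright object on a flat background."""
    frame = [[bg] * size for _ in range(size)]
    for y in (1, 2):
        for x in (left, left + 1):
            frame[y][x] = obj
    if noise is not None:
        frame[noise[0]][noise[1]] = 60  # single noisy pixel
    return frame


prev = make_frame(left=1)
curr = make_frame(left=3, noise=(4, 0))

mask = difference_mask(prev, curr, threshold=30)
cleaned = remove_small_segments(mask, min_size=2)
```

In this example the isolated noisy pixel is set in `mask` and cleared in `cleaned`, while the eight pixels covering the old and the new object position remain set.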
The goal of change detection is the separation of the moving image regions from the background and the acquisition of object masks whose segments reveal the shape and position of the objects. Prior art methods of this type have two problems which generally lead to difficulties:
(1) Even under ideal conditions (complete freedom from noise, objects with high-contrast, extended textures that are clearly distinguished from the background), the segments of the object masks produced in this manner bear no simple relationship to the number of objects, and the object shapes cannot be uniquely reconstructed from them. Generally, the binary image obtained in this manner corresponds to the combination of two binary images that represent the object positions at two different times.
(2) Regions having low brightness gradients in the interior of the objects cause holes to occur in the corresponding segments of the object masks. A segment can also decompose into a plurality of parts.
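Both problems can be reproduced with a one-dimensional example. The following sketch assumes a uniformly bright object on a flat background; the helper names and all numerical values are hypothetical and serve only to illustrate the effects described above.

```python
def difference_mask_1d(prev, curr, threshold):
    """Binary mask of one image row: 1 where the brightness change exceeds threshold."""
    return [1 if abs(c - p) > threshold else 0 for p, c in zip(prev, curr)]


def place(width, start, length, obj=200, bg=10):
    """Image row with a uniformly bright object of the given length at start."""
    return [obj if start <= i < start + length else bg for i in range(width)]


prev = place(12, start=1, length=4)
small_shift = place(12, start=2, length=4)   # object moved by 1 pixel
large_shift = place(12, start=6, length=4)   # object moved by 5 pixels

# Small displacement: the uniform interior does not change, so the mask has a
# hole and decomposes into two parts (problem 2).
mask_small = difference_mask_1d(prev, small_shift, threshold=30)

# Large displacement: the mask contains the object position at both times,
# so one object yields two segments (problem 1).
mask_large = difference_mask_1d(prev, large_shift, threshold=30)

print(mask_small)
print(mask_large)
```

For the small displacement only the two pixels at the leading and trailing object edges are marked. For the large displacement the mask marks the old and the new object position as two separate segments of four pixels each.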
(b) Segmenting Motion Vector Fields
A moving object corresponds to an image segment in whose interior the motion vector field is continuous and at whose edge the motion vector field is discontinuous at least at some locations. This situation forms the basis of a number of methods that, proceeding from the images of the sequence, estimate motion vector fields using various methods (A. V. Brandt, W. Tengler, "Obtaining Smooth Optical Flow Fields by Modified Block Matching", the 5th Scandinavian Conference on Image Analysis, Stockholm, June 2-5, 1987, Proceedings, Vol. 2, pp. 529-532 and B. K. Horn, B. G. Schunck, "Determining Optical Flow", Artificial Intelligence, Vol. 17, pp. 185-203, 1981) and subsequently segment these fields using some suitable continuity criterion (H. Kirchner, "Objektsegmentierung auf der Basis von Verschiebungsvektorfeldern (Object Segmentation based on Motion Vector Fields)", Lehrstuhl fuer Informatik 5, University of Erlangen-Nuernberg, W. Germany, 1987 and W. Tengler, H. Kirchner, A. V. Brandt, "Object Segmentation from Optical Flow Field", presented at the 5th IEEE Workshop on Multidimensional Signal Processing (MDSP), Noordwijkerhout, Netherlands, Sept. 14-16, 1987).
Such a procedure is basically suitable for avoiding the problems connected with change detection. However, a main drawback of this approach is that knowledge of the object boundaries must be available, or assumptions about them must be made, in order to estimate motion vector fields having the desired continuity properties. According to the concept, however, the object boundaries are acquired only subsequently, by segmenting the motion vector fields.
When, in addition to pure translations, the scene also contains objects with non-negligible rotational motion, standard methods for estimating motion produce unusable results. The segmentation of motion vector fields is therefore not well suited for the analysis of image sequences having rotating objects.
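For illustration, the estimation of a single motion vector of the kind used by the methods of class (b) can be sketched with plain exhaustive block matching. This is a simplified stand-in and not the modified block matching or the optical flow method of the cited references; the function names, the sum-of-absolute-differences cost, the tie-breaking rule, and the test frames are assumptions made for this example. The sketch handles pure translation only, consistent with the limitation noted above for rotating objects.

```python
def sad(prev, curr, y, x, dy, dx, size):
    """Sum of absolute differences between the block at (y, x) in prev and
    the block displaced by (dy, dx) in curr."""
    return sum(abs(curr[y + dy + i][x + dx + j] - prev[y + i][x + j])
               for i in range(size) for j in range(size))


def match_block(prev, curr, y, x, size, radius):
    """Displacement (dy, dx) within the search radius that minimizes the cost.
    Ties are resolved in favor of the shorter displacement (an assumption)."""
    h, w = len(curr), len(curr[0])
    best_key, best_vec = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would leave the image.
            if y + dy < 0 or x + dx < 0 or y + dy + size > h or x + dx + size > w:
                continue
            key = (sad(prev, curr, y, x, dy, dx, size), abs(dy) + abs(dx))
            if best_key is None or key < best_key:
                best_key, best_vec = key, (dy, dx)
    return best_vec


def make_frame(top, left, size=8, bg=10):
    """Hypothetical test frame: a textured 2x2 object on a flat background."""
    patch = [[200, 150], [100, 250]]
    frame = [[bg] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            frame[top + i][left + j] = patch[i][j]
    return frame


prev = make_frame(top=2, left=1)
curr = make_frame(top=3, left=3)   # object translated by (dy, dx) = (1, 2)

object_motion = match_block(prev, curr, y=2, x=1, size=2, radius=2)
background_motion = match_block(prev, curr, y=5, x=5, size=2, radius=2)
```

Here `object_motion` is `(1, 2)` and `background_motion` is `(0, 0)`. The discontinuity between the two vectors is what a subsequent segmentation step would use to separate the object from the background.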