1. Field of Invention
This invention relates to image processing in general, and in particular to the processing of images by methods designed to detect, locate, and track distinctive target objects in the image.
2. Discussion of Prior Art
Systems for detecting, localizing, and tracking distinctive targets are used for unsupervised observation applications, videoconferencing, human-computer interaction, and other applications. In general, these systems use a video camera to capture a two-dimensional image stream, which is analyzed by a computer system. The methods for analyzing the image stream must solve the following problems:
Detecting and identifying targets: The system must provide a method to detect and identify targets in the camera image. Generally a target class is defined, describing all objects that are considered targets. For example, defining the target class as human faces would restrict detection, localization, and tracking to human faces. The initial number of targets in the camera image might not be known, and new targets might appear or existing targets disappear in successive camera images. The problem of detecting and localizing targets becomes more difficult if the size, orientation, and exact appearance of the targets are not known, for example if a plurality of arbitrary human faces are to be detected in the camera image.
Localizing targets: The system must be capable of localizing targets by determining their position and size in the camera image.
Tracking targets: The position of each detected and localized target must be tracked in successive images, even though this target might be moving, changing its orientation, or changing its apparent size as its distance to the camera changes. The system should continue to track targets robustly even if lighting conditions change or the tracked target is partially covered by other objects.
Several techniques of the prior art have been developed in an attempt to address these problems:
Template matching: One or more pre-stored images of objects of the target class are used as templates to localize and track targets in the video stream. To locate a target, the templates are shifted over the camera image to minimize the difference between the templates and the corresponding region of the camera image. If the difference can be made small for one template, the camera image contains the target represented by this template. To track the target, this template is shifted over the region of the subsequent camera image where the target is assumed to be.
Model matching: A model for the target class is created, containing information about edges, proportions between edges, and other structural information about objects. Targets are located by extracting these features in the camera image and matching them to the target class model. Tracking of targets can be performed with the same method, but the high computational costs of this approach suggest other techniques, like template matching, for further tracking of targets.
In general, these techniques suffer from several well-known problems:
1. Template matching:
(a) In many applications pre-stored templates cannot cover the variety of objects in the target class. For example, the number of templates required to cover all human faces in all sizes, orientations, etc. would be much higher than a real-time tracking system can manage.
(b) If the pre-stored templates do not cover all objects of the target class, manual operator intervention is required to point out target objects for further tracking.
(c) Partial occlusions of a tracked target object result in substantial differences between the image of the tracked object and the stored template, so that the system loses track of the target.
2. Model matching:
(a) The model for the target class can be very complex depending on the geometrical structure of the objects of this class, resulting in high computational costs to match this model against the camera image.
(b) To extract geometrical structures from the camera image, this image must have sufficient resolution (for example, in order to locate human faces, the eyes, nose, and mouth must be detectable as important geometrical substructures of a human face), requiring a large amount of data to be processed.
A fundamental problem of the technique of template matching becomes obvious when locating arbitrary objects of the target class for further tracking using templates. The templates must cover all possible appearances, orientations, and sizes of all objects of the target class in order to locate them. Because this requirement cannot be met in the case of eyes and lips of human faces as the target class, P. W. Rander (Real-Time Image-Based Face Tracking, Carnegie Mellon University, 1993, Pittsburgh, Pa.) requires a user to manually point out these target objects in a camera image in order to generate templates of these objects. These templates are then tracked in subsequent images. U.S. Pat. No. 5,323,470, A. Kara, K. Kawamura, Method and Apparatus for Automatically Tracking an Object, uses template matching to automatically track the face of a person who is being fed by a robotics system, requiring a pre-stored image of the person's face but no manual user input. If the distance of the person to the camera is not known, the template and the camera image will not match each other. It is therefore suggested to use a stereo-based vision subsystem to measure this distance, requiring a second camera. Because of the requirement for a pre-stored image of the target object, this system is unable to locate arbitrary faces. Another severe problem of this technique is the inability to adjust to a changing appearance of the tracked object. In order to solve this problem, U.S. Pat. No. 5,280,530, T. I. P. Trew, G. C. Seeling, Method and Apparatus for Tracking a Moving Object, updates the template of the tracked object by tracking sub-templates of this template, determining displacements of the positions of each of the sub-templates, and using these displacements to produce an updated template. The updated template allows tracking of the object even though the orientation and appearance of this object might change. This method still requires an initial template of the object to be tracked and is incapable of locating arbitrary objects of a target class, such as human faces.
Model matching is successfully used to locate faces in newspaper articles (V. Govindaraju, D. B. Sher, and S. N. Srihari, Locating Human Faces in Newspaper Photographs, Proc. of IEEE-CS Conf. Computer Vision and Pattern Recognition, 1989, San Diego, Calif.). After detecting edges in the image, a structural model is matched against the located features. There are several significant disadvantages and problems with using the technique of matching structural models for localizing and tracking objects. The model for the target class must describe all possible appearances of targets. If targets appear very differently depending on orientation, the model becomes very complex and does not allow efficient real-time tracking of targets. The process of model matching itself requires a sufficient resolution of the tracked target in the camera image to allow edge detection and feature matching, resulting in a considerable amount of data to process.
The present invention provides a novel image processing and target tracking system, based on a new scheme of dynamic color classification that overcomes these and other problems of the prior art.
Objects and Advantages
This invention differs fundamentally from conventional image tracking methods of the prior art in that patterns of color and motion are used as the basis for determining the identity of a plurality of individual moving targets and tracking the position of these targets continuously as a function of time. Specifically, this invention improves on prior art methods in a number of important aspects:
The system can acquire and track targets automatically. Templates or pre-stored images of objects of the desired target class are not necessary.
The system can acquire and track targets in an unsupervised manner. Human intervention is not required to manually select targets.
Size, orientation, and exact appearance of targets need not be known in order to detect and locate targets.
The system acquires and tracks multiple targets simultaneously.
The described tracking system is capable of rapid adjustments to changing lighting conditions and appearance of the tracked target, such as changes in the orientation.
The computational costs for the methods described in this invention are substantially smaller than those of the prior art, resulting in significantly faster real-time tracking systems.
The system is very resistant to partial occlusions of a tracked target.
The system can be implemented using conventional hardware, such as a common video camera and workstation or PC-type computer.
Further objects and advantages of this invention will become apparent from a consideration of the drawings and ensuing description.
The method comprises the following steps:
creating a general target color classifier, classifying all colors typical for objects of a target class (such as human faces, etc.) as general target class colors;
detecting target objects of the target class and locating their position in the image using the general target color classifier and the object's motion;
creating an individual target color classifier for each such detected and located target by determining the colors that actually occur in the target, so that this individual target color classifier classifies all colors typical for the individual target as individual target colors;
tracking the position of each target using the individual target color classifier and the target's motion in a search region restricted to an estimated position of the target;
constantly adjusting the individual target color classifier to the changing appearance of the target, due to changing lighting conditions or motion and orientation changes of the target;
determining position and size of all tracked targets to adjust position of the camera and zoom lens.
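The color-classification steps listed in this section can be illustrated with a short sketch. It is not the implementation of the invention: the coarse color binning, the frame-difference motion test, the bounding-box localization, and all names and values (quantize, build_classifier, find_target, search_region, SKIN, SHADOW, WALL, the thresholds) are assumptions made for illustration only. Frames are plain lists of rows of (r, g, b) tuples, and tracking here uses color alone inside the search region, a simplification of the color-and-motion tracking described in the text.

```python
from collections import Counter

def quantize(pixel, levels=8):
    """Reduce an (r, g, b) pixel with 0..255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)

def build_classifier(pixels, min_share=0.05):
    """Return the set of color bins that each hold at least min_share of pixels."""
    counts = Counter(quantize(p) for p in pixels)
    total = sum(counts.values())
    return {color for color, n in counts.items() if n / total >= min_share}

def pixels_in(frame, box):
    """List the pixels inside box = (x0, y0, x1, y1), x1 and y1 exclusive."""
    x0, y0, x1, y1 = box
    return [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]

def find_target(frame, classifier, region, prev_frame=None, min_change=30):
    """Bounding box of classifier-colored pixels in region, or None.

    If prev_frame is given, a pixel must also have changed by at least
    min_change (sum of absolute channel differences) to count as moving.
    """
    x0, y0, x1, y1 = region
    hits = []
    for y in range(y0, y1):
        for x in range(x0, x1):
            pixel = frame[y][x]
            if quantize(pixel) not in classifier:
                continue
            if prev_frame is not None:
                change = sum(abs(a - b) for a, b in zip(pixel, prev_frame[y][x]))
                if change < min_change:
                    continue
            hits.append((x, y))
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

def search_region(box, margin, width, height):
    """Grow box by margin on every side, clipped to the frame."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))

# Hypothetical colors and a tiny synthetic scene (assumptions for the demo).
SKIN, SHADOW, WALL = (220, 170, 140), (200, 150, 120), (40, 60, 200)
WIDTH, HEIGHT = 6, 4

def make_frame(colored):
    """Build a WALL-colored frame with the given {(x, y): color} overrides."""
    frame = [[WALL] * WIDTH for _ in range(HEIGHT)]
    for (x, y), color in colored.items():
        frame[y][x] = color
    return frame

frame0 = make_frame({})
frame1 = make_frame({(x, y): SKIN for x in (1, 2) for y in (1, 2)})
frame2 = make_frame({**{(x, y): SKIN for x in (2, 3) for y in (1, 2)},
                     (0, 3): SHADOW})

# General classifier: colors typical for the whole target class.
general = build_classifier([SKIN, SHADOW, (230, 180, 150), (180, 120, 90)])
# Detection: general target colors combined with motion between frames.
detected = find_target(frame1, general, (0, 0, WIDTH, HEIGHT), prev_frame=frame0)
# Individual classifier: only the colors that actually occur in this target.
individual = build_classifier(pixels_in(frame1, detected))
# Tracking: search only near the last known position.
region = search_region(detected, 1, WIDTH, HEIGHT)
tracked = find_target(frame2, individual, region)
# Adaptation: rebuild the individual classifier from the newly tracked pixels.
individual = build_classifier(pixels_in(frame2, tracked))
print(detected, tracked)
```

In this toy scene the SHADOW-colored pixel in frame2 belongs to the general target class but not to the tracked target, so the individual classifier ignores it while the general classifier would enlarge the box to include it. That is the motivation the text gives for deriving a per-target classifier after detection.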