Tracking people in a real scene is an important application in the field of computer vision. Tracking techniques are used in many areas, including security, monitoring, research, and analysis. One method of tracking people is to track the head or face of the person. Because the location of a person's face correlates with the location of the person, tracking the face tells us where the person is in the scene.
To track a person's face, we must first be able to locate a face in an image of a real scene. A variety of techniques for locating objects, and more specifically faces, is known in the field of computer vision. Each of these techniques has its own advantages, limitations, and processing requirements.
An area of computer vision and artificial intelligence that involves processing facial images is generally referred to as identification. To perform identification, a face of interest is compared to a database of known faces to determine a one-to-one match. The field of identification has a large body of research, and many techniques for performing identification are known. Typically, identification involves deriving or generating a large number of coefficients or features for each face and then comparing these coefficients or features across faces to ensure a reliable identification. Identification techniques require a relatively large amount of computational power, good resolution, and a database of known faces. One application of identification is facial recognition, which matches an unknown person against a database of known persons to determine who the unknown person might be.
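The comparison step of identification can be illustrated with a nearest-neighbor search over a database of feature vectors. This is a minimal sketch that assumes the coefficients for each face have already been extracted into fixed-length vectors; the Euclidean metric, the rejection threshold `max_distance`, and the names `identify` and `gallery` are illustrative assumptions, and practical systems use far more coefficients per face.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two equal-length coefficient (feature) vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe: Sequence[float],
             gallery: Dict[str, Sequence[float]],
             max_distance: float) -> Optional[str]:
    """Return the name of the closest known face, or None when no entry in
    the (hypothetical) gallery lies within max_distance of the probe."""
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, features in gallery.items():
        dist = euclidean(probe, features)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Toy database of known faces; real feature vectors are much longer.
gallery = {
    "alice": [0.10, 0.90, 0.30],
    "bob": [0.80, 0.20, 0.50],
}
print(identify([0.12, 0.88, 0.31], gallery, max_distance=0.2))  # alice
print(identify([0.50, 0.50, 0.90], gallery, max_distance=0.2))  # None
```

Returning `None` for a distant probe models the case where the unknown person is not in the database at all.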
One of the common techniques for tracking is referred to as feature matching. Feature matching looks for features of an object in a two-dimensional image and correlates these features in subsequent images to develop tracks corresponding to the motion of the object relative to the camera. The technique requires a relatively small amount of computational power. Applications of feature matching include tracking a moving object after it has been designated as an object of interest.
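To make the idea of correlating a feature across images concrete, here is a minimal sketch of one simple form of feature matching: a small image patch (the "feature") taken from one frame is located in the next frame by exhaustive sum-of-squared-differences search. The function names and the toy frames are hypothetical, and practical trackers usually rely on keypoint detectors and descriptors rather than raw-pixel search.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def extract_patch(img: Image, top: int, left: int, h: int, w: int) -> Image:
    """Cut an h x w window out of img, starting at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def ssd(a: Image, b: Image) -> float:
    """Sum of squared differences between two equally sized windows."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def match_feature(patch: Image, frame: Image) -> Tuple[int, int]:
    """Return the (top, left) offset in frame whose window best matches patch."""
    h, w = len(patch), len(patch[0])
    best, best_pos = float("inf"), (0, 0)
    for top in range(len(frame) - h + 1):
        for left in range(len(frame[0]) - w + 1):
            score = ssd(patch, extract_patch(frame, top, left, h, w))
            if score < best:
                best, best_pos = score, (top, left)
    return best_pos


def make_frame(blob_top: int, blob_left: int, size: int = 6) -> Image:
    """Toy frame: a bright 2 x 2 blob on a dark background."""
    frame = [[0.0] * size for _ in range(size)]
    for r in range(blob_top, blob_top + 2):
        for c in range(blob_left, blob_left + 2):
            frame[r][c] = 9.0
    return frame


frame1 = make_frame(1, 1)
frame2 = make_frame(3, 2)                    # the blob moved down 2, right 1
patch = extract_patch(frame1, 0, 0, 3, 3)    # feature taken from frame 1
print(match_feature(patch, frame2))          # (2, 1)
```

The displacement of the best match between frames, here (0, 0) to (2, 1), is one step of the track.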
The problem of tracking has been addressed using a variety of techniques. A summary of tracking techniques is provided by Richard J. Qian et al. in U.S. Pat. No. 6,404,900, "Method for robust human face tracking in presence of multiple persons." In this patent, Qian teaches a method for outputting the location and size of tracked faces in an image. The method includes taking a frame from a color video sequence, filtering the image, and estimating the locations and sizes of faces in the filtered image based on a projection histogram.
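To illustrate how a projection histogram can yield the location and size of a face, here is a simplified sketch. It assumes the filtered image is already a binary mask in which 1 marks a face-colored pixel and that only one face is present; the center along each axis is the weighted mean of the projection, and the extent is derived from its variance. This is not the patented procedure, which is designed to remain robust when several persons are present; the function names and the size rule are illustrative assumptions.

```python
import math
from typing import List, Tuple


def projection_stats(hist: List[int]) -> Tuple[float, float]:
    """Weighted mean and variance of a projection histogram
    (bin index = pixel coordinate, bin value = pixel count)."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty projection: no face-colored pixels")
    mean = sum(i * v for i, v in enumerate(hist)) / total
    var = sum(v * (i - mean) ** 2 for i, v in enumerate(hist)) / total
    return mean, var


def locate_face(mask: List[List[int]]) -> Tuple[float, float, float, float]:
    """Estimate (center_x, center_y, width, height) of a single face from the
    projection histograms of a binary mask (1 = face-colored pixel, assumed)."""
    col_hist = [sum(col) for col in zip(*mask)]  # projection onto the x axis
    row_hist = [sum(row) for row in mask]        # projection onto the y axis
    cx, var_x = projection_stats(col_hist)
    cy, var_y = projection_stats(row_hist)
    # Illustrative size rule: exact for a solid rectangle, because a run of
    # n pixels has variance (n^2 - 1) / 12.
    width = math.sqrt(12 * var_x + 1)
    height = math.sqrt(12 * var_y + 1)
    return cx, cy, width, height


# Toy filtered image: a 4 x 3 block of face-colored pixels in an 8 x 6 frame.
mask = [[1 if 2 <= x <= 5 and 1 <= y <= 3 else 0 for x in range(8)]
        for y in range(6)]
print(tuple(round(v, 3) for v in locate_face(mask)))  # (3.5, 2.0, 4.0, 3.0)
```

Real masks are not solid rectangles, so the size rule is only an approximation outside this toy case.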
In the paper "Parameterized structure from motion for 3D adaptive feedback tracking of faces," by T. S. Jebara et al., in Computer Vision and Pattern Recognition, 1997, a real-time system is described for automatically detecting, modeling, and tracking faces in three dimensions. A combination of two-dimensional and three-dimensional techniques is used with a Kalman filter to predict the trajectory of the facial features and to constrain the search space for the next frame in the video sequence.
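The use of a Kalman filter to predict where a facial feature will appear, and to bound the search in the next frame, can be sketched for a single image axis. The sketch assumes a constant-velocity motion model and hand-picked noise parameters (`q`, `r`, the initial variances, and the gating factor are hypothetical tuning values); the system described in the paper combines this kind of prediction with two- and three-dimensional modeling, which is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis, in pixels.
    State is (pos, vel); covariance is [[p00, p01], [p01, p11]]."""
    pos: float
    vel: float = 0.0
    p00: float = 100.0  # initial position variance (assumed)
    p01: float = 0.0
    p11: float = 100.0  # initial velocity variance (assumed)
    q: float = 1.0      # process-noise scale (hypothetical tuning value)
    r: float = 4.0      # measurement-noise variance (hypothetical)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one frame ahead; return the predicted position."""
        self.pos += dt * self.vel
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and
        # Q = q * [[dt^4/4, dt^3/2], [dt^3/2, dt^2]].
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        self.p00 = p00 + self.q * dt ** 4 / 4
        self.p01 = p01 + self.q * dt ** 3 / 2
        self.p11 = self.p11 + self.q * dt ** 2
        return self.pos

    def search_half_width(self, gate: float = 3.0) -> float:
        """Half-width of the search window: gate times the innovation std. dev."""
        return gate * math.sqrt(self.p00 + self.r)

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01


# Simulated feature moving 5 px per frame along x, starting at x = 100.
kf = AxisKalman(pos=100.0)
for t in range(1, 20):
    predicted = kf.predict()
    half = kf.search_half_width()
    # A tracker would search only [predicted - half, predicted + half] here.
    kf.update(100.0 + 5.0 * t)
print(f"next predicted x: {kf.predict():.1f}")
```

As the filter gains confidence, `search_half_width` shrinks, which is what limits the search space in later frames. A full tracker would run one such filter per axis, or a joint filter, for each feature.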
U.S. Pat. No. 7,317,812 to Nils Krahnstoever et al., "Method and Apparatus for Robustly Tracking Objects" (Nils '812), provides background on many techniques for implementing video tracking systems. Nils '812 teaches a video-image-based tracking system that allows a computer to robustly locate and track an object in three dimensions within the viewing area of two or more cameras. The preferred embodiment of this invention tracks a person's appendages in three dimensions, allowing touch-free control of interactive devices.
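Locating an object in three dimensions from two or more cameras rests on triangulation. The sketch below shows only the simplest case, assuming two rectified, parallel cameras with a known focal length (in pixels), baseline, and principal point; the system described in Nils '812 is not reproduced here, and the function name and calibration numbers are hypothetical.

```python
from typing import Tuple


def triangulate_rectified(xl: float, yl: float, xr: float,
                          focal_px: float, baseline_m: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in the left camera's frame, in metres, from a point
    seen at (xl, yl) in the left image and at column xr in the right image.
    Assumes rectified, parallel cameras with the right camera displaced by
    baseline_m along +X, so matching points share the same image row."""
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("a point in front of both cameras needs positive disparity")
    z = focal_px * baseline_m / disparity
    x = (xl - cx) * z / focal_px
    y = (yl - cy) * z / focal_px
    return x, y, z


# Hypothetical calibration: 800 px focal length, 10 cm baseline, 640 x 480 images.
point = triangulate_rectified(400.0, 200.0, 360.0,
                              focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0)
print(tuple(round(v, 3) for v in point))  # (0.2, -0.1, 2.0)
```

Depth is inversely proportional to disparity, so depth estimates for distant points become increasingly sensitive to small matching errors.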
U.S. Pat. No. 665,816 to Barrett L. Brumitt, "System and process for locating and tracking a person or object in a scene using a series of range images," teaches a system and method for tracking people and non-stationary objects of interest in a scene using a series of range images of the scene taken over time. Range information is used to compute a background model that is subtracted from subsequent images to produce a foreground image. The foreground image is then segmented into regions of interest.
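The background-subtraction and segmentation steps described for range images can be sketched as follows. This is a minimal illustration under stated assumptions, not Brumitt's method: the background model is taken to be the per-pixel median depth over a few frames of the empty scene, a pixel counts as foreground when it is more than a hypothetical 0.3 m closer than the background, and regions of interest are 4-connected groups of foreground pixels reported as bounding boxes. All function names and thresholds are illustrative.

```python
from collections import deque
from statistics import median
from typing import List, Tuple

RangeImage = List[List[float]]  # depth in metres for each pixel


def background_model(frames: List[RangeImage]) -> RangeImage:
    """Per-pixel median depth over frames of the empty scene (assumed model)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(w)] for r in range(h)]


def foreground_mask(frame: RangeImage, background: RangeImage,
                    min_diff: float = 0.3) -> List[List[bool]]:
    """True where the scene is closer than the background by more than min_diff."""
    return [[background[r][c] - frame[r][c] > min_diff
             for c in range(len(frame[0]))] for r in range(len(frame))]


def segment(mask: List[List[bool]],
            min_pixels: int = 2) -> List[Tuple[int, int, int, int]]:
    """4-connected foreground regions as (top, left, bottom, right) boxes;
    regions smaller than min_pixels are discarded as noise."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) >= min_pixels:
                rows = [p[0] for p in pixels]
                cols = [p[1] for p in pixels]
                boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return boxes


# Empty room 5 m deep; then two people appear, plus one noisy pixel at (0, 7).
empty = [[[5.0] * 8 for _ in range(6)] for _ in range(3)]
bg = background_model(empty)
scene = [[5.0] * 8 for _ in range(6)]
for r in range(1, 4):
    for c in range(1, 3):
        scene[r][c] = 2.0
for r in range(2, 5):
    for c in range(5, 7):
        scene[r][c] = 3.0
scene[0][7] = 4.0
print(segment(foreground_mask(scene, bg)))  # [(1, 1, 3, 2), (2, 5, 4, 6)]
```

Working in depth rather than color means the foreground test does not depend on lighting or clothing, which is one motivation for using range images.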
The methods of tracking people in the prior art fall into several categories. One category comprises methods that extract the image of a person from one image in a series of images, predict the location of the person in a subsequent image, and then process the subsequent image using this prediction to improve the likelihood of tracking the person successfully. This approach requires significant processing for each person in each image.
Another conventional method involves using multiple cameras to provide images from different viewing angles. This method involves processing at least two images for each moment in time.
Tracking people by their faces highlights the need for an efficient tracking method. Conventional feature matching is insufficient because all faces share the same general features, so the matched features do not carry enough information to distinguish one person from another in a given image. Identification techniques, on the other hand, would require a relatively large amount of computational power, good resolution, and a database of known faces, none of which is necessary merely to distinguish one person from another in a given image.
Prior methods require large quantities of data, complex processing, and/or multiple system resources to track objects. The current invention provides a method and system for improving the tracking performance for multiple people using an efficient coefficient template, combined with other techniques for more reliable tracking of people of interest in a variety of environments.