The present invention generally relates to an image detection and identification system, and more specifically to an apparatus and method for personnel detection, background separation and identification. Based upon the detection and/or identification of a person, applications can manipulate information in a customized manner that is relevant to the detected or identified person.
The creation of computing environments which passively react to their observers, particularly displays and user interfaces, has become an exciting challenge for computer vision. Systems of this type can be employed in a variety of different applications. In an interactive game or kiosk, for example, the system is typically required to detect and track a single person. Other types of applications, such as general surveillance and monitoring, require the system to be capable of separately recognizing and tracking multiple people at once. To date, research in such systems has largely focused on exploiting a single visual processing technique to locate and track features of a user in front of an image sensor. These systems have often not been robust to real-world conditions, and fail in complicated, unpredictable visual environments and/or where no prior information about the user population is available.
For example, U.S. Pat. No. 5,642,431 discloses a face detection system that uses an image classifier and an output display device. A training process is employed which uses both face and non-face objects stored in a database to determine whether a face is detected. This system, however, is unable to continuously track the user's face and adjust for real-time movements of the physical objects being detected. U.S. Pat. No. 5,532,741 discloses a camera and video system which are integrally combined. A mirror image of a user is displayed back to the user on a CRT. However, this system is merely a passive video playback system which is superimposed on a video screen. There is no visual interactive system which processes displayed images or presents specific information on the basis of detected features of a person who is looking at the system.
In addition to detecting and tracking a person in a scene, various types of image processing, or manipulation, can also be employed in the context of the present invention. One possible type of manipulation that can be employed in this regard is the distortion of the image of the person, in particular the person's face, for amusement purposes. This effect has been explored before on static imagery (such as personal computer imaging tools), but has not previously been applied to live video. For instance, U.S. Pat. No. 4,276,570 discloses a method and associated apparatus for producing an image of a person's face at different ages. Images of old and young faces are mapped to one another, and image transformations are determined. Once these results are stored, a camera receives an image of a user's face (possibly a photograph). The data of the person's face is processed with the previously determined image transformations. Based upon the stored data, an "older face" is then digitally superimposed on areas of the younger face to produce an aged face of the user. This system is unable to perform processing in a real-time fashion, for instance on active video signals. Furthermore, this system does not involve any recognition of the person whose image is being shown, or automated face detection.
Thus, a robust system is still needed to perform accurate image processing, personnel recognition and manipulations in a real-time fashion.
A further complicating factor lies in the time frame over which a person is recognized and tracked. At one extreme, short-term tracking of a person is desirable, e.g. the ability to recognize the person from frame to frame as he or she moves within the scene being viewed. At the other extreme, long term tracking, i.e. the ability to recognize the same person over a hiatus of several days, is desirable in certain applications, particularly where interactivity is dependent upon characteristics of individuals. To be complete, the system should also be capable of mid-term tracking, to recognize when a given individual has momentarily left a scene being viewed and then returned.
It is further desirable, therefore, to provide a tracking and identification system which is capable of providing robust performance over each of these possible tracking periods.
The present invention provides a multi-modal visual person detection and tracking framework which also has the capability to identify persons over various periods of time. Through the use of depth, color and pattern tracking, images of one or more people in a scene can be tracked in real time in a variety of general conditions, with good results. A first module receives stereo image data from cameras and generates a disparity image, preferably through the use of the census algorithm, and locates one or more target regions in the disparity image by a connected components grouping analysis. A second module classifies and tracks each target region through color segmentation. A third module distinguishes and tracks individual facial features located within the target regions, based on grayscale patterns. Each module is able to be utilized individually or in combination with one or more of the other individual modules to locate and track the targets.
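By way of illustration only, the depth stage of the first module might be sketched as follows. The sketch shows a 3x3 census code per pixel, the Hamming distance commonly used as the matching cost between census codes of the left and right images, and a grouping of the resulting disparity image into target regions. The function names, the 3x3 window, the border clamping, the 4-connectivity, and the `min_disp` and `min_area` thresholds are all assumptions made for this example and are not taken from the description above.

```python
from collections import deque

def census_code(img, y, x):
    """8-bit census code: one bit per 3x3 neighbour, set when the
    neighbour is darker than the centre pixel (borders are clamped)."""
    h, w = len(img), len(img[0])
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | int(img[ny][nx] < img[y][x])
    return bits

def hamming(a, b):
    """Matching cost between two census codes: number of differing bits."""
    return bin(a ^ b).count("1")

def connected_regions(disp, min_disp=1, min_area=3):
    """Group pixels whose disparity is at least min_disp into 4-connected
    components; return (top, left, bottom, right) boxes of the components
    that contain at least min_area pixels."""
    h, w = len(disp), len(disp[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or disp[y0][x0] < min_disp:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and disp[ny][nx] >= min_disp):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```

Because the census code depends only on the ordering of intensities within the window, it is unchanged when a constant brightness offset is added to the image, which is one reason such codes are attractive for matching between two cameras. A full disparity search would, for each pixel, choose the horizontal shift that minimises the Hamming cost, usually summed over a window; that step is omitted here for brevity.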
In a particular embodiment of the present invention, each module also computes a mode specific description of a user. The mode specific information is combined in a fourth module which estimates the identity of a person whose image has been detected, based upon a database of previously recognized targets. Once the identity of a person is estimated, real-time applications specific to the identified target can be implemented. This feature is also used to increase the robustness of the short-term tracking of an individual.
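One possible way of combining mode specific descriptions into an identity estimate is sketched below, for illustration only. It assumes that each mode (here labelled `depth`, `color` and `face`) yields a short numeric feature vector, that the database stores one such vector per mode for each previously recognized person, and that a weighted average of per-mode Euclidean distances is an acceptable similarity score. The function names, the weights, and the `max_score` rejection threshold are hypothetical and are not taken from the description above.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def estimate_identity(observation, database, weights, max_score=0.25):
    """Return (name, score) for the closest stored person, or (None, score)
    when even the best match is worse than max_score.

    observation: {mode: vector} for the modes available in this frame
    database:    {name: {mode: vector}} of previously recognized targets
    weights:     {mode: float}, relative trust in each mode (assumed values)
    """
    best_name, best_score = None, None
    for name, stored in database.items():
        # Compare only the modes present in both the observation and the record,
        # so that a missing mode (e.g. no visible face) does not block a match.
        modes = [m for m in observation if m in stored]
        if not modes:
            continue
        total_weight = sum(weights[m] for m in modes)
        score = sum(weights[m] * distance(observation[m], stored[m])
                    for m in modes) / total_weight
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    if best_score is None or best_score > max_score:
        return None, best_score
    return best_name, best_score
```

A caller might treat a `None` result as a new, previously unseen person and add the observation to the database, while a named result could be used both to select a user specific application and to re-associate a track that was briefly lost.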
Another exemplary embodiment of the present invention provides an intelligent monitoring system which discriminates between faces and the background scene, and then tracks the faces in real-time. In addition to the determination of actual facial characteristics, the individual face is able to be identified. The identification of the face allows for execution of an application (i.e., a computer program) according to the identification of an individual from among a set of recent users.
Another exemplary embodiment of the present invention provides a real time virtual mirror comprising a detector which detects, tracks, and identifies faces in real time. The processor then creates a virtual mirror image for display in which the facial features are selectively distorted.
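As a rough illustration of what a virtual mirror with selective distortion could involve, the sketch below flips a frame left to right and then magnifies the pixels inside a single face bounding box about its centre using nearest-neighbour sampling. The frame is represented as a list of rows of pixel values, and the `box` argument is assumed to come from a face tracker. Both function names and the choice of a simple magnification as the distortion are assumptions made for this example; the description above does not specify a particular warp.

```python
def mirror(img):
    """Flip each row left to right, as a mirror would."""
    return [row[::-1] for row in img]

def magnify_region(img, box, factor=2.0):
    """Magnify the content of box = (top, left, bottom, right) about the box
    centre. Pixels outside the box are copied unchanged. factor must be >= 1
    so that every source coordinate stays inside the box."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    top, left, bottom, right = box
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    out = [row[:] for row in img]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            # Sample from a point closer to the centre (nearest neighbour).
            sy = int(round(cy + (y - cy) / factor))
            sx = int(round(cx + (x - cx) / factor))
            out[y][x] = img[sy][sx]
    return out
```

In a live system the same two steps would be applied to every incoming frame, with the box updated from the tracker, so that the distortion follows the face as it moves.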