1. Field of the Invention
The present invention relates to sophisticated interfaces between humans and machines. More particularly, the invention concerns a method and apparatus for analyzing a scene containing multiple subjects to determine which pupils correspond to which subjects.
2. Description of the Related Art
As more powerful human-machine interfaces are being developed, many such interfaces include the capability to perform user detection. By detecting the presence of a human user, a machine can manage its own functions more efficiently, and more reliably respond to human input. For example, a computer may employ user detection to selectively activate a screen saver when no users are present, or to display advertising banners only when a user is present. As another application, in home-based television viewing monitors for assessing "Nielsen" ratings, it may be useful to determine how many people are watching a television. User detection techniques such as face detection may also be used as a valuable precursor to eye gaze detection. In addition, face detection will likely be an important component of future human-machine interfaces that consider head and facial gestures to supplement mouse, voice, keyboard, and other user input. Such head and facial gestures may include nodding, leaning forward, head shaking, and the like. Thus, user detection is an important tool that enables a more natural human-machine interface.
Some user detection techniques are already known. For instance, a number of techniques focus on face detection using a combination of attributes such as color, shape, motion, and depth. Some of these approaches, for example, include template matching as described in U.S. Pat. No. 5,550,928 to Lu et al., and skin color analysis as described in U.S. Pat. No. 5,430,809 to Tomitaka. Another approach is the "Interval" system. The Interval system obtains range information using a sophisticated stereo camera system, gathers color information to evaluate as flesh tones, and analyzes face candidate inputs with a neural network trained to find faces. One drawback of the Interval system is its substantial computational expense. An example of the Interval system is described in Darrell et al., "Tracking People With Integrated Stereo, Color, and Face Detection," Perceptual User Interface Workshop, 1997. Although the Interval system may be satisfactory for some applications, certain users with less powerful or highly utilized computers may be frustrated with the Interval system's computational requirements. The following references discuss some other user detection schemes: (1) T. Darrell et al., "Integrated Person Tracking Using Stereo, Color, and Pattern Detection," 1998, and (2) T. Darrell et al., "Active Face Tracking and Pose Estimation in an Interactive Room," 1996.
As a different approach, some techniques perform user detection based on pupil detection. Pupil characteristics may be further analyzed to track eye position and movement, as described in U.S. Pat. No. 5,016,282 to Ptomain et al. Although the '282 patent and other pupil detection schemes may be satisfactory for some applications, such approaches are unable to process multiple faces and multiple pupils in an input image. Some difficulties include determining which pupils belong to the same face, and accounting for a partially off-screen person with only one pupil showing.
Thus, when multiple people and multiple pupils are present in an image, there may be considerable difficulty in associating pupils with people in order to detect how many people are present. In this respect, known approaches are not completely adequate for some applications due to certain unsolved problems.
Broadly, the present invention concerns a method and apparatus for analyzing a scene containing multiple subjects to determine which pupils correspond to which subjects. First, a machine-readable representation of the scene, such as a camera image, is generated. Although more detail may be provided, this representation minimally depicts certain visually perceptible characteristics (such as relative locations, shape, size, etc.) of multiple pupil candidates corresponding to multiple subjects in the scene. A computer analyzes various characteristics of the pupil candidates, such as: (1) visually perceivable characteristics of the pupil candidates at one given time ("spatial cues"), and (2) changes in visually perceivable characteristics of the pupil candidates over a sampling period ("temporal cues"). The spatial and temporal cues may be used to identify associated pupil pairs, i.e., two pupils belonging to the same subject/face. Some exemplary spatial cues include interocular distance between potentially paired pupils, horizontal alignment of pupils, same shape/size of pupils, etc. In addition to features of the pupils themselves, spatial cues may also include nearby facial features such as presence of a nose/mouth/eyebrows in predetermined relationship to potentially paired pupils, similarly colored irises surrounding the pupils, nearby skin of similar color, etc. Some exemplary temporal cues include motion or blinking of paired pupils together. With the foregoing analysis, each pupil candidate can be associated with a subject in the scene.
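By way of illustration only, the grouping of spatial and temporal cues described above might be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the names `PupilCandidate`, `pair_score`, and `group_pupils`, the expected interocular distance of 60 pixels, the equal weighting of cues, the 0.6 acceptance threshold, and the greedy best-first pairing strategy are all hypothetical choices made for the example. Three spatial cues (interocular distance, horizontal alignment, similar size) and one temporal cue (blinking together) are combined into a single score; any candidate left unpaired is treated as a subject with only one visible pupil.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class PupilCandidate:
    """Hypothetical pupil candidate extracted from a sequence of images."""
    x: float                               # horizontal position, in pixels
    y: float                               # vertical position, in pixels
    radius: float                          # apparent pupil radius, in pixels
    blink_frames: frozenset = frozenset()  # frame indices where the pupil vanished


def pair_score(a, b, expected_iod=60.0):
    """Return a score in [0, 1]; 0 means the two candidates cannot be a pair.

    expected_iod is an assumed interocular distance in pixels.
    """
    dy = abs(a.y - b.y)
    dist = hypot(a.x - b.x, dy)

    # Spatial cue: interocular distance close to the expected value.
    iod = max(0.0, 1.0 - abs(dist - expected_iod) / expected_iod)
    # Spatial cue: paired pupils lie on a roughly horizontal line.
    align = max(0.0, 1.0 - dy / (0.25 * expected_iod))
    if iod == 0.0 or align == 0.0:
        return 0.0

    # Spatial cue: paired pupils have similar size.
    largest = max(a.radius, b.radius)
    size = min(a.radius, b.radius) / largest if largest > 0 else 0.0

    # Temporal cue: paired pupils blink together (Jaccard overlap of blink frames).
    union = a.blink_frames | b.blink_frames
    blink = len(a.blink_frames & b.blink_frames) / len(union) if union else 0.5

    # Equal weighting of the four cues is an assumption of this sketch.
    return (iod + align + size + blink) / 4.0


def group_pupils(candidates, threshold=0.6):
    """Greedily pair candidates by descending score.

    Returns a list of index tuples: (i, j) for a paired subject, (i,) for a
    subject with a single visible pupil.
    """
    scored = sorted(
        ((pair_score(a, b), i, j)
         for (i, a), (j, b) in combinations(enumerate(candidates), 2)),
        reverse=True,
    )
    used, subjects = set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs score even lower
        if i in used or j in used:
            continue
        used.update((i, j))
        subjects.append((i, j))
    # Unpaired candidates, e.g. a partially off-screen subject.
    subjects.extend((i,) for i in range(len(candidates)) if i not in used)
    return subjects
```

Greedy best-first pairing is used here only because it is short; a real system could substitute any assignment method, and could add further cues such as nearby facial features or common motion to `pair_score` without changing `group_pupils`.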
In one embodiment, the invention may be implemented to provide a method for analyzing a scene containing multiple subjects to determine which pupils correspond to which subjects. In another embodiment, the invention may be implemented to provide a computer-driven apparatus programmed to analyze a scene containing multiple subjects to determine which pupils correspond to which subjects. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform operations for analyzing a scene containing multiple subjects to determine which pupils correspond to which subjects. Still another embodiment involves a logic circuit configured to analyze a scene containing multiple subjects to determine which pupils correspond to which subjects.
The invention affords its users a number of distinct advantages. First, unlike prior techniques, the invention is capable of determining which pupils belong to which faces/subjects in a scene with multiple subjects. In a scene with multiple subjects, understanding the pupil-subject relationship is an important prerequisite for tracking facial expressions, tracking movement, tracking user presence/absence, etc. As another advantage, the invention is inexpensive to implement when compared to other detection and tracking systems. For example, no dense range sensing is required. Also, an inexpensive camera may be used when a suitable lighting scheme is employed to cancel noise. The analysis provided by the invention is particularly robust because it is based on the grouping of multiple cues, both spatial and temporal. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.