The present invention relates generally to methods and apparatuses for processing images, and more particularly to a method and apparatus for processing an image that includes detecting the location of eyes in a video facial image.
In desktop video conferencing systems, the camera is, for obvious reasons, usually located somewhere other than the center of the screen on which the image of the other conferee is being presented. Preferably, the camera is located even outside the peripheral vision of the user to keep it from being obtrusive. As a natural consequence, even when the viewer is looking directly at the screen, the viewer appears to the other conferee to be gazing inattentively off into space, which can be very distracting to the other conferee. Obviously, the viewer could look into the camera the entire time, but this would mean that the viewer would miss much of the information being presented on the screen.
As a result of the camera and screen being located in different positions, the eye movement in video conferencing systems does not match that of in-person meetings. Eye contact, however, is extremely important in interpersonal communications. Consequently, before video conferencing systems can replace face-to-face meetings, they must create the look and feel of face-to-face meetings.
Attempts have been made to bring the look and feel of video conferencing systems closer to that of face-to-face meetings. In this area, approaches proposed to solve the eye-contact (also known as gaze tracking) problem have employed devices such as electronic shutters and half-reflected mirrors to make the camera physically or optically point at the user. While somewhat effective, these approaches are expensive and inconvenient. Expense is particularly an issue for systems intended to be deployed on individual personal computers or workstations, due to the sheer numbers involved. Inconvenience is also an issue in that people will not use systems that are awkwardly designed and implemented, which defeats the entire purpose of video conferencing systems.
To attempt to solve the gaze tracking problem, one can modify the image portion of the eyes so that the eyes are centered on the camera location rather than the screen. This requires processing of the pixels in the eyes to reorient them so they appear to be looking at the other person. Unfortunately, to perform this image processing, one must first detect the location of the eyes in the image, as only the eyes are processed in this manner.
Some approaches have employed headgear or sensors to detect the position of the eyes, which requires the user to remain very still. Both of these approaches are highly intrusive to the user. For the reasons discussed immediately above, most users will not wear headgear.
Another approach compares a library of models against the image until a match is found. This requires a database of models and a large amount of processing. As video conferencing is a live transmission, any large amount of processing is an impairment to implementation.
Yet another approach applies neural networks to determine the location of the eyes. In this case, neural networks are trained using reduced resolution images to find eyes. As with all neural networks, this requires training of the network. Training a neural network is a non-trivial problem, and can often delay or prevent implementation of a network in practical applications.
The present invention is therefore directed to the problem of developing a method and apparatus for detecting the location of eyes in an image that is simple and can be implemented in a video conferencing system.
The present invention solves this problem by first blurring the image before extracting the eye regions, eliminating the eyebrows in the eye regions, segmenting the eyes, and then extracting the eye parameters.
According to the method of the present invention, the image is first blurred using a Gaussian filter, such as:

$$g(x, y) = \frac{1}{\sum h(x, y)} \sum_{x} \sum_{y} f(x, y)\, h(x, y).$$
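The blurring step can be illustrated with a minimal sketch. It assumes that f is the grayscale input image, that h is a Gaussian weighting window centered on each output pixel, and that g is the blurred result; the text does not define these symbols explicitly. The function names, the window radius, the value of sigma, and the border handling are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) grid of unnormalized Gaussian weights h."""
    return [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def blur(image, radius=2, sigma=1.0):
    """Blur a grayscale image (a list of rows) with a normalized Gaussian window.

    Assumption: window positions that fall outside the image are skipped, and
    the normalizing sum uses only the weights that were actually applied.
    """
    h = gaussian_kernel(radius, sigma)
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            weighted = 0.0  # running sum of f * h over the window
            total = 0.0     # running sum of h over the window
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        w = h[dy + radius][dx + radius]
                        weighted += image[yy][xx] * w
                        total += w
            out[y][x] = weighted / total
    return out
```

Dividing by the sum of the weights keeps the overall brightness unchanged, so a uniform region of the image has the same intensity after blurring.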
Next, the eyes are located within the image. Within this step, the search is first limited to the center of the image, as the eyes are usually located near the center. Then, the contrast between the dark and light areas is used to locate and identify the eye regions. The next step returns to the original image, within which one can identify the eyes and eyebrows relatively easily. In this step, the eyebrows are removed by relying upon the fact that they are usually above the eyes. What remains are the eyes. The next step is to segment the eyes into their constituent parts: the iris, the rounded corners and the whites of the eyes. This is accomplished using the intensity according to the following formula:

$$s(x, y) = \begin{cases} \text{eye white} & \text{if } g(x, y) > T \\ \text{iris, corner} & \text{otherwise} \end{cases}$$
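The segmentation rule can be sketched as a per-pixel threshold test. The sketch below assumes that g is a grayscale eye region stored as a list of rows and that T is a single global threshold; the function name, the label constants, and the sample values are hypothetical.

```python
EYE_WHITE = "eye white"
IRIS_OR_CORNER = "iris, corner"


def segment(g, threshold):
    """Classify each pixel: brighter than the threshold is eye white, otherwise iris or corner."""
    return [
        [EYE_WHITE if value > threshold else IRIS_OR_CORNER for value in row]
        for row in g
    ]


# Illustrative 2x2 region with an assumed threshold of 120.
region = [[200, 50],
          [120, 121]]
labels = segment(region, 120)
```

A pixel whose intensity equals the threshold falls into the "otherwise" branch, as in the formula, so it is classed as iris or corner.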
In this case, the threshold is set high enough to segment all iris colors, but low enough to separate the entire white area. Next, the dark regions are identified, with the eye corners and irises labeled at intensity 255 and the whites at intensity 0. Finally, the eye parameters are extracted, which include the iris radius, the iris center position, and the four eyelid positions (both corners and the upper and lower lids).
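The labeling step, and one naive way of reading eyelid positions off the labeled region, can be sketched as follows. The text does not say how the parameters are computed, so taking the extreme labeled pixels as the four eyelid positions is purely an illustrative assumption and not the method of the invention. The function names and the sample region are hypothetical, and no iris center or radius is estimated.

```python
def label_mask(g, threshold):
    """Label dark pixels (iris and corners) as 255 and eye whites as 0."""
    return [[0 if value > threshold else 255 for value in row] for row in g]


def extreme_points(mask):
    """Return the leftmost, rightmost, topmost and bottommost labeled pixels as (x, y).

    Assumption: these extremes stand in for the two eye corners and the upper
    and lower lid positions. Returns None if nothing is labeled.
    """
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row)
              if value == 255]
    if not points:
        return None
    return {
        "left_corner": min(points, key=lambda p: p[0]),
        "right_corner": max(points, key=lambda p: p[0]),
        "upper_lid": min(points, key=lambda p: p[1]),
        "lower_lid": max(points, key=lambda p: p[1]),
    }


# Illustrative 4x5 eye region with an assumed threshold of 120.
region = [
    [200, 200, 200, 200, 200],
    [200, 200,  50, 200, 200],
    [ 90,  50,  50,  50,  90],
    [200, 200,  50, 200, 200],
]
params = extreme_points(label_mask(region, 120))
```

Estimating the iris center and radius would need an additional step, such as fitting a circle to the labeled pixels, which is left out here.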
An apparatus for implementing the method of the present invention includes a digital camera for capturing the image and a processor. The processor first blurs the image to determine the location of the eyes, then extracts the eye regions and eliminates the eyebrows in the eye regions, segments the eyes, and then extracts the eye parameters. These eye parameters are then available for use by other programs or processors.