1. Field of the Invention
The present invention relates to an image pickup apparatus and to an eye tracking method employing the same. More specifically, it relates to an image pickup apparatus that uses an illuminating apparatus to facilitate the extraction of features of an object to be imaged, or picked up, so that the movement of a person and of the eyes can be used effectively in man-machine interfaces and in intelligent visual communication, and to an eye tracking method that detects eye fixation in a non-contact manner by using feature points, such as those of the face, the pupils and the images reflected from the cornea, extracted from the images picked up by the image pickup apparatus.
2. Description of the Related Art
Recently, computers have been developed to realize complicated and varied functions, and their field of application has become wider and wider. Not only experts but also ordinary people have come to use computers. To facilitate the use of systems that are becoming more and more complicated, man-machine interface technology is becoming increasingly important. Human beings communicate not only through language but also through facial expressions and gestures, inferring and confirming the intentions of the other party. In particular, eye movement plays an important role in enabling good communication.
The movement of the eyes strongly reflects one's intention. If an interface can continuously detect the movement of the gazing point of a user facing a display screen, this helps in knowing the user's intention: knowing where the user is looking leads to knowing what the user has in mind or is wondering about.
In view of the foregoing, the inventors of the present application have pointed out the importance of an effective eye tracking method, both for realizing an interface function that can infer the user's operational intention and respond flexibly, and, in the field of intelligent communication, for realizing highly responsive, receiver-oriented visual communication by extracting an object of interest in accordance with the receiver's eye movement and feeding it back to the transmitter.
The eye camera is a well-known eye tracking apparatus of this kind. However, the eye camera is not well suited to using eye movement in a man-machine interface or in visual communication, since the user must wear glasses and the user's head must be fixed in order to track eye fixation in the coordinate system of the display the user faces. Detection by image processing is more convenient, since it detects eye movement in a non-contact manner without requiring the user to wear any special device.
FIG. 1 shows an example of the structure of a non-contact eye tracking apparatus. Referring to FIG. 1, cameras 2 and 3 and illuminating apparatuses 4 and 5 are provided on both sides of a display 1. In such a non-contact eye tracking apparatus, the first problem is to pick up, with the cameras 2 and 3, images of a user illuminated by the illuminating apparatuses 4 and 5, and to extract from the picked-up images the plurality of feature points necessary for tracking eye movement. The second problem is to measure the spatial positions of the feature points at high speed and with high precision. The third problem is to find the direction of eye fixation and the gazing point on the display based on the positions of the feature points.
Since a person, and especially the eyes, move fast, clear images must be picked up and the feature points must be extracted by simple image processing in order to follow and detect the movement exactly. In actual use in a room, however, the illumination of the user changes with the light from the display and with external illumination such as fluorescent lamps, so images of good and uniform quality cannot always be obtained. If the input images are of poor quality, noise reduction takes much time, and fast operation cannot be expected.
The person to be picked up may be illuminated in order to solve the above problem. However, such illumination has the following drawbacks. First, it is difficult to provide a natural environment for the interface. More specifically, the well-known illuminating light sources, such as incandescent lamps, xenon lamps and halogen lamps, emit over a wide range of wavelengths, with the distribution centered on the visible range. Therefore, illuminating the user from the front with such lamps is not desirable for a natural interface.
Secondly, the apparatus becomes large and generates much heat. Namely, in order to adapt the illuminating conditions of conventional illuminating apparatuses to interface applications, optical parts such as a band-pass filter and a polarizer must be attached in front of the light source. For example, when near-infrared illumination, which cannot be sensed by a human being, is used and the reflected light is to be caught, visible light must be blocked. However, the conventional illuminating apparatuses mentioned above emit light with low efficiency and generate much heat, which raises the temperature around the apparatus. Consequently, the apparatus cannot be made compact by, for example, integrating the light source and the optical elements, and a large illuminating apparatus must inevitably be used.
Thirdly, although illumination is effective in extracting feature points and provides clear images when used properly, improperly used illumination becomes a noise source and has the opposite effect. This will be described in detail below, taking the extraction of feature points of a person or of the eyes as an example.
FIG. 2 shows an example of an experiment in which blue marks applied to the face of a person are extracted using a conventional illuminating apparatus and image pickup apparatus. FIGS. 3A and 3B show an example of the extraction of facial feature points obtained by the experiment shown in FIG. 2.
Blue marks 6 are applied to four portions of the face, as shown in FIG. 3A. As shown in FIG. 2, reference light 9 is emitted from the illuminating apparatus 8 toward an object 10, the light reflected from the object is caught by a camera 10, and the blue component is extracted from the picked-up image and thresholded. The result is shown in FIG. 3B. As is apparent from FIG. 3B, noise components 7 are extracted as well as the blue marks 6. The reason is as follows. When the object 10 is illuminated by the reference light 9, the reflected light can broadly be divided into two components. One is the light 11 diffused and reflected at the surface of the object 10, which reflects the optical nature of the reflecting material. This component is therefore effective in extracting those feature points necessary for eye tracking, such as parts of the face (mouth, eyelashes, nose and so on) and the pupil, with the exception of the images reflected from the cornea. The other is the component 12 regularly (specularly) reflected from the surface of the object 10, which reflects the optical nature of the light source. Since the component 12 does not reflect the nature of the object 10, it tends to become noise. This latter component is strong at smooth portions of the object 10. When the object is a person, such smooth portions include sweat on the face, eyeglass frames and lenses, and plastic and glass objects around the person. In the example shown in FIG. 3B, the noise 7 corresponds to sweat.
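The thresholding step, and why the regularly reflected component passes it, can be illustrated with a minimal Python sketch. The pixel values, the threshold of 150 and the helper name `blue_mask` are hypothetical; the point is only that a specular highlight carries the color of the light source (here white), so its blue component is as high as that of a blue mark and it survives the threshold, like the noise 7 in FIG. 3B.

```python
# Minimal sketch: threshold the blue component of an RGB image.
# Pixel values and the threshold are hypothetical, for illustration only.

SKIN = (200, 150, 120)       # diffuse reflection from skin: little blue
MARK = (40, 60, 210)         # diffuse reflection from a blue mark
HIGHLIGHT = (250, 250, 250)  # regular (specular) reflection of a white source


def blue_mask(image, threshold=150):
    """Return a binary mask: 1 where the blue component exceeds threshold."""
    return [[1 if pixel[2] > threshold else 0 for pixel in row] for row in image]


image = [
    [SKIN, MARK, SKIN],
    [SKIN, SKIN, HIGHLIGHT],  # e.g. a patch of sweat
    [SKIN, SKIN, SKIN],
]

for row in blue_mask(image):
    print(row)
# prints [0, 1, 0], then [0, 0, 1], then [0, 0, 0]:
# the highlight is marked as well as the blue mark.
```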
In the example shown in FIGS. 3A and 3B, blue marks applied to the face are extracted. The same applies when marks of other colors are used and the corresponding color components are extracted. When portions such as the eyes, nose, mouth and eyelashes are extracted from a natural image without using marks, the regularly reflected component 12 becomes noise in most cases. Another method for detecting the shape of a face without using marks is so-called active stereo vision, of which moire topography and the slit-ray method are representative. In this method, the object is illuminated with a controlled pattern of prescribed shape, the reflected pattern is extracted from the reflected images by thresholding, and the three-dimensional shape of the object is measured by using the features of the extracted reflected pattern. In this method as well, if the object tends to reflect light regularly, the images formed by regular reflection become noise, making it difficult to properly extract the reflected patterns, which are the features of the object.
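The slit-ray method recovers depth by triangulation between the projector of the light stripe and the camera. The sketch below assumes an idealized planar geometry (camera at the origin, projector at a distance `baseline` along the baseline, both angles measured from the baseline to the lit point); the function name and parameters are illustrative and not taken from any particular system. A specular highlight mistaken for the stripe gives a wrong camera angle and therefore a wrong depth, which is how regular reflection corrupts the measurement.

```python
import math


def slit_ray_depth(baseline, camera_angle_deg, projector_angle_deg):
    """Distance of a lit point from the camera-projector baseline.

    Camera at (0, 0), projector at (baseline, 0); both angles are measured
    from the baseline to the ray through the lit point (idealized 2-D model).
    """
    ta = math.tan(math.radians(camera_angle_deg))
    tb = math.tan(math.radians(projector_angle_deg))
    # From x = z / ta and baseline - x = z / tb, eliminate x and solve for z.
    return baseline * ta * tb / (ta + tb)


print(round(slit_ray_depth(4.0, 60.0, 30.0), 4))  # 1.7321, i.e. sqrt(3)
```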
Problems associated with the arrangement of conventional illuminating apparatuses and with the efficiency of extracting the pupil as a feature point will now be described. In order to detect the eye fixation of a user wearing no special device, a plurality of feature points must be extracted by image processing, as will be described later. The pupil is the opening in the iris and generally looks dark. Since the iris is also dark (typically dark brown), the pupil must be distinguished from the iris in order to be extracted. The pupil is a good and widely used feature point, since its size is convenient, it is not much influenced by the movement of the eyelid, and it is convenient for conversion to eye fixation. Pupil extraction has been carried out in eye cameras and similar devices, and a number of pupil extraction methods are known. For example, such methods are disclosed in U.S. Pat. No. 4,102,564, U.S. Pat. No. 4,145,122, U.S. Pat. No. 3,689,135, U.S. Pat. No. 4,075,657, U.S. Pat. No. 4,755,045, U.S. Pat. No. 4,303,394, U.S. Pat. No. 4,651,145 and U.S. Pat. No. 4,702,575. In one type of eye camera, a light source incorporated in glasses illuminates the eyeballs, and the reflected light is picked up to measure the intensity of the light reflected from the pupil and from the iris. In an apparatus such as an eye camera that is worn on the head, the distance between the illuminating apparatus and the image pickup apparatus is small, and the apparatus moves with the head. Therefore, the illuminating apparatus has only to illuminate the eyeballs, and the image pickup apparatus has only to pick up images of the eyeballs. As a result, the influence of noise is small, and the pupil can be extracted based only on the difference in intensity between the light reflected from the pupil and that reflected from the iris.
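For a head-mounted eye camera, extraction based only on intensity difference can be as simple as thresholding and taking a centroid. The following Python sketch assumes a small grayscale image containing only the eye, with hypothetical intensities (sclera 220, iris 90, pupil 20) and a hand-picked threshold; the function name `pupil_centroid` is illustrative. In a full-face image, other dark regions would also pass such a threshold, which is the difficulty discussed next.

```python
def pupil_centroid(gray, threshold):
    """Centroid (row, col) of pixels darker than threshold, or None if none are."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(gray):
        for c, value in enumerate(line):
            if value < threshold:  # pupil is darker than the surrounding iris
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count


# Hypothetical eye-only image: sclera 220, iris 90, pupil 20.
eye = [
    [220, 220, 220, 220, 220],
    [220, 90, 90, 90, 220],
    [220, 90, 20, 20, 220],
    [220, 90, 20, 20, 220],
    [220, 220, 220, 220, 220],
]

print(pupil_centroid(eye, 50))  # (2.5, 2.5)
```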
However, eye tracking for interface applications must be carried out in a non-contact manner, as described above. Therefore, not only the eyeballs but a wide area, permitting movement at a distance to be tracked, must be picked up. For example, a method for extracting the pupils from images of a person's face must be considered. In such a case, it is very difficult to separate the pupils from background noise by the methods described above.
FIG. 4 shows another method for extracting the pupil, in which light enters the pupil and the light reflected from the retina is picked up. Referring to FIG. 4, a half mirror 23 is provided on the optical axis 22 of a taking lens 21 of a camera 20, and a conventional illuminating apparatus 24 is arranged so that its optical axis coincides with the optical axis 22 by means of the half mirror 23. To make the intensity distribution of the light reflected from the pupil uniform with an illuminating apparatus having a single light source, that is, to obtain a pupil image with little unevenness in intensity, the half mirror 23 is essential.
A visible wavelength cut-off filter 25 is provided in front of the illuminating apparatus 24. This filter 25 cuts off the visible component of the light from the illuminating apparatus 24, and the remaining light travels along the optical axis 22 of the lens 21 to illuminate the user 26. The light enters the pupil of the user 26, is reflected at the retina, and passes through the half mirror 23 to be picked up by the camera 20. Therefore, the pupil appears brighter than the iris.
However, the method shown in FIG. 4 has the following drawbacks. The apparatus becomes large, since it employs the half mirror 23. When the image pickup range covers not only the area near the eyeballs but a wider area including, for example, the whole face, the influence of noise becomes large even with this method, making it difficult to extract the pupils properly. The intensity of illumination may be increased to reduce the influence of noise, but this is not very effective: the light is attenuated by the visible wavelength cut-off filter 25, halved by the half mirror 23 on its way to the user, and the reflected light is halved again by the half mirror 23 on its way to the camera, so much of the light from the illuminating apparatus 24 is lost before it reaches the camera 20. Increasing the intensity of illumination also consumes much power and generates much heat. To reduce other influences, the parts may have to be mounted apart from each other, which further increases the size of the apparatus. Moreover, overly intense illumination places a physiological burden on the user's eyes. Therefore, this method is also unsuitable for mounting beside a display for interface applications.
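The light loss along the FIG. 4 path can be put as simple arithmetic. In this idealized sketch the filter transmittance and the share of light the user returns toward the lens are hypothetical inputs, and the half mirror 23 is assumed to keep exactly one half on each of its two passes, so even a lossless filter leaves at most one quarter of the light.

```python
def fraction_reaching_camera(filter_transmittance, subject_return=1.0, mirror_split=0.5):
    """Fraction of the source output that reaches the camera in FIG. 4 (idealized).

    filter_transmittance: share passed by the visible cut-off filter 25 (hypothetical)
    subject_return: share the user returns toward the lens (hypothetical)
    mirror_split: share kept at each pass through the half mirror 23
    """
    outgoing = filter_transmittance * mirror_split  # first pass, toward the user
    returning = outgoing * subject_return           # light sent back by the user
    return returning * mirror_split                 # second pass, toward the camera


print(fraction_reaching_camera(1.0))  # 0.25 even with a lossless filter
```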
An image reflected from the cornea is a virtual image formed by light regularly reflected at the convex surface of the cornea, and it moves in the same direction as the eyeball as the eye fixation moves. It is therefore one of the feature points necessary for eye tracking. The problem in extracting it is separating it from background noise.
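Treating the cornea as a convex spherical mirror shows where this virtual image lies. The sketch uses the paraxial mirror equation; the corneal radius of curvature of 7.8 mm and the 500 mm source distance are assumed typical values, not figures from this document.

```python
def convex_mirror_image(object_distance, radius_of_curvature):
    """Paraxial image distance for a convex spherical mirror.

    Real distances in front of the mirror are positive, so the focal length of
    a convex mirror is -R/2 and a negative result is a virtual image behind it.
    """
    focal_length = -radius_of_curvature / 2.0
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


# Assumed values: corneal radius 7.8 mm, light source 500 mm from the eye.
image_distance = convex_mirror_image(500.0, 7.8)
print(round(image_distance, 2))  # -3.87 (mm, virtual image behind the cornea)
```

For a distant source the virtual image lies close to the focal point, about half the radius of curvature behind the corneal surface, and it is strongly reduced in size (magnification of roughly 3.87/500, under 0.01, in this example).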
FIG. 5 shows an example of an apparatus in which a camera picks up images reflected from the cornea. Referring to FIG. 5, when light from a reference light source 31 illuminates the eyes of a user 30, the images reflected from the cornea that are necessary for eye tracking are picked up by the camera 32. However, light not only from the reference light source 31 but also from the display surface of the display 33 and from external illumination 34 such as fluorescent lamps enters the eyes of the user 30 and is reflected by the cornea to form virtual images. The eyeball therefore carries various images reflected from the cornea, which act as noise and make it difficult to extract the image reflected from the cornea that is formed by the light from the reference light source 31.
When a moving object such as a person or the eyes is to be photographed, the exposure time should be as short as possible in order to obtain images without blurring. Cameras with electronic shutters have recently come into use. Such cameras, with their short exposure times, require intense illumination. However, if a conventional illuminating apparatus is used to illuminate the user of an interface, the user is exposed to heat and intense light for long periods, which may affect the user's eyes or body.
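The trade-off between blur and illumination can be made concrete with two one-line formulas. Blur is taken as image-plane speed times exposure time, and the required irradiance follows from keeping the exposure H = E·t constant; the speed of 2000 pixels per second is a hypothetical figure for a fast eye or head movement, and the function names are illustrative.

```python
def motion_blur_pixels(speed_px_per_s, exposure_s):
    """Blur length in pixels for a feature moving at constant image-plane speed."""
    return speed_px_per_s * exposure_s


def irradiance_factor(old_exposure_s, new_exposure_s):
    """Factor by which irradiance E must rise to keep the exposure H = E * t fixed."""
    return old_exposure_s / new_exposure_s


speed = 2000.0  # hypothetical speed of a moving eye in the image, pixels per second
for exposure in (1 / 60, 1 / 1000):
    print(f"exposure {exposure:.4f} s -> blur {motion_blur_pixels(speed, exposure):.1f} px")
print(f"irradiance factor: {irradiance_factor(1 / 60, 1 / 1000):.1f}")
```

Shortening the exposure from 1/60 s to 1/1000 s reduces the blur from about 33 pixels to 2 pixels in this example, but it requires roughly 17 times the irradiance, which is the burden on the user described above.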
Meanwhile, stereo vision measurement is known in the field of measuring the spatial positions of feature points. However, since the extraction of feature points has been difficult, as described above, real-time measurement of the spatial positions of the feature points on the face or of the eyeballs, with interface use in mind, has not yet been achieved.
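With two cameras such as cameras 2 and 3 of FIG. 1, the spatial position of a feature point can be obtained by triangulation. The sketch below uses the midpoint method, a standard technique chosen here for illustration: each camera contributes a ray from its optical center toward the feature, and the point is taken as the midpoint of the shortest segment between the two rays. The camera positions and ray directions are hypothetical and are assumed to come from a prior camera calibration, which the sketch does not cover.

```python
def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def _scale(p, s):
    return tuple(a * s for a in p)


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add(c1, _scale(d1, s))  # closest point on ray 1
    p2 = _add(c2, _scale(d2, t))  # closest point on ray 2
    return _scale(_add(p1, p2), 0.5)


# Hypothetical setup in metres: cameras 0.6 m apart beside the display.
left, right = (-0.3, 0.0, 0.0), (0.3, 0.0, 0.0)
feature = (0.05, 0.1, 0.6)  # true position, used here only to build the rays
point = triangulate_midpoint(left, _sub(feature, left), right, _sub(feature, right))
print(tuple(round(v, 6) for v in point))  # (0.05, 0.1, 0.6)
```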
Several methods have been proposed for eye tracking by image processing. However, in each of these methods effective extraction of the feature points and high-speed detection of their spatial positions are difficult, so various conditions must be satisfied in order to apply them, and their field of application is limited. Most of these methods employ a single black-and-white camera. One such example is described below.
FIG. 6 illustrates a method in which the iris and pupil are detected within the white of the eye in a picked-up image and used to detect eye fixation. Referring to FIG. 6, the face of a person is picked up, or imaged, by a camera 41, the eye portion 42 is extracted from the picked-up image of the face, and the dark portion 43 and the white 44 of the eye are separated from each other. Thereafter, the length a of the white 44 and the length x from an edge of the white 44 to the center of the dark portion 43 are calculated. The direction of eye fixation is approximately proportional to x/a. Since the process of extracting the white 44 and the dark portion 43 of the eye is difficult to carry out at high speed, real-time detection has not yet been realized. In this method, the degrees of freedom are limited to the rotation of the eyeball unless the position and orientation of the face are calculated by some other method. The precision in detecting the rotation of the eyeball is not very high, since the apparent size of the eye changes with the user's facial expression. Upward and downward eye movement is especially difficult to detect, since the shape of the dark portion 43 changes with the movement of the eyelid.
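Computing x/a from a single scan line through the eye portion 42 can be sketched as follows. The sketch assumes the white 44 and the dark portion 43 have already been separated and labeled ('W' for the white, 'D' for the dark portion, '.' for skin), and it interprets a as the distance between the outer edges of the white on either side of the dark portion; the labeling scheme and the function name are illustrative.

```python
def eye_direction_ratio(profile):
    """Return x/a for one labeled scan line across the eye.

    profile: string of labels, 'W' = white 44, 'D' = dark portion 43, '.' = skin.
    Edges are measured in pixel-boundary units, so a pixel index i spans [i, i + 1).
    """
    eye = [i for i, ch in enumerate(profile) if ch in "WD"]
    dark = [i for i, ch in enumerate(profile) if ch == "D"]
    if not eye or not dark:
        raise ValueError("scan line must cross both the white and the dark portion")
    left, right = eye[0], eye[-1] + 1        # outer edges of the white
    a = right - left                         # length a of the white (edge to edge)
    dark_center = (dark[0] + dark[-1] + 1) / 2
    x = dark_center - left                   # edge of the white to dark-portion center
    return x / a


print(eye_direction_ratio("..WWWDDDWWW.."))  # 0.5 (dark portion centered)
```

Mapping x/a to an angle would additionally require a gain and offset obtained by calibration, which the sketch leaves out.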
FIG. 7 illustrates a method that uses one camera to detect the pupil and the image reflected from the cornea as feature points. Referring to FIG. 7, the position of a reference light source 51 is assumed to be known in the coordinate system of the camera 50. The spatial position of the image 53 reflected from the cornea, generated by the reference light from the reference light source 51, and the spatial position of the pupil 54 are each determined by the position of the center of rotation of the eyeball 52 and by the rotation angles α and β of the eyeball in the left-right and up-down directions. Therefore, when the position of the image 53 reflected from the cornea, the position of the pupil 54, and the structural parameters of the eyeball 52 (the eyeball radius a, the radius of curvature c of the cornea, and the distance b from the center of the eyeball to the center of curvature of the cornea) are known, the center of rotation and the rotation angles of the eyeball 52 are determined, and eye fixation can accordingly be determined. The positions of the image 53 reflected from the cornea and of the pupil 54 can be obtained from a single projected image under the condition that the distance from the camera to the eyeball 52 is approximately constant. In this manner, eye fixation can be detected in terms of the left-right rotation angle α and the up-down rotation angle β of the eyeball 52. However, because of this condition, the detection precision decreases when the face moves in the z-axis direction.
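The geometric relation behind FIG. 7 can be sketched in Python under simplifying assumptions that are ours, not the document's: the pupil center lies on the optical axis at distance a from the center of rotation, the center of corneal curvature lies on the same axis at distance b, and the 3-D position of the center of corneal curvature has already been estimated from the corneal reflection (the step that needs c and the light source position, which is not modeled here). Under these assumptions the two points fix the axis direction, hence α and β, and then the center of rotation. The numeric values and function names are illustrative, not anatomical data.

```python
import math


def axis_direction(alpha, beta):
    """Unit optical-axis vector; alpha = left-right, beta = up-down (radians).

    alpha = beta = 0 points along +z (an assumed convention).
    """
    return (math.cos(beta) * math.sin(alpha), math.sin(beta), math.cos(beta) * math.cos(alpha))


def eye_points(center, alpha, beta, a, b):
    """Forward model: (pupil center, center of corneal curvature)."""
    g = axis_direction(alpha, beta)
    pupil = tuple(p + a * gi for p, gi in zip(center, g))
    cornea = tuple(p + b * gi for p, gi in zip(center, g))
    return pupil, cornea


def solve_eye(pupil, cornea, a):
    """Inverse: center of rotation and (alpha, beta) from the two points (needs a > b)."""
    diff = tuple(p - k for p, k in zip(pupil, cornea))  # equals (a - b) * axis
    norm = math.sqrt(sum(v * v for v in diff))
    g = tuple(v / norm for v in diff)
    beta = math.asin(g[1])
    alpha = math.atan2(g[0], g[2])
    center = tuple(p - a * gi for p, gi in zip(pupil, g))
    return center, alpha, beta


# Illustrative values in millimetres and radians.
pupil, cornea = eye_points((10.0, -5.0, 600.0), 0.2, -0.1, a=12.0, b=5.5)
center, alpha, beta = solve_eye(pupil, cornea, a=12.0)
print([round(v, 6) for v in center], round(alpha, 6), round(beta, 6))
# [10.0, -5.0, 600.0] 0.2 -0.1
```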
In the examples shown in FIGS. 6 and 7, it is difficult to detect the gazing point, since only one camera is employed. To apply either of these methods to an interface, it is necessary to know which point on the display the user is gazing at, so the gazing point must be obtained in the display coordinate system. To obtain the user's gazing point in the display coordinate system, the user must be made to gaze at predetermined points on the display surface, and the system must be calibrated based on the resulting data. However, when only one camera is used, there are many parameters to calibrate, which makes calibration complicated and less precise.
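The calibration step can be illustrated by a least-squares fit from a measured eye feature to display coordinates. This is a common generic approach substituted here for illustration, not the method of this document: the sketch fits a hypothetical affine map from a 2-D feature (u, v) to display coordinates (X, Y) using the fixations on predetermined points, and it needs at least three non-collinear calibration points. All names and data are illustrative.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points (e.g. collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))) / aug[r][r]
    return tuple(x)


def fit_affine(features, targets):
    """Least-squares affine map (u, v) -> (X, Y) via the normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (u, v), (x, y) in zip(features, targets):
        row = (u, v, 1.0)
        for i in range(3):
            atx[i] += row[i] * x
            aty[i] += row[i] * y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atx), solve3(ata, aty)


def apply_affine(params, feature):
    """Map a measured feature (u, v) to display coordinates (X, Y)."""
    (a1, b1, c1), (a2, b2, c2) = params
    u, v = feature
    return a1 * u + b1 * v + c1, a2 * u + b2 * v + c2


# Hypothetical calibration data: features measured while the user gazes at
# five predetermined display points, generated here from a known map.
features = [(-0.3, -0.2), (0.3, -0.2), (0.0, 0.0), (-0.3, 0.2), (0.3, 0.2)]
targets = [(800 * u + 10 * v + 640, -5 * u + 700 * v + 360) for u, v in features]
params = fit_affine(features, targets)
print(apply_affine(params, (0.1, 0.05)))  # approximately (720.5, 394.5)
```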