1. Field of the Invention
The present invention relates to a gaze position detection apparatus and method for accurately detecting a user's focus of attention for use as an input command signal to a computer system.
2. Description of the Related Art
Information about a user's gaze, or focus-of-attention region, is widely used in technical areas ranging from human/machine interfaces to psychology. For example, gaze direction is now used as an input device in place of a mouse or a touch screen. If changing the window focus or moving the cursor is driven by the user's focus of attention, mouse use is greatly reduced.
Furthermore, because a user tends to gaze at an object of interest, this physical behavior can be used to estimate the user's interest and thereby improve the human interface. For example, combining gaze direction detection with speech recognition improves the stability of speech recognition. In prior-art speech recognition, one cause of recognition errors is that the start timing of recognition is not correctly detected in real environments that include noise. In this case, the user's gaze at an icon (button) on a display triggers the start timing, so that speech recognition is executed stably. The user's physical status can also be estimated from changes in gaze direction. For example, by measuring changes in a driver's line of sight over time, dozing or a loss of attention due to fatigue can be detected in order to prevent traffic accidents.
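The gaze-triggered start of speech recognition can be illustrated as a dwell test on the button. The text does not say how the trigger is decided; this minimal sketch assumes, hypothetically, that recognition starts once the gaze point stays inside the button's rectangle for a fixed number of consecutive samples. `Button`, `speech_start_index`, and `dwell_samples` are illustrative names, not part of any described system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Button:
    """Hypothetical on-screen icon (button) in display pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def speech_start_index(
    gaze_points: Iterable[Tuple[float, float]],
    button: Button,
    dwell_samples: int = 5,
) -> Optional[int]:
    """Return the sample index at which speech recognition would be started.

    Assumption: the trigger fires when the gaze has stayed on the button for
    `dwell_samples` consecutive samples; returns None if that never happens.
    """
    run = 0  # length of the current run of on-button samples
    for i, (x, y) in enumerate(gaze_points):
        run = run + 1 if button.contains(x, y) else 0
        if run >= dwell_samples:
            return i
    return None
```

Requiring several consecutive samples, rather than a single hit, is one simple way to keep a passing glance across the icon from starting recognition.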
Two prior-art approaches are used to detect the gaze position (focus-of-attention point): a first method using the reflection of infrared rays and a second method using image information. In the first method, infrared light from a light-emitting diode mounted on eyeglasses is directed at the user's eyes, and the rotation angle of the eyeballs is detected from the reflection characteristics. Systems based on this principle are already on the market. However, special glasses that restrict the user's body motion and a magnetic coil that detects the user's head motion must be worn, which places a heavy burden on the user. Detecting the gaze position in a natural state is therefore difficult, and this method is used only in limited areas such as medical treatment and psychology. As a result, the first method is not used in practice for human interfaces such as those of computers.
In the second method, images of the user's face or pupils from a TV camera are used. Specifically, the center of the pupil and the positions of three markers on the glasses are detected by image processing, and the gaze position is calculated from the detected positions by trigonometric survey (triangulation). Compared with the first method, the user does not feel a great burden, because no device restricting the user's motion is necessary. However, several problems remain for human-interface applications, such as the need for infrared illumination or markers on the glasses. In addition, two special cameras with zoom, pan, and tilt control are required. The system construction therefore becomes highly complicated, and the system cannot easily be used on a personal computer by a large number of unspecified users.
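The trigonometric survey can be illustrated with a simplified geometric sketch. It assumes, hypothetically, that each detected image feature has already been converted into a 3D ray from a calibrated camera, that the eyeball center is known (for example, inferred from the glasses markers), and that the display lies in the plane z = 0 of the same coordinate frame. `triangulate` locates a point seen by both cameras as the midpoint of the rays' closest approach, and `gaze_on_display` extends the line from the eyeball center through the pupil center to the display. The text does not specify the prior-art system's actual computation; this is not a reproduction of it.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """3D point seen along ray c1 + s*d1 from one camera and c2 + t*d2 from the other.

    Returns the midpoint of the shortest segment joining the two rays, which
    equals their intersection when the measurements are noise-free.
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)


def gaze_on_display(eye_center: Vec3, pupil_center: Vec3) -> Tuple[float, float]:
    """Point where the line from eyeball center through pupil center meets z = 0."""
    direction = _sub(pupil_center, eye_center)
    if abs(direction[2]) < 1e-12:
        raise ValueError("gaze is parallel to the display")
    s = -eye_center[2] / direction[2]
    if s <= 0:
        raise ValueError("display is behind the eye")
    return (eye_center[0] + s * direction[0], eye_center[1] + s * direction[1])
```

Even in this idealized form, the sketch needs calibrated geometry for two cameras, which hints at why such systems are complicated to build.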
Another method based on image processing further reduces the user's burden, because no special device such as glasses is necessary. In this method, a neural network learns the transformation from a pupil pattern to the gaze position on the display.
[Baluja, S., and Pomerleau, D., "Non-Intrusive Gaze Tracking Using Artificial Neural Networks," Advances in Neural Information Processing Systems (NIPS) 6, 1994], [Baluja, S., and Pomerleau, D., "Non-Intrusive Gaze Tracking Using Artificial Neural Networks," CMU-CS-94-102, 1994 (http://www.cs.cmu.edu/afs/cs/user/baluja/www/techreps.html)]
In this method, a learning set is prepared covering all areas of the display, each sample pairing the pupil pattern observed while the user gazes at an area with the coordinates of that gaze position. The neural network is trained repeatedly on this learning set using back propagation. After training, when the user's pupil pattern is input, the network outputs the coordinates (x, y) of the corresponding gaze position on the display. Only one camera is necessary, and no special device such as glasses is required. The user's burden is therefore greatly reduced, and this method is the most suitable for a human interface. However, the following two problems remain.
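The learning procedure can be sketched as a small network trained with back propagation. This is a minimal pure-Python sketch under stated assumptions, not the cited authors' architecture: `synthetic_pupil_pattern` is a hypothetical stand-in for a camera eye image (a blob whose position follows the gaze), the 5 x 5 grid of targets stands in for "all areas of the display", and the layer sizes, learning rate, and epoch count are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], Tuple[float, float]]


def synthetic_pupil_pattern(u: float, v: float, size: int = 4, sigma: float = 0.8) -> List[float]:
    """Hypothetical eye image: a blob whose position follows the gaze (u, v) in [0, 1]."""
    return [
        math.exp(-((c - (size - 1) * u) ** 2 + (r - (size - 1) * v) ** 2) / (2 * sigma ** 2))
        for r in range(size)
        for c in range(size)
    ]


def make_learning_set(grid: int = 5) -> List[Sample]:
    """One (pupil pattern, normalized gaze coordinate) pair per area of a grid x grid display."""
    points = [(i / (grid - 1), j / (grid - 1)) for j in range(grid) for i in range(grid)]
    return [(synthetic_pupil_pattern(u, v), (u, v)) for u, v in points]


class GazeMLP:
    """Pupil pattern -> (x, y); one tanh hidden layer, linear outputs, squared-error loss."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x: Sequence[float]) -> Tuple[float, float]:
        _, y = self._forward(x)
        return y[0], y[1]

    def train_step(self, x: Sequence[float], target: Tuple[float, float], lr: float) -> None:
        h, y = self._forward(x)
        dy = [yk - tk for yk, tk in zip(y, target)]  # dL/dy for L = 0.5 * ||y - target||^2
        # Back-propagate through tanh (derivative 1 - h^2) using the not-yet-updated w2.
        dh = [(1.0 - hj * hj) * sum(dy[k] * self.w2[k][j] for k in range(2)) for j, hj in enumerate(h)]
        for k in range(2):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dy[k] * hj
            self.b2[k] -= lr * dy[k]
        for j in range(len(h)):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh[j] * xi
            self.b1[j] -= lr * dh[j]


def mean_loss(net: GazeMLP, data: List[Sample]) -> float:
    return sum(0.5 * sum((p - t) ** 2 for p, t in zip(net.predict(x), tgt)) for x, tgt in data) / len(data)


def train(net: GazeMLP, data: List[Sample], epochs: int = 300, lr: float = 0.05, seed: int = 0) -> None:
    """Repeated per-sample gradient descent over the whole learning set, shuffled each epoch."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, target = data[idx]
            net.train_step(x, target, lr)
```

Usage: build `data = make_learning_set()`, create `net = GazeMLP(n_in=len(data[0][0]))`, call `train(net, data)`, and then `net.predict(pattern)` returns normalized display coordinates for a new pupil pattern. Note that every epoch visits every display area, so training cost grows with the number of areas covered.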
(1) When the position of the user's head changes, the precision of gaze position detection falls sharply. For example, even if the pupil pattern captured at a first time and the pupil pattern captured at a second time are the same, the corresponding gaze positions on the display are often different. To distinguish between such cases, the coordinates of each gaze position are corrected according to the position of the user's head. However, this correction is not a complete solution.
(2) Training the neural network takes a long time. Training must be performed for all areas of the display so that the gaze position is detected correctly wherever on the display the user looks.
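Both problems can be made concrete with a short sketch. `calibration_targets` lays out one gaze target per display area, showing that the learning set, and hence training time, grows with the number of areas covered, as in problem (2). `compensate_head_offset` is a hypothetical, deliberately simple linear correction of the kind mentioned in problem (1): it assumes the head has only translated parallel to the display, with the shift already measured in display pixels, and it ignores head rotation and changes in viewing distance, which illustrates how such a correction can fall short. Both names and the `gain` parameter are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def calibration_targets(width: float, height: float, cols: int, rows: int) -> List[Point]:
    """Center of every cell in a cols x rows partition of the display, in pixels.

    Each target needs at least one (pupil pattern, coordinate) training pair,
    so the learning set grows with cols * rows.
    """
    cell_w, cell_h = width / cols, height / rows
    return [((c + 0.5) * cell_w, (r + 0.5) * cell_h) for r in range(rows) for c in range(cols)]


def compensate_head_offset(predicted: Point, head_shift: Point, gain: float = 1.0) -> Point:
    """Naive linear correction for head translation parallel to the display.

    `head_shift` is the head's displacement from its calibration position in
    display pixels; `gain` is a hypothetical tuning factor. With the pupil
    pattern unchanged, a pure translation moves the gaze point by the same
    amount, so the shift is added. Rotation and distance changes are ignored.
    """
    return (predicted[0] + gain * head_shift[0], predicted[1] + gain * head_shift[1])
```

For a 1280 x 1024 display split into a 4 x 4 grid, `calibration_targets` yields 16 targets; doubling the resolution of the grid in each direction quadruples the number of training pairs.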