Gaze tracking involves the determination and tracking of the gaze or fixation point of a person's eyes on a surface of an object such as the screen of a computer monitor. The gaze point is generally defined as the intersection of the person's line of sight with the surface of the object being viewed. Schematically, this is shown in FIG. 1 where the person's left and right eyes, “C1” and “C2”, separated by interocular distance “b”, gaze at a gaze point “P” on an object “O”.
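For a planar surface such as a monitor screen, the definition of the gaze point as the intersection of the line of sight with the viewed surface reduces to a ray-plane intersection. The following Python sketch is illustrative only. It assumes the eye position and gaze direction are already known in a common 3-D frame (here in millimeters), and the function names `dot` and `gaze_point_on_plane` are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gaze_point_on_plane(
    eye: Vec3,
    direction: Vec3,
    plane_point: Vec3,
    plane_normal: Vec3,
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Intersect the line of sight eye + t * direction (t >= 0) with a plane.

    Returns None if the line of sight is parallel to the plane or the
    plane lies behind the eye.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # line of sight parallel to the surface
    to_plane = tuple(p - e for p, e in zip(plane_point, eye))
    t = dot(to_plane, plane_normal) / denom
    if t < 0:
        return None  # surface is behind the eye
    return tuple(e + t * d for e, d in zip(eye, direction))


# Eye at the origin, screen plane z = 600 mm facing the eye (illustrative values).
print(gaze_point_on_plane((0.0, 0.0, 0.0), (0.25, 0.0, 1.0),
                          (0.0, 0.0, 600.0), (0.0, 0.0, 1.0)))  # prints (150.0, 0.0, 600.0)
```

A curved object surface would need a ray-surface intersection instead; the plane is simply the case that matches a flat screen.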
When the object being viewed is the screen of a computer monitor, gaze tracking may be used for human-computer interaction, such as increasing the resolution or size of the region where the user is gazing, or using the gaze point as a cursor. Currently available gaze tracking systems may be categorized as non-video based systems and video based systems. Since video based systems are non-contacting, they have the advantage of being less obtrusive and more comfortable for the user.
The direction of a person's gaze is determined by a combination of their face orientation and eye orientation. When the head is held fixed so that the 3-D positions of the eyeballs are known in a fixed reference frame, 3-D gaze tracking may be performed by eye tracking. A common technique for eye tracking employs a video camera to capture images of the eye while light, such as that provided by infrared light emitting diodes, is reflected from the eye. The captured images are then analyzed to extract eye rotation from changes in the reflections. Video based eye trackers typically use the corneal reflection (the first Purkinje image) and the center of the pupil as features to track over time. Alternatively, they may use reflections from the front of the cornea (the first Purkinje image) and the back of the lens (the fourth Purkinje image) as the features to track.
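Trackers that use the pupil center and corneal reflection are often paired with a per-user calibration: the user fixates known screen targets, and a mapping from the pupil-glint vector to screen coordinates is fitted. The sketch below assumes an affine mapping fitted by least squares (many systems use higher-order polynomials instead) and assumes the pupil center and glint have already been located in the image. All names (`pupil_glint_vector`, `fit_affine`, `calibrate`, `map_gaze`) and the synthetic calibration data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pupil_glint_vector(pupil: Point, glint: Point) -> Point:
    """Pupil center minus corneal reflection (glint), in image pixels."""
    return (pupil[0] - glint[0], pupil[1] - glint[1])


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affine(vectors: Sequence[Point], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of target ~ c0 + c1*dx + c2*dy via the normal equations."""
    rows = [(1.0, dx, dy) for dx, dy in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear(ata, atb)


def calibrate(vectors: Sequence[Point], screen: Sequence[Point]):
    """Fit separate affine maps for the screen x and y coordinates."""
    return (fit_affine(vectors, [p[0] for p in screen]),
            fit_affine(vectors, [p[1] for p in screen]))


def map_gaze(coeffs, vector: Point) -> Point:
    """Apply the fitted calibration to a new pupil-glint vector."""
    cx, cy = coeffs
    dx, dy = vector
    return (cx[0] + cx[1] * dx + cx[2] * dy, cy[0] + cy[1] * dx + cy[2] * dy)


# Synthetic 3x3 calibration grid generated from a known map (illustrative only).
grid = [(dx, dy) for dy in (-2.0, 0.0, 2.0) for dx in (-2.0, 0.0, 2.0)]
targets = [(960.0 + 40.0 * dx, 540.0 + 30.0 * dy) for dx, dy in grid]
coeffs = calibrate(grid, targets)
print(map_gaze(coeffs, pupil_glint_vector((101.0, 80.5), (100.0, 80.0))))  # prints (1000.0, 555.0)
```

Using the difference between pupil center and glint, rather than the pupil position alone, makes the feature less sensitive to small camera-relative translations of the head, which is one reason this pairing is popular.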
When the user views a scene through a stereo viewer having left and right two-dimensional (2-D) display screens, gaze tracking becomes more complicated. Schematically, this situation is shown in FIG. 2. In this case, a projection of the point “P” is displayed as point “P1” on a left stereo image “I1” (displayed on the left 2-D display screen), and another projection of the point “P” is displayed as point “P2” on a right stereo image “I2” (displayed on the right 2-D display screen). The two points “P1” and “P2” are shown displaced horizontally in their respective images by a pixel disparity, which maps to a depth indicating the 3-D position of the point “P” on the object “O”.
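For illustration, if the two stereo images are assumed to come from a rectified camera pair with parallel optical axes, focal length f (in pixels) and baseline B, the disparity-to-depth mapping takes the familiar form Z = f·B/d. This model and the function names below are assumptions for the sketch; an actual stereo viewer may use a different, calibrated mapping.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline: float) -> float:
    """Depth Z = f * B / d for a rectified, parallel-axis stereo pair.

    The result is in the same units as the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at finite depth)")
    return focal_px * baseline / disparity_px


def disparity_from_depth(depth: float, focal_px: float, baseline: float) -> float:
    """Inverse mapping: d = f * B / Z, in pixels."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


# Hypothetical camera: f = 800 px, baseline B = 5 mm.
print(depth_from_disparity(20.0, 800.0, 5.0))   # prints 200.0
print(disparity_from_depth(200.0, 800.0, 5.0))  # prints 20.0
```

Because depth varies inversely with disparity, a one-pixel disparity error shifts the estimated depth far more for distant points than for near ones.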
Thus, instead of both eyes “C1” and “C2” gazing at the same gaze point “P”, as schematically shown in FIG. 1, when the user views the scene in a stereo viewer, the left eye “C1” gazes at a point “P1” on the left stereo image “I1” while the right eye “C2” gazes at a point “P2” on the right stereo image “I2”, as schematically shown in FIG. 2. As a result, if only one eye is tracked, it is not directly known where on its 2-D display screen the other eye is gazing at that time, unless a depth map for the displayed scene is available. If such a depth map (e.g., the 3-D surface contour of the scene) is available, a previously determined (e.g., at calibration) depth-to-disparity map may be used to convert the depth at the location of the tracked point on one 2-D display screen into the offset position (disparity) of the corresponding location on the other 2-D display screen.
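The conversion from a tracked left-eye point to the corresponding right-eye point, using a depth map and a calibrated depth-to-disparity map, might be sketched as follows. The table class `DepthToDisparity`, the function `right_gaze_point`, the row-major depth map indexed in left-image pixels, linear interpolation with clamping at the table ends, and the sign convention x2 = x1 - disparity are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


class DepthToDisparity:
    """Hypothetical calibrated depth-to-disparity table with linear interpolation."""

    def __init__(self, depths: Sequence[float], disparities: Sequence[float]):
        if len(depths) != len(disparities) or len(depths) < 2:
            raise ValueError("need at least two matching (depth, disparity) samples")
        if any(b <= a for a, b in zip(depths, depths[1:])):
            raise ValueError("depths must be strictly increasing")
        self.depths = list(depths)
        self.disparities = list(disparities)

    def __call__(self, depth: float) -> float:
        d, s = self.depths, self.disparities
        # Clamp outside the calibrated range (a design choice for this sketch).
        if depth <= d[0]:
            return s[0]
        if depth >= d[-1]:
            return s[-1]
        i = bisect_left(d, depth)  # d[i - 1] < depth <= d[i]
        t = (depth - d[i - 1]) / (d[i] - d[i - 1])
        return s[i - 1] + t * (s[i] - s[i - 1])


def right_gaze_point(
    left_point: Tuple[float, float],
    depth_map: Sequence[Sequence[float]],
    depth_to_disparity: DepthToDisparity,
) -> Tuple[float, float]:
    """Estimate P2 on the right image from the tracked P1 on the left image.

    Assumes depth_map[row][col] is indexed in left-image pixels and that a
    positive disparity shifts the point toward smaller x in the right image.
    """
    x, y = left_point
    row, col = int(round(y)), int(round(x))
    if not (0 <= row < len(depth_map) and 0 <= col < len(depth_map[row])):
        raise ValueError("tracked point lies outside the depth map")
    disparity = depth_to_disparity(depth_map[row][col])
    return (x - disparity, y)


# Illustrative calibration samples and a constant-depth scene.
table = DepthToDisparity([100.0, 200.0, 400.0], [40.0, 20.0, 10.0])
scene_depth = [[150.0] * 64 for _ in range(4)]
print(right_gaze_point((50.0, 2.0), scene_depth, table))  # prints (20.0, 2.0)
```

The vertical coordinate is carried over unchanged because, in this sketch, the stereo images are assumed to be rectified so that corresponding points lie on the same row.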
One problem with relying on a depth map of the scene being displayed in the stereo viewer is that computing the depth map is computationally intensive, and the scene may change frequently. Thus, it may not be practical to always have an up-to-date depth map of the scene available.
Rather than tracking only one eye, the gazes of both eyes may be tracked on their respective 2-D display screens. The problem with this approach, however, is that two-eye tracking may be inherently unreliable because one eye may be dominant over the other, or it may be prone to error as a result of the positioning of the lighting and/or the video camera relative to the eyes. Two-eye tracking may also increase processing time and/or component cost.
The conventional gaze tracking shown schematically in FIG. 1 is commonly referred to as 3-D gaze tracking, because the 3-D position of the gaze point “P” can be determined as long as the positions and orientations of the eyes “C1” and “C2” are known. By contrast, the situation shown schematically in FIG. 2 is referred to herein as stereo gaze tracking, since it requires determining the stereo gaze points “P1” and “P2” on the stereo viewer in order to estimate the 3-D position of the gaze point “P”.
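In the 3-D gaze tracking case of FIG. 1, the gaze point can be estimated by triangulating the two lines of sight. Because measured lines of sight rarely intersect exactly, a common choice, used in this sketch, is the midpoint of the shortest segment between them. The function name `triangulate_gaze`, the coordinate frame, and the 64 mm eye separation in the usage example are assumptions.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_gaze(
    c1: Vec3, u: Vec3, c2: Vec3, v: Vec3, eps: float = 1e-12
) -> Optional[Vec3]:
    """Midpoint of the shortest segment between lines c1 + s*u and c2 + t*v.

    Returns None when the two lines of sight are (nearly) parallel, i.e.
    the eyes are not converging on a finite point.
    """
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    if a < eps or c < eps:
        raise ValueError("gaze directions must be non-zero")
    w0 = tuple(p - q for p, q in zip(c1, c2))
    d, e = _dot(u, w0), _dot(v, w0)
    denom = a * c - b * b  # zero when u and v are parallel
    if abs(denom) < eps * a * c:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * k for p, k in zip(c1, u))  # closest point on left line of sight
    q2 = tuple(p + t * k for p, k in zip(c2, v))  # closest point on right line of sight
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


# Eyes 64 mm apart (illustrative), both looking at (0, 0, 500) mm.
c1, c2 = (-32.0, 0.0, 0.0), (32.0, 0.0, 0.0)
print(triangulate_gaze(c1, (32.0, 0.0, 500.0), c2, (-32.0, 0.0, 500.0)))  # prints (0.0, 0.0, 500.0)
```

The distance between the two closest points could additionally serve as a rough consistency check on the two eye measurements.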