This invention relates to eye gaze tracking by analysis of images taken of a user's eye. The invention relates more specifically to eye gaze tracking without calibrated cameras, direct measurements of specific users' eye geometries, or requiring the user to visually track a cursor traversing a known trajectory.
Eye gaze tracking technology has proven to be useful in many different fields, including human-computer interfaces that help disabled people interact with a computer. The eye gaze tracker can be used as an input device, instead of or in addition to a mouse for a personal computer, for example, helping disabled people to move a cursor on a display screen to control their environment and communicate messages. Gaze tracking can also be used for industrial control, aviation, and emergency room situations where both hands are needed for tasks other than operation of a computer but where an available computer is useful. There is also significant research interest in eye gaze tracking for babies and animals to better understand such subjects' behavior and visual processes.
There are many different schemes for detecting both the gaze direction and the point of regard, and many vendors of eye gaze tracking equipment (see for example web site http://ibs.derby.ac.uk/emed). Any particular eye gaze tracking technology should be relatively inexpensive, reliable, unobtrusive, easily learned and used and generally operator-friendly to be widely accepted. However, commercially available systems are expensive (over $10,000), complicated to install, and require a trained operator and a calibration process before each use session.
Corneal reflection eye gaze tracking systems project light toward the eye and monitor the angular difference between pupil position and the reflection of the light beam from the cornea surface. Near-infrared light is often employed, as users cannot see this light and are therefore not distracted by it. The light reflected from the eye has two major components. The first component is a 'glint', which is a very small and very bright virtual image of the light source reflected from the front surface of the corneal bulge of the eye; the glint is also known as the first Purkinje image. The second component is light that has entered the eye and has been reflected back out from the retina. This light serves to illuminate the pupil of the eye from behind, causing the pupil to appear as a bright disk against a darker background. This retroreflection, or "bright eye" effect familiar to flash photographers, provides a very high contrast image. An eye gaze tracking system determines the center of the pupil and the glint, and the change in the distance and direction between the two as the eye is rotated. The orientation of the eyeball can be inferred from the differential motion of the pupil center relative to the glint. The eye is often modeled as a sphere of about 12.3 mm radius having a spherical corneal bulge of about 7.4 mm radius (see "Schematic Eye" by Gullstrand, in Visual Optics, H. H. Emsley editor, 3rd ed., p. 348, Butterworth, Scarborough, Ont., 1955, which is hereby incorporated by reference). The eyes of different users will have variations from these typical values, but individual dimensional values do not generally vary significantly in the short term, and thus can be stored and used for a long period.
As shown in prior art FIG. 1, the main components of a corneal reflection eye gaze tracking system include a video camera sensitive to near-infrared light, a near-infrared light source (often a light-emitting diode) typically mounted to shine along the optical axis of the camera, and a computer system for analyzing images captured by the camera. The on-axis light source is positioned at or near the focal center of the camera. Image processing techniques such as intensity thresholding and edge detection identify the glint and the pupil from the image captured by the camera using on-axis light, and locate the pupil center in the camera's field of view as shown in prior art FIG. 2.
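By way of illustration only, the intensity thresholding step mentioned above might be sketched as follows. The function names, the threshold values, and the representation of the image as a list of rows of 8-bit gray levels are all assumptions made for this sketch and are not taken from any cited system; a practical tracker would add edge detection and connected-component analysis.

```python
def centroid_above(image, threshold):
    """Return the (x, y) centroid of all pixels with intensity >= threshold, or None."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def locate_pupil_and_glint(image, pupil_threshold=128, glint_threshold=250):
    """Estimate pupil and glint centers in image coordinates.

    Assumption: under on-axis illumination the pupil is a bright disk and the
    glint is a small, near-saturated spot, so two thresholds separate them.
    The glint pixels also pass the pupil threshold, which slightly biases the
    pupil centroid when the glint is off-center.
    """
    pupil = centroid_above(image, pupil_threshold)
    glint = centroid_above(image, glint_threshold)
    return pupil, glint


def pupil_glint_vector(pupil, glint):
    """Displacement of the pupil center relative to the glint, in pixels."""
    return (pupil[0] - glint[0], pupil[1] - glint[1])
```

The resulting pupil-glint displacement is the quantity that corneal reflection trackers typically map to a gaze direction.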
Human eyes do not have uniform resolution over the entire field of view, nor is the portion of the retina providing the most distinct vision located precisely on the optical axis. The eye directs its gaze with great accuracy because the photoreceptors of the human retina are not uniformly distributed but instead show a pronounced density peak in a small region known as the fovea centralis. In this region, which subtends a visual angle of about one degree, the receptor density increases to about ten times the average density. The nervous system thus attempts to keep the image of the region of current interest centered accurately on the fovea as this gives the highest visual acuity. A distinction is made between the optical axis of the user's eye versus the foveal axis along which the most acute vision is achieved. As shown in prior art FIG. 3, the optical axis is a line going from the center of the spherical corneal bulge through the center of the pupil. The optical axis and foveal axis are offset in each eye by an inward horizontal angle of about five degrees, with a variation of about one and one half degrees in the population. The offsets of the foveal axes with respect to the optical axes of a user's eyes enable better stereoscopic vision of nearby objects. The offsets vary from one individual to the next, but individual offsets do not vary significantly in the short term. For this application, the gaze vector is defined as the optical axis of the eye. The gaze position or point of regard is defined as the intersection point of the gaze vector with the object being viewed (e.g. a point on a display screen some distance from the eye). Adjustments for the foveal axis offsets are typically made after determination of the gaze vector; a default offset angle value may be used unless values from a one-time measurement of a particular user's offset angles are available.
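As a minimal sketch of the definitions above, the point of regard can be computed as the intersection of the gaze vector with a planar viewed surface, and a horizontal foveal offset can be applied as a rotation of the gaze vector about the vertical axis. The function names, the coordinate convention (y vertical, screen in the plane z = 0), and the sign convention of the offset angle are assumptions of this sketch; which sign corresponds to "inward" depends on which eye is tracked.

```python
import math


def rotate_about_vertical(vector, angle_deg):
    """Rotate a 3-D vector about the y (vertical) axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = vector
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


def point_of_regard(eye_position, gaze_vector, plane_point, plane_normal):
    """Intersect the gaze ray with a plane; return None if there is no hit.

    The plane is given by any point on it and its normal vector.
    """
    denom = sum(g * n for g, n in zip(gaze_vector, plane_normal))
    if abs(denom) < 1e-12:
        return None  # gaze is parallel to the viewed surface
    t = sum((p - e) * n
            for p, e, n in zip(plane_point, eye_position, plane_normal)) / denom
    if t < 0:
        return None  # surface lies behind the eye
    return tuple(e + t * g for e, g in zip(eye_position, gaze_vector))
```

For example, an eye 600 mm in front of the screen looking straight at it yields a point of regard directly opposite the eye; rotating the gaze vector by a default offset of five degrees shifts that point horizontally by 600·tan(5°), or about 52 mm.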
Unfortunately, calibration is required for all existing eye gaze tracking systems to establish the parameters describing the mapping of camera image coordinates to display screen coordinates. Different calibration and gaze direction calculation methods may be categorized by the actual physical measures they require. Some systems use physically-based explicit models that take into account eyeball radius, radius of curvature of the cornea, offset angle between the optical axis and the foveal axis, head and eye position in space, and distance between the center of the eyeball and the center of the pupil as measured for a particular user. Cameras may need to be calibrated as well, so that their precise positions and optical properties are known. Details of camera calibration are described in "A Flexible New Technique for Camera Calibration", Z. Zhang, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000, (also available as Technical Report MSR-TR-98-71 at http://research.microsoft.com/~zhang/Papers/TR98-71.pdf), hereby incorporated by reference.
During system calibration, the user may be asked to fix his or her gaze upon certain "known" points in a display. At each coordinate location, a sample of corresponding gaze vectors is computed and used to accommodate head position, screen position and size, camera position, and to adapt the system to the specific properties of the user's eye, reducing the error in the estimate of the gaze vector to an acceptable level for subsequent operation. This method is disadvantageous in that a user's flow of thought is interrupted because the gaze target has nothing to do with the work the user wishes to perform. Further, the user may also be asked to click a mouse button after visually fixating on a target, but this approach may add synchronization problems, i.e. the user could look away from the target and then click the mouse button. Also, with this approach the system would get only one mouse click for each target, so there would be no chance to average out involuntary eye movements. System calibration may need to be performed on a per-user or per-tracking-session basis, depending on the precision and repeatability of the tracking system. A major disadvantage of the calibration process is that it requires the user's cooperation, and thus is unsuitable for infants, animals, and non-cooperative subjects.
U.S. Pat. No. 6,152,563 to Hutchinson et al. describes a typical corneal reflection eye gaze tracking system. The user looks at a sequence of fixed points on the screen to enable the system to map a particular glint-pupil displacement to a particular point on the screen. U.S. Pat. No. 5,231,674 to Cleveland et al. teaches another corneal reflection eye gaze tracking system.
U.S. Pat. No. 5,325,133 to Adachi teaches a method for eye gaze tracking in which the relative brightness of the pupil image as observed from multiple displacement angles determines a gaze vector. Alternate light source activation, or use of light sources of different wavelengths, correlates particular light sources with particular pupil images or pupil brightness measurements.
European Patent Application EP0631222A1, incorporated herein by reference, teaches a method of calculating the center position of a pupil image wherein the brightness of a gazing point on a display is increased, causing a change in pupil area subsequently used to verify the pupil image center position. This application also teaches the use of a simple linear relationship between screen coordinates (u,v) and pupil image center coordinates (x,y), u=ax+b and v=cy+d, where parameters (a, b, c and d) are determined when pupil center position data is obtained at two locations.
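The linear relationship u=ax+b and v=cy+d described above has four unknowns, so two samples with distinct x and distinct y coordinates suffice to determine them. The following is a hedged sketch of that arithmetic only; the function names and the sample layout are hypothetical and are not drawn from the referenced application.

```python
def fit_linear_mapping(sample1, sample2):
    """Solve u = a*x + b and v = c*y + d from two samples.

    Each sample is ((x, y), (u, v)): a pupil image center and the screen
    point the user was looking at when it was recorded.
    """
    (x1, y1), (u1, v1) = sample1
    (x2, y2), (u2, v2) = sample2
    if x1 == x2 or y1 == y2:
        raise ValueError("samples must differ in both x and y")
    a = (u2 - u1) / (x2 - x1)
    b = u1 - a * x1
    c = (v2 - v1) / (y2 - y1)
    d = v1 - c * y1
    return a, b, c, d


def map_to_screen(params, x, y):
    """Apply the fitted mapping to a pupil image center (x, y)."""
    a, b, c, d = params
    return (a * x + b, c * y + d)
```

With samples taken at opposite corners of a 1024 by 768 screen, a pupil center midway between the two recorded pupil positions maps to the screen center, which is all that a purely linear model can express.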
U.S. Pat. No. 5,481,622 to Gerhardt et al. teaches a head-mounted eye-tracking system that constructs a mapping relationship between the relative position of the pupil image center and the point of regard on a display screen. The user gazes at a cursor placed at a known position in a display screen, and the invention determines the pupil center position in image coordinates. This process is repeated many times, and a set of polynomial functions is eventually fitted to define the mapping relationship.
U.S. Pat. Nos. 5,231,674, 5,325,133, 5,481,622, and 6,152,563 are all incorporated herein by reference.
While the aforementioned prior art methods are useful advances in the field of eye gaze tracking, systems that do not require user-apparent calibration would increase user convenience and broaden the acceptance of eye gaze tracking technology. A system for eye gaze tracking without calibrated cameras, direct measurements of specific users' eye geometries, or requiring the user to visually track a cursor traversing a known trajectory is therefore needed.
It is accordingly an object of this invention to devise a system and method for eye gaze tracking wherein calibrated cameras and direct measurement of individual users' eye geometries are not required.
It is a related object of the invention to devise a system and method for eye gaze tracking wherein the user is not required to fixate on a series of visual targets located at known positions, or to visually track a cursor traversing a known trajectory.
It is a related object of the invention to determine a gaze vector and to compute a point of regard, which is the intersection of the gaze vector and an observed object. The observed object is preferably a display screen or computer monitor, but may also include a desktop, a windshield, a whiteboard, an advertisement, a television screen, or any other object over which a user's vision may roam.
It is a related object of the preferred embodiment of the invention that two cameras are used to capture images of a user's eye, where each camera includes an on-axis light source, a focal center, and an image plane defining an image coordinate system. It is a related object of the preferred embodiment of the invention to capture images of a user's eye such that the pupil center in each image and a glint resulting from the particular camera's light source may be readily identified and located in the image plane of each camera.
It is a related object of the preferred embodiment of the invention that the cameras capture images of a set of reference points, or a test pattern, that defines a reference coordinate system in real space. The images include reflections of the test pattern from the user's cornea, which is essentially a convex spherical mirror. The invention maps or mathematically relates the test pattern image in the camera image coordinate systems to the actual test pattern through spherical and perspective transformations. The parameters of the relation may include the eye-to-camera distance, the vertical and horizontal displacement of the eye from the test pattern, and the radius of cornea curvature.
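To give a rough sense of scale for the reflection of a test pattern in a convex spherical mirror, the sketch below applies the standard paraxial mirror equation to a mirror of the corneal radius quoted earlier. This is only a textbook approximation under stated assumptions, not the spherical and perspective transformation of the invention; the function name and the default radius of 7.4 mm (the typical value given in the background) are illustrative.

```python
def corneal_reflection(object_distance_mm, cornea_radius_mm=7.4):
    """Paraxial image of a distant object in a convex spherical mirror.

    Returns (image_distance_mm, magnification). A negative image distance
    means a virtual image located behind the reflecting surface.
    """
    if object_distance_mm <= 0 or cornea_radius_mm <= 0:
        raise ValueError("distances must be positive")
    focal_length = -cornea_radius_mm / 2.0  # convex mirror: f = -R/2
    # Mirror equation: 1/d_o + 1/d_i = 1/f
    image_distance = 1.0 / (1.0 / focal_length - 1.0 / object_distance_mm)
    magnification = -image_distance / object_distance_mm
    return image_distance, magnification
```

Under these assumptions a pattern 600 mm from the eye forms an upright virtual image a little under 3.7 mm behind the corneal surface, reduced by a factor of more than one hundred, which is why the reflected pattern occupies only a small region of the camera image.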
The test pattern may comprise an unobtrusively interlaced pattern depicted in a display screen, a set of light sources around a display screen border that may be sequentially activated, a printed pattern around the display screen, a set of light sources placed on the display screen surface, or any other distinctive pattern not attached to the display screen but within the user's view of the display screen vicinity. The test pattern is preferably invisible or not obtrusive to the user. The test pattern is preferably coplanar with the surface the user is viewing, but is not constrained as such, i.e. there may be separate reference and target coordinate systems sharing a known mapping relationship. The cameras are preferably positioned in the plane of the test pattern, and may for example be built into a computer display screen. Cameras may be attached to a head mounted device, such as a helmet or glasses. Alternately, the cameras may be positioned away from the reference plane and the plane of the user-viewed surface.
Once the invention defines the mapping between the reference coordinate system and the image coordinate system, the invention applies the mapping to subsequent images reflected from the user's cornea. The glint from the on-axis light source, the focal center of the camera, and the pupil center define a plane in real space that intersects with a user-viewed planar surface along a line. This line contains the point of regard T, which lies between the glint and the pupil center as mapped onto the screen coordinate system. The line also contains point V, where a virtual light source would produce a glint at the pupil center of the reflected corneal image as seen by the camera. The gaze vector is the bisector of the angle between the focal center of the camera, the pupil center in real space, and point V.
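The bisector construction stated above can be illustrated with a short sketch, assuming that the three points are already known in a common real-space coordinate system and that the angle is taken at the pupil center. The function names are hypothetical, and obtaining the three points is the subject of the remainder of the description, not of this sketch.

```python
import math


def _unit(vector):
    """Return the vector scaled to unit length."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in vector)


def gaze_bisector(focal_center, pupil_center, point_v):
    """Unit vector bisecting the angle at the pupil center.

    The angle is formed by the rays from the pupil center toward the camera
    focal center and toward point V. The sum of the two unit vectors along
    those rays points along the bisector.
    """
    to_camera = _unit(tuple(f - p for f, p in zip(focal_center, pupil_center)))
    to_v = _unit(tuple(v - p for v, p in zip(point_v, pupil_center)))
    # Raises ValueError if the two rays are exactly opposite (no unique bisector).
    return _unit(tuple(a + b for a, b in zip(to_camera, to_v)))
```

Normalizing both rays before adding them is what makes the result a true angular bisector; adding the raw difference vectors would instead bias the result toward the more distant point.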
The invention uses the mapping relationship already determined via the test pattern to compute where a virtual light source would have to be on the user-viewed surface to create a reference point in the pupil center in the camera image coordinate system. If uncalibrated cameras are used, two cameras are required to uniquely determine the point of regard T. If one calibrated camera is used, the distance from the camera's focal center to the user's pupil needs to be known or estimated; the focal length of the camera and an estimate of the distance between the user's eyes can be used to estimate eye-to-camera distance.
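One plausible way to carry out the single-camera distance estimate mentioned above is the pinhole camera relation, in which the distance equals the focal length times the real separation of the eyes divided by their separation in the image. The sketch below assumes a focal length expressed in pixels, a face roughly parallel to the image plane, and a default interocular distance of 63 mm; all three are assumptions of this illustration rather than values given in the description.

```python
def estimate_eye_to_camera_distance(focal_length_px, pupil_separation_px,
                                    interocular_distance_mm=63.0):
    """Estimate eye-to-camera distance in mm from the pinhole relation.

    distance = focal_length * real_separation / image_separation
    The default interocular distance is an assumed typical adult value.
    """
    if focal_length_px <= 0 or pupil_separation_px <= 0:
        raise ValueError("focal length and pupil separation must be positive")
    return focal_length_px * interocular_distance_mm / pupil_separation_px
```

Because the estimate scales directly with the assumed interocular distance, an error of a few millimeters in that assumption produces a proportional error of several percent in the estimated distance.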
The invention may also interpolate the location of points T or V from a test pattern around the perimeter of the display screen, including the mapping described above. At least one of the cameras may be head-mounted. A laser pointer can generate additional reference points, and can be actively aimed to establish a reference point at point V for example. Correction for foveal axis offsets may be added.
The foregoing objects are believed to be satisfied by the embodiments of the present invention as described below.