The present invention relates generally to eye gaze tracking systems and, more particularly, to eye gaze tracking including adaptive fixation estimation.
In eye gaze tracking, a camera placed near a visual display records graphical data including the position of the user's pupils and the locations of reflected glints from infrared LEDs. By processing the graphical data and calibrating the position of the pupil and glints with the visual display, a streaming estimate of the spot on the display viewed by the user may be generated and recorded as a stream of gaze points (x, y, t).
Gaze tracking provides actual and potentially real-time knowledge of a viewer's attention, including places on the screen that draw attention first, the length of time the user spends looking at any given place, and the sequence of places receiving attention. For some applications, gaze tracking can produce results that seem akin to mind reading by accessing a user's unconscious processes. The technology has found use in a wide variety of applications. For example, a gaze tracking stream in real time has been used to augment a graphical user interface. Objects represented on a computer screen may be selected or even accessed simply by looking at the object. A prototype human-computer dialog system uses eye gaze tracking to provide visual context to the user's speech. For example, a virtual tourist application may use gaze tracking to provide the antecedent when a user makes an otherwise vague statement like “I would like to stay in this hotel.” In an eye gaze analytics application, eye gaze tracking can be recorded while subjects are navigating the web or testing a novel interface. Analysis of the recorded gaze tracking data may provide otherwise inaccessible feedback for web or interface design.
A typical eye gaze tracking system may consist of hardware including a visual display, camera and processor with image processing software to estimate and record the points of focus of a user's gaze. Higher level applications may be implemented to use or interpret gaze.
When a stream of eye gaze tracking data is analyzed, the first step is typically the identification of groups of data points that together represent eye fixations. A fixation is defined as a brief period of time (e.g., ¼ sec) during which the point of focus of the eye is relatively stationary. A fixation represents a span of time during which the user's attention is fixated on a discrete portion of the visual stimulus. When the fixation comes to an end, the eyes execute a sudden motion called a saccade, moving the user's attention to another fixation point where the user examines another discrete portion of the stimulus.
Fixation is a natural part of our vision. The physiology of the human eye dictates the specifics of these fixations as much as our conscious attempts at attention. As such, fixations are best defined by observation and can only be weakly controlled by engineering and design. After fixations are determined and detected, they can be used for higher level applications, such as telling when the user is gazing at a particular button or reading across a line of text. Good performance at the application level depends critically on good fixation detection.
Fixation detection depends on defining thresholds to select the gaze points representing a fixation within a compact spatial-temporal region. For example, in dispersion-based fixation detection, a fixation may be defined as a set of consecutive gaze points that span a time interval longer than some minimum threshold (e.g., 100 milliseconds) and have a spatial deviation that is less than a selected spatial threshold.
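By way of illustration only, the dispersion-based definition above may be sketched as follows. This is a minimal sketch of one common formulation, not a definitive implementation. The function names, the use of milliseconds for the time stamps, and the choice of dispersion measure (horizontal extent plus vertical extent of the window) are assumptions made for the example.

```python
from typing import List, Tuple

# A gaze point is (x, y, t); t is assumed to be in milliseconds.
GazePoint = Tuple[float, float, int]
# A fixation is reported as (centroid_x, centroid_y, t_start, t_end).
Fixation = Tuple[float, float, int, int]


def dispersion(window: List[GazePoint]) -> float:
    """Spatial spread of a window: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(points: List[GazePoint],
                     max_dispersion: float,
                     min_duration_ms: int = 100) -> List[Fixation]:
    """Group consecutive gaze points into fixations using fixed thresholds."""
    fixations: List[Fixation] = []
    n = len(points)
    start = 0
    while start < n:
        # Grow the window until it spans longer than the minimum duration.
        end = start
        while end < n and points[end][2] - points[start][2] <= min_duration_ms:
            end += 1
        if end >= n:
            break  # not enough remaining data to form a fixation
        if dispersion(points[start:end + 1]) < max_dispersion:
            # Extend the window while the points stay spatially compact.
            while end + 1 < n and dispersion(points[start:end + 2]) < max_dispersion:
                end += 1
            window = points[start:end + 1]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy, window[0][2], window[-1][2]))
            start = end + 1
        else:
            # Window is too spread out; drop its first point and retry.
            start += 1
    return fixations
```

In this sketch both `max_dispersion` and `min_duration_ms` are supplied by the caller, so the result depends directly on how those fixed thresholds are chosen.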
Complicating the detection of eye fixations is the continual presence of small eye motions called microsaccades as well as system noise. Microsaccades represent small deviations in the point of focus around an otherwise specific gaze location. The specific implementations of eye-gaze tracking systems also introduce some level of noise into the data, as the motion of the eyes, video acquisition, and the calibration limits make precise gaze tracking impossible. Additive random noise is especially amplified in remote eye trackers where the camera is placed near the monitor and thus usually a few feet from the user. These factors may vary between system implementations, users and the content being viewed, making it difficult to select a universally applicable set of spatial thresholds.
Current fixation detection methods may use fixed thresholds that may be manually set by an eye tracking operator or analyst. For example, using a dispersion threshold of ½ to 1 degree of visual angle of the user's eye has been recommended. Known commercial systems recommend a threshold of 50 pixels if the user is looking at pictures, 20 pixels if the user is reading, and 30 pixels for “mixed” content. Using fixed thresholds limits the generality of fixation detection and requires manual fine tuning. Thresholds that are too small may result in missed fixations, while thresholds that are too large may result in over-grouping fixations, which erroneously combines consecutive fixations together. Fixed thresholds may prevent implementation of a universal “plug and play” eye gaze tracker system, requiring a skilled operator to appropriately adjust fixation thresholds.
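To illustrate why a fixed pixel threshold does not carry over between setups, the following sketch converts a threshold expressed in degrees of visual angle into pixels using basic viewing geometry. The viewing distance and pixel pitch used in the example are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math


def visual_angle_to_pixels(angle_deg: float,
                           viewing_distance_mm: float,
                           pixel_pitch_mm: float) -> float:
    """Convert a visual angle to an on-screen extent in pixels.

    Assumes the gaze is roughly perpendicular to a flat display, so the
    extent subtended by the angle is 2 * d * tan(angle / 2).
    """
    extent_mm = 2.0 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return extent_mm / pixel_pitch_mm


# Hypothetical setups: the same 1 degree threshold maps to different
# pixel counts when the user sits closer or the display is denser.
near = visual_angle_to_pixels(1.0, viewing_distance_mm=500.0, pixel_pitch_mm=0.25)
far = visual_angle_to_pixels(1.0, viewing_distance_mm=800.0, pixel_pitch_mm=0.25)
dense = visual_angle_to_pixels(1.0, viewing_distance_mm=500.0, pixel_pitch_mm=0.10)
```

Under these assumed values the same angular threshold corresponds to noticeably different pixel thresholds, which is one reason a single pixel value may require manual adjustment for each installation.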
Gaze tracking data may be processed using a Gaussian filter, smoothing the data from the x and y channels. However, such noise reduction tends to smear fixations together, potentially blurring the boundary between fixations and saccades. The transitions from fixation to saccade may contain high frequency information that may be lost by this type of filtering process.
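The smearing effect may be illustrated with a short sketch that applies a Gaussian filter to a single coordinate channel containing an abrupt jump between two fixation positions. The kernel width `sigma`, the truncation of the kernel at three standard deviations, the clamping at the ends of the signal, and all function names are assumptions made for the example.

```python
import math
from typing import List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights, truncated at about 3 sigma on each side."""
    radius = int(math.ceil(3.0 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], sigma: float) -> List[float]:
    """Convolve one coordinate channel with a Gaussian kernel."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the ends
            acc += w * signal[j]
        out.append(acc)
    return out


def transition_samples(signal: List[float], low: float, high: float,
                       margin: float = 0.05) -> int:
    """Count samples lying clearly between the two fixation levels."""
    span = high - low
    return sum(1 for v in signal
               if low + margin * span < v < high - margin * span)


# Synthetic x channel: a fixation at x = 100 followed by one at x = 400.
raw_x = [100.0] * 20 + [400.0] * 20
smoothed_x = smooth(raw_x, sigma=3.0)
```

In the raw channel the change between fixations occurs within a single sample, whereas in the smoothed channel it is spread over several samples, so the boundary between fixation and saccade becomes harder to locate.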
What is needed, therefore, is a gaze tracking system that can determine fixations in the gaze tracking data without assigning fixed thresholds.