In recent years, eye-tracking devices have made it possible for machines to automatically observe and record detailed eye movements. One common type of eye tracker, for example, uses an infrared light source, a camera, and a data processor to measure eye gaze positions, i.e., positions in the visual field at which the eye gaze is directed. The tracker generates a continuous stream of spatiotemporal data representative of eye gaze positions at sequential moments in time. Analysis of this raw data typically reveals a series of eye fixations separated by sudden jumps, called saccades.
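By way of illustration only, the separation of a raw gaze stream into fixations and saccades can be sketched with a simple dispersion-threshold approach. This is a commonly used generic scheme and is not taken from any of the references discussed in this section. The sample format (time in milliseconds, x and y in pixels), the threshold values, and the names `Fixation` and `detect_fixations` are assumptions made for the sketch.

```python
from typing import List, NamedTuple, Sequence, Tuple

# Assumed sample format: (time in milliseconds, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


class Fixation(NamedTuple):
    start_ms: float
    end_ms: float
    x: float  # centroid of the samples in the fixation
    y: float


def _dispersion(window: Sequence[Sample]) -> float:
    """Spatial spread of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: Sequence[Sample],
    max_dispersion: float = 25.0,   # assumed threshold, in pixels
    min_duration_ms: float = 100.0,  # assumed minimum fixation duration
) -> List[Fixation]:
    """Group consecutive gaze samples into fixations.

    Samples that fall between the returned fixations are treated as
    belonging to saccades.
    """
    fixations: List[Fixation] = []
    n = len(samples)
    i = 0
    while i < n:
        # Find the first index j whose timestamp is at least
        # min_duration_ms after the timestamp of sample i.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough remaining data for another fixation

        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            # Sample i is likely part of a saccade; slide the window forward.
            i += 1
    return fixations
```

The output of such a step is exactly the kind of low-level fixation and saccade data discussed here: it records where and for how long the gaze rested, but says nothing by itself about what the user intended.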
An informative survey of the current state of the art in the eye-tracking field is given in Jacob, R. J. K., "Eye tracking in advanced interface design", in W. Barfield and T. Furness (eds.), Advanced interface design and virtual environments, Oxford University Press, Oxford, 1995. In this article, Jacob describes techniques for recognizing fixations and saccades from the raw eye tracker data. Fixation and saccade data alone, however, is still relatively low-level data of limited use, and Jacob fails to teach any specific methods for recognizing a user's conscious intentions or mental states. These eye-tracking methods, therefore, still fall short of the goal of providing useful information about higher-level eye behavior or mental states.
One attempt to derive higher-level cognitive information from eye movement data is described by India Starker and Richard A. Bolt in "A gaze-responsive self-disclosing display", CHI '90 Proceedings, April 1990. Their technique correlates eye fixation data with a priori knowledge of objects in the user's field of view (i.e., on the computer screen) to make inferences about the degree of interest the user has in each object. One major disadvantage of this technique is that it requires a priori knowledge of the objects in the user's visual field, such as their positions, shapes and type information. Consequently, the technique cannot be used in many computer software applications where information about what is displayed on a computer screen is not readily available. In addition, it cannot be used in other situations where a priori knowledge is not available at all, such as when the user is not viewing virtual objects on a computer screen, but physical objects in the real world.
In addition, because the technique disclosed by Starker and Bolt identifies the attention of the user with single fixation points, it fails to accurately distinguish attentively looking at an object from "spacing out" while inattentively gazing at the object. Thus, although the technique attempts to recognize the mental state of attentive interest, it actually fails to properly distinguish this state from non-attentiveness. It will also be noted that Starker and Bolt propose a technique that is limited to identifying just one cognitive state.
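The general idea of correlating fixations with known screen objects can be sketched as follows, for illustration only. The sketch is not the algorithm of Starker and Bolt, whose details are not reproduced here; it simply accumulates fixation time inside hypothetical rectangular object bounds. The names `ScreenObject` and `dwell_time_per_object`, the rectangular shape of the objects, and the millisecond and pixel units are all assumptions.

```python
from typing import Dict, NamedTuple, Sequence


class Fixation(NamedTuple):
    start_ms: float
    end_ms: float
    x: float
    y: float


class ScreenObject(NamedTuple):
    # Hypothetical a priori description of one displayed object.
    name: str
    left: float
    top: float
    right: float
    bottom: float


def dwell_time_per_object(
    fixations: Sequence[Fixation],
    objects: Sequence[ScreenObject],
) -> Dict[str, float]:
    """Total fixation time (ms) that falls inside each object's bounds."""
    dwell = {obj.name: 0.0 for obj in objects}
    for f in fixations:
        for obj in objects:
            inside = obj.left <= f.x <= obj.right and obj.top <= f.y <= obj.bottom
            if inside:
                dwell[obj.name] += f.end_ms - f.start_ms
                break  # credit each fixation to at most one object
    return dwell
```

Both limitations noted above are visible in such a sketch. The function cannot be called without the `objects` list, which is precisely the a priori knowledge of the visual field that is often unavailable, and a long dwell time is credited to an object whether the user is studying it attentively or merely staring at it blankly.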
Another technique for using eye-movement data is disclosed by Hironobu Takagi in "Development of Predictive CHI with Eye Movements," Master's Thesis, University of Tokyo, Feb. 7, 1996. As stated in the Abstract, Takagi "developed algorithms to extract users' intention and knowledge states from eye-movements" (Takagi, p. 1). Takagi, however, does not disclose any general method for extracting a user's intention from eye movements. Because detailed a priori knowledge of the user task is thought to be required in order to infer user intentions, Takagi only teaches a method that is limited to a very specific task or domain of application. As Takagi states, "Any general methods of analysis derived from known theories cannot be developed. Therefore, we must develop analysis methods for each domain task" (Takagi, pp. 13-14). In other words, Takagi not only fails to teach a general method of extracting a user's intention from eye movement data, he also states that such a general method is impossible using known theories.
Takagi's techniques are also limited by the fact that they require a combination of eye movement data with information about the objects being viewed by the user. In order to extract information about a user's intentions, Takagi measures eye movement data and combines it with a priori knowledge about the contents of the user's field of vision, i.e., the contents of the computer display. Because predetermined regions of the screen are known to contain objects with specific meaning, the eye movement data can be correlated with these regions and interpreted. Two of Takagi's algorithms, for example, assume the screen is divided into rectangular regions termed "columns", then correlate eye movements with these specific columns (Takagi, pp. 31-32). Thus, the technique "analyzed data concerning regions that divide stimuli. Eye movements were not transformed into fixation-saccade data. This is a weak point of the method. We cannot transform eye-movements data into fixation-saccade data because of some problems" (Takagi, p. 45). Consequently, not only does Takagi require a priori knowledge of the content of specific regions in the user's visual field, but Takagi's method measures only the region within which the user is gazing, and does not measure detailed fixation-saccade data. Moreover, Takagi proposes "to analyze long term eye movements statistically" (Takagi, p. 31). These statistical methods are performed "with disregard for details of eye movements" (Takagi, p. 28). Such statistical methods, in other words, ignore the detailed spatiotemporal trajectories of eye movements and consider only statistical features of the movements within coarsely defined regions that must be known a priori by Takagi's system.
Takagi's techniques are also limited in other important respects. For example, they depend on a priori knowledge of the tasks and "only analyze periods when users carry out the main goal of the task" (Takagi, p. 45). Regarding the long-standing problem of correctly relating eye fixations to user attention, Takagi acknowledges that his technique does "not deal with this problem" (Takagi, p. 28). It is clear, therefore, that the prior art techniques for interpreting eye tracker data suffer from one or more of the following disadvantages: they fail to properly identify user attention or intention, they do not identify a variety of mental states, they are limited to very specific and predetermined user tasks, and they require a priori knowledge of objects in the user's field of vision.