The present invention relates generally to the field of eye tracking and methods for processing eye tracking data. In particular, the invention relates to a system and method for determining mental states or mental activities of a person from spatio-temporal eye-tracking data, independent of a priori knowledge of the objects in the person's visual field.
In recent years, eye-tracking devices have made it possible for machines to automatically observe and record detailed eye movements. One common type of eye tracker, for example, uses an infrared light-source, a camera, and a data processor to measure eye gaze positions, i.e., positions in the visual field at which the eye gaze is directed. The tracker generates a continuous stream of spatiotemporal data representative of eye gaze positions at sequential moments in time. Analysis of this raw data typically reveals a series of eye fixations separated by sudden jumps between fixations, called saccades.
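As an illustration of how fixations and saccades might be recovered from such a raw data stream, the following is a minimal sketch using a simple speed threshold. It is not a method taken from this specification or from the cited art. The function name `find_fixations`, the `(t, x, y)` sample layout, and the threshold values `max_speed` and `min_duration` are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    x: float         # mean gaze x over the fixation
    y: float         # mean gaze y over the fixation
    start: float     # time of the first sample (seconds)
    duration: float  # time between first and last sample (seconds)


def find_fixations(samples, max_speed=100.0, min_duration=0.1):
    """Group (t, x, y) gaze samples into fixations using a speed threshold.

    Assumptions for this sketch: timestamps are strictly increasing, and
    max_speed (position units per second) and min_duration (seconds) are
    illustrative values, not values from the specification.
    """
    fixations, group = [], []

    def flush():
        # Emit the current group as a fixation if it lasted long enough.
        if len(group) >= 2:
            duration = group[-1][0] - group[0][0]
            if duration >= min_duration:
                n = len(group)
                fixations.append(Fixation(
                    x=sum(s[1] for s in group) / n,
                    y=sum(s[2] for s in group) / n,
                    start=group[0][0],
                    duration=duration,
                ))
        group.clear()

    for sample in samples:
        if group:
            t0, x0, y0 = group[-1]
            t1, x1, y1 = sample
            speed = hypot(x1 - x0, y1 - y0) / (t1 - t0)
            if speed > max_speed:
                flush()  # a sudden jump (saccade) ends the current fixation
        group.append(sample)
    flush()
    return fixations
```

Under these assumptions, a stream that dwells at one position, jumps, and dwells at another yields two `Fixation` records, and the jump between them is the saccade.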
An informative survey of the current state of the art in the eye-tracking field is given in Jacob, R. J. K., “Eye tracking in advanced interface design”, in W. Barfield and T. Furness (eds.), Advanced interface design and virtual environments, Oxford University Press, Oxford, 1995. In this article, Jacob describes techniques for recognizing fixations and saccades from the raw eye tracker data. Fixation and saccade data alone, however, is still relatively low-level data that is of limited use, and Jacob fails to teach any specific methods for recognizing a user's conscious intentions or mental states. These eye tracking methods, therefore, still fall short of the goal of providing useful information about any higher-level eye behavior or mental states.
One attempt to derive higher-level cognitive information from eye movement data is described by India Starker and Richard A. Bolt in “A gaze-responsive self-disclosing display”, CHI '90 Proceedings, April 1990. Their technique correlates eye fixation data with a priori knowledge of objects in the user's field of view (i.e., on the computer screen) to make inferences about the degree of interest the user has in each object. One major disadvantage of this technique is that it requires a priori knowledge of the objects in the user's visual field, such as their positions, shapes and type information. Consequently, the technique cannot be used in many computer software applications where information about what is displayed on a computer screen is not readily available. In addition, it cannot be used in other situations where a priori knowledge is not available at all, such as when the user is viewing not virtual objects on a computer screen but physical objects in the real world.
In addition, because the technique disclosed by Starker and Bolt identifies the attention of the user with single fixation points, it fails to accurately distinguish attentively looking at an object from “spacing out” while inattentively gazing at the object. Thus, although the technique attempts to recognize the mental state of attentive interest, it actually fails to properly distinguish this state from non-attentiveness. It will also be noted that Starker and Bolt propose a technique that is limited to identifying just one cognitive state.
Another technique for using eye-movement data is disclosed by Hironobu Takagi in “Development of Predictive CHI with Eye Movements,” Master's Thesis, University of Tokyo, Feb. 7, 1996. As stated in the Abstract, Takagi “developed algorithms to extract users' intention and knowledge states from eye-movements” (Takagi, p. 1). Takagi, however, does not disclose any general method for extracting a user's intention from eye movements. Because detailed a priori knowledge of the user task is thought to be required in order to infer user intentions, Takagi only teaches a method that is limited to a very specific task or domain of application. As Takagi states, “Any general methods of analysis derived from known theories cannot be developed. Therefore, we must develop analysis methods for each domain task” (Takagi, pp. 13-14). In other words, Takagi not only fails to teach a general method of extracting a user's intention from eye movement data, he also states that such a general method is impossible using known theories.
Takagi's techniques are also limited by the fact that they require a combination of eye movement data with information about the objects being viewed by the user. In order to extract information about a user's intentions, Takagi measures eye movement data and combines it with a priori knowledge about the contents of the user's field of vision, i.e., the contents of the computer display. Because predetermined regions of the screen are known to contain objects with specific meaning, the eye movement data can be correlated with these regions and interpreted. Two of Takagi's algorithms, for example, assume the screen is divided into rectangular regions termed “columns”, then correlate eye movements to these specific columns (Takagi, pp. 31-32). Thus, the technique “analyzed data concerning regions that divide stimuli. Eye movements were not transformed into fixation-saccade data. This is a weak point of the method. We cannot transform eye-movements data into fixation-saccade data because of some problems” (Takagi, p. 45). Thus, not only does Takagi require a priori knowledge of the content of specific regions in the user's visual field, but Takagi's method only measures the region within which the user is gazing, and does not measure detailed fixation-saccade data. Moreover, Takagi proposes “to analyze long term eye movements statistically” (Takagi, p. 31). These statistical methods are performed “with disregard for details of eye movements” (Takagi, p. 28). Such statistical methods, in other words, ignore the detailed spatiotemporal trajectories of eye movements and consider only statistical features of the movements within coarsely defined regions that must be known a priori by Takagi's system.
Takagi's technique is also limited in other important respects. For example, Takagi's techniques depend on a priori knowledge of the tasks and “only analyze periods when users carry out the main goal of the task” (Takagi, p. 45). Regarding the long-standing problem of correctly relating eye fixations with user attention, Takagi acknowledges that his technique does “not deal with this problem” (Takagi, p. 28). It is clear, therefore, that the prior art techniques for interpreting eye tracker data suffer from one or more of the following disadvantages: they fail to properly identify user attention or intention, they do not identify a variety of mental states, they are limited to very specific and predetermined user tasks, and they require a priori knowledge of objects in the user's field of vision.
In view of the above, it is an object of the present invention to overcome the disadvantages and limitations of existing methods for deriving useful information from eye tracker data. In particular, it is an object of the present invention to provide a method for accurately recognizing a variety of high-level mental states of a user from eye tracker data. It is another object of the invention to provide such a technique that does not require a priori information about objects in the user's visual field, and is not limited to situations where the user is looking at a computer screen. Yet another object of the invention is to provide a method for analyzing user mental states from detailed fixation-saccade data rather than from statistical data derived from eye movements. An additional object of the invention is to provide a technique for inferring mental states of a user without requiring a priori knowledge of the task the user is engaged in, or of the contents and locations of specific regions at which the user is looking.
These and other objects and advantages are provided by a computer-implemented method for inferring mental states of a person from eye movements of the person. The method includes identifying elementary features of eye tracker data, such as fixations, saccades, and smooth pursuit motion. Identifying a fixation typically includes identifying a fixation location and a fixation duration. Identifying a saccade typically involves identifying a beginning and end location of the eye-movement, as well as possibly determining the velocity and other characteristics of the movement. It will be noted that for many applications that do not consider the velocity of the saccade, identifying two successive fixations can be used to identify a saccade. Identifying smooth pursuit motion typically includes identifying the velocity and path the eye takes as it smoothly follows a moving object. The method also includes recognizing from the elementary features a plurality of eye-movement patterns, i.e., specific spatiotemporal patterns of fixations, saccades, and/or other elementary features derived from eye tracker data. Each eye-movement pattern is recognized by comparing the elementary features with a predetermined eye-movement pattern template. A given eye-movement pattern is recognized if the features satisfy a set of criteria associated with the template for that eye-movement pattern. The method further includes the step of recognizing from the eye-movement patterns a plurality of eye-behavior patterns corresponding to the mental states of the person.
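The layered recognition summarized above (elementary features, then eye-movement patterns matched against templates, then eye-behavior patterns) can be pictured with the following minimal sketch. It is illustrative only and is not the claimed implementation. The `Template` class, the `SHORT_RIGHT` template with its pixel thresholds, and the rule in `recognize_behavior` that a behavior is reported after `min_run` consecutive matches are hypothetical choices made for the example. Saccades are derived from successive fixations, as the text permits when saccade velocity is not considered.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Fixation:
    x: float
    y: float
    duration: float  # seconds


@dataclass
class Saccade:
    dx: float  # horizontal displacement between successive fixations
    dy: float  # vertical displacement between successive fixations


def saccades_from_fixations(fixations: Sequence[Fixation]) -> List[Saccade]:
    """Derive one saccade per pair of successive fixations (velocity ignored)."""
    return [Saccade(b.x - a.x, b.y - a.y)
            for a, b in zip(fixations, fixations[1:])]


@dataclass
class Template:
    name: str
    criteria: List[Callable[[Saccade], bool]]  # every criterion must hold

    def matches(self, s: Saccade) -> bool:
        return all(criterion(s) for criterion in self.criteria)


# Hypothetical template: the thresholds are invented for illustration.
SHORT_RIGHT = Template("short-right-saccade", [
    lambda s: 0 < s.dx <= 50,   # small rightward step
    lambda s: abs(s.dy) <= 10,  # roughly horizontal
])


def recognize_patterns(fixations, templates):
    """For each derived saccade, list the names of the templates it satisfies."""
    return [[t.name for t in templates if t.matches(s)]
            for s in saccades_from_fixations(fixations)]


def recognize_behavior(pattern_names, required, min_run=3):
    """Report a behavior once `required` matches min_run times in a row.

    This run-length rule is an assumption of the sketch, not a rule from
    the specification.
    """
    run = 0
    for names in pattern_names:
        run = run + 1 if required in names else 0
        if run >= min_run:
            return True
    return False
```

For example, four fixations stepping rightward by 30 units each produce three consecutive `short-right-saccade` matches, so `recognize_behavior` reports the hypothetical behavior. A sequence interrupted by a long jump does not.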