1. Field of Invention
The present invention relates generally to the field of computer user interface technology. More specifically, the present invention is related to a system and method for the recognition of reading, skimming, and scanning from eye-gaze patterns.
The following definitions may assist in the understanding of the terminology used throughout the specification:
        heterogeneous content: objects (like icons, windows, menus, etc.) encountered in electronic displays (e.g., monitors).
        reading: a method of systematically and methodically examining and grasping the meaning of textual content.
        skimming: a method of rapidly moving the eyes over textual content with the purpose of getting only the main ideas and a general overview of the content.
        scanning: a method of rapidly covering a great breadth of the display in order to locate specific heterogeneous content.
        tokenization: the process of classifying a range of phenomena (i.e., eye movements) into discrete categories.
        quantization: integration (usually averaging) of sequential groups of measurements, where the groups do not overlap. The measurements may be over time or space.
        database: any stored collection of information located on the local computer, a local area network (LAN), or a wide area network (WAN), including the world wide web (WWW) (note: any use of this term refers to the term as defined in this way).
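As an illustration of the quantization definition above, the following is a minimal sketch that averages consecutive, non-overlapping groups of measurements. The function name `quantize`, the `group_size` parameter, and the choice to average a shorter trailing group over whatever samples remain are assumptions made for this example and are not taken from the specification.

```python
from statistics import mean


def quantize(samples, group_size):
    """Average consecutive, non-overlapping groups of measurements.

    Assumption: a trailing group shorter than group_size is averaged
    over the samples that remain rather than being discarded.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Step through the data in strides of group_size so that no
    # measurement contributes to more than one group.
    return [
        mean(samples[i:i + group_size])
        for i in range(0, len(samples), group_size)
    ]
```

For example, six horizontal gaze samples quantized in groups of three yield two averaged values, one per group.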
2. Discussion of Prior Art
Computers are a widely used resource in today's society. In most systems, a user manipulates a keyboard or a mouse to communicate with a computer. Modern systems include a graphical user interface (GUI), which communicates with the user by displaying various heterogeneous content. In the context of this patent application, heterogeneous content includes objects normally encountered on computer monitors. For example, as illustrated in FIG. 1, heterogeneous content 100 includes (but is not restricted to) any of, or a combination of: text 102, images 104, hyperlinks 106, windows 108, icons 110, or menus 112. When users view a computer monitor with heterogeneous content displayed on its screen, they utilize an input device, such as a mouse or a keyboard, to manipulate one (or a combination of) heterogeneous content items based on their interests. FIG. 2 illustrates a prior art system which comprises monitor 200, computer CPU unit 202, mouse 204, and keyboard 206. Users view various heterogeneous content items (like A, B, and C) on the computer monitor 200 and, based on their interest, interact with one or more of the heterogeneous content items via mouse 204 or keyboard 206. This step is very “user driven”: the system has no means of dynamically tracking user interest in the displayed heterogeneous content (whether the user is interested in A, B, or C), and hence the computer waits for the user to respond via an input device before proceeding with any action.
Thus, there is a need for a system that can dynamically and accurately determine what heterogeneous content a user is interested in and the relative level of interest. One way of determining this relative interest level is by detecting the area of the display in which the user holds eye movement to a minimum (e.g., maintains a gaze). Another, related way involves determining user interests by detecting (from eye-movement patterns or eye-gaze patterns) which part of the heterogeneous display screen was read by the user.
Detecting from eye-gaze patterns when a user is reading, rather than merely scanning or skimming, is a difficult problem, as low-level eye movements are almost completely automatic (i.e., involuntary). Thus, low-level eye movements do not follow the assumed pattern of right->right->right during reading but instead follow much more complex patterns.
FIG. 3 illustrates some of the common eye movements observed during reading. Common eye movement behaviors observed in reading 300 include forward saccades (or jumps) 302 of various lengths (eye-movements to the right), micro-saccades (small movements in various directions) 304, fixations of various durations 306, regressions (eye-movements to the left) 308, jitters (shaky movements) 310, and nystagmus (a rapid, involuntary, oscillatory motion of the eyeball) 312. As illustrated by FIG. 4, these behaviors in turn depend on several factors 400, some of which include (but are not restricted to): text difficulty 402, word length 404, word frequency 406, font size 408, font color 410, distortion 412, user distance to display 414, and individual differences 416. Individual differences that affect eye-movements further include, but are not limited to, reading speed 418, intelligence 420, age 422, and language skills 424. For example, as the text becomes more difficult to comprehend, fixation duration increases (as described by Just & Carpenter in their paper entitled, A theory of reading: From eye fixations to comprehension, Psychological Review, 1980) and the number of regressions increases (as described by Rayner & Frazier in their paper entitled, Parsing temporarily ambiguous complements, Quarterly Journal of Experimental Psychology, 1987). Given the complexity of eye-gaze patterns and the detailed information about the text and the individual required to predict these patterns, there have been no attempts until now to build a system to recognize reading.
Recent work in intelligent user interfaces has focused on making computers behave like an assistant or butler, on the premise that the computer should be attentive to what the user is doing and should keep track of user interests and needs. Because the Microsoft Windows® operating system and other windows-based operating systems are ubiquitous and visually intensive, researchers have identified eye-gaze as a valuable way to determine user interest when interacting with most computer terminals. An effort to capitalize on eye-gaze as a measure of user interest was made in U.S. Pat. No. 5,886,683, which describes a method and apparatus for providing relevant information based on eye-gaze. In this case, interest in some display object (icon, image, or block of text) was determined based on a fixation threshold. Simply put, if the user looks at an object on the screen long enough, the system infers that the user is interested in that object. This same rule also applies to blocks of text. However, there remains a need to determine different levels of user interest based on the type of user behavior, such as reading (high interest), skimming (medium interest), or scanning (low interest), as well as to capture the exact words on the screen that are involved.
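The fixation-threshold rule described above can be illustrated with a minimal sketch. This is not the implementation of the cited patent: the function name `objects_of_interest`, the 500 millisecond default threshold, and the choice to sum gaze time per object are all hypothetical, chosen only to show that such a rule yields a single yes/no judgment per object rather than graded levels of interest.

```python
from collections import defaultdict


def objects_of_interest(fixations, threshold_ms=500):
    """Return the display objects whose total gaze time meets a threshold.

    fixations: iterable of (object_id, duration_ms) pairs.
    threshold_ms: hypothetical cutoff; the source gives no value.
    """
    dwell = defaultdict(int)
    for object_id, duration_ms in fixations:
        # Assumption: repeated fixations on one object are summed.
        dwell[object_id] += duration_ms
    return {obj for obj, total in dwell.items() if total >= threshold_ms}
```

Note that a block of text that was carefully read and one that was merely stared at would be treated identically by such a rule, which is the limitation the paragraph above points out.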
Other researchers have been concerned more specifically with making sense out of complex, low-level eye movement data. As noted, the eye is constantly moving. Even when one seems to be looking steadily at some object, the eye still makes micro-saccades (small movements), jitters (shaky movements), and nystagmus (compensatory movements to head motion). To provide eye movement data that is closer to what users experience, researchers have attempted to break down or filter complex raw eye movement data into a set of tokens. Work on fixation recognition that has formed the core of this research area was originally proposed by Jacob in his paper entitled, Eye movement-based human-computer interaction techniques: Toward non-command interfaces, Advances in Human-Computer Interaction, 1990; and later in his paper entitled, What you look at is what you get: Eye movement-based interaction techniques, Proceedings ACM CHI'90 Human Factors in Computer Systems, 1990. The term “fixation” refers to an area of relatively stable gaze that lasts between 30 and 800 milliseconds. Although people are not aware of micro-saccades, they do report areas of fixation. Thus, fixation recognition is an attempt to determine where a user intended to look. Jacob's fixation recognition algorithm works by taking a 100 millisecond set of data (6 data points for this implementation); if the points are all within 0.5 degrees of visual angle, a fixation is said to be detected and is located at the average point. The fixation continues as long as the gaze points stay within 1.0 degree of this average fixation point.
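The fixation recognition procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Jacob's published code: gaze samples are assumed to be evenly spaced (x, y) pairs already expressed in degrees of visual angle, "within 0.5 degrees" is interpreted as every pair of points in the window lying no more than 0.5 degrees apart, and the names `detect_fixations`, `WINDOW`, `START_DEG`, and `CONTINUE_DEG` are hypothetical.

```python
from itertools import combinations
from math import dist

WINDOW = 6          # samples in roughly 100 ms, per the text
START_DEG = 0.5     # spread allowed when starting a fixation
CONTINUE_DEG = 1.0  # distance from the fixation point allowed to continue


def centroid(points):
    """Return the average (x, y) of a non-empty list of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def detect_fixations(gaze):
    """Return a list of (center, start_index, end_index_exclusive)."""
    fixations = []
    i = 0
    while i + WINDOW <= len(gaze):
        window = gaze[i:i + WINDOW]
        # Assumption: the start test compares every pair of points.
        if all(dist(p, q) <= START_DEG for p, q in combinations(window, 2)):
            center = centroid(window)
            end = i + WINDOW
            # The fixation continues while samples stay within 1.0 degree
            # of the average point computed from the initial window.
            while end < len(gaze) and dist(gaze[end], center) <= CONTINUE_DEG:
                end += 1
            fixations.append((center, i, end))
            i = end
        else:
            # No fixation starts here; slide the window by one sample.
            i += 1
    return fixations
```

Each returned tuple collapses a run of raw samples into a single token (the fixation), which is the kind of filtering of raw eye movement data that the paragraph above refers to.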
Obviously, the goal of Jacob's method is far different from the present invention's goal of recognizing reading. Let us assume that his method for fixation recognition is used by a simple algorithm for reading detection. For instance, suppose that a series of, say, three fixations to the right (fixation->fixation->fixation) signals that reading is detected. Several problems occur when using this method for reading detection: (a) loss of information, (b) regressions, (c) eye movement on the Y axis, (d) resets to the beginning of the next line, and (e) revisits to previous sentences.
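The simple detector hypothesized above can be sketched to make problems (b) and (d) concrete. This is the straw-man algorithm from the preceding paragraph, not the method of the present invention. The function name `naive_reading_detected` and the `run_length` parameter are hypothetical, the input is assumed to be the horizontal positions of successive fixation centers, and "three fixations to the right" is interpreted here as three consecutive rightward steps between fixations.

```python
def naive_reading_detected(fixation_xs, run_length=3):
    """Flag reading once run_length consecutive rightward steps occur.

    fixation_xs: horizontal positions of successive fixation centers.
    Vertical position is ignored entirely, which is problem (c).
    """
    run = 0
    for prev, cur in zip(fixation_xs, fixation_xs[1:]):
        # Only a rightward step extends the run; a regression or a
        # reset to the start of the next line sets it back to zero.
        run = run + 1 if cur > prev else 0
        if run >= run_length:
            return True
    return False
```

For an idealized sequence such as `[1, 2, 3, 4]` the detector fires, but a sequence with one regression followed by a line reset, such as `[1, 2, 3, 2.5, 3.5, 4.5, 1, 2]`, never accumulates three rightward steps in a row, so genuine reading goes undetected.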
Whatever the precise merits, features, and advantages of the above-cited references, none of them achieves or fulfills the purposes of the present invention.