1. Field of the Invention
The invention relates generally to user interfaces for interacting with objects in a display and more particularly to user interfaces in which eye movement is used to control a pointing device and thereby to interact with the objects in the display.
2. Description of Related Technology
The following Description of related technology first describes techniques that are currently used to track eye movements and then provides examples of how eye movement has been used to interact with objects in a display.
Eye Tracking Apparatus: FIG. 1
Studies that have involved eye and gaze tracking have been carried out since the second half of the 19th century. The techniques used to track eye movements were revolutionized by the development of digital computers. Personal computers have now become fast enough to do digital video analysis of eye movements in real time. The most commonly used approach in video-based eye tracking is to calculate the angle of the visual axis (and the location of the fixation point on the display surface) by tracking the relative position of the pupil and a speck of light reflected from the cornea, technically known as the “glint”. FIG. 1 shows how this is done. The gaze direction is calculated by comparing the relative position and relationship between the pupil 103 and corneal reflection—the glint 107. Infra-red illumination of the eye produces the ‘bright pupil’ effect 105 and makes the tracking easier.
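By way of illustration only, the following Python sketch shows one way the relative position of the pupil and the glint might be turned into a point on the display. The two-point, per-axis linear calibration and all function names are assumptions made for this example; they are not taken from FIG. 1 or from any particular eye tracker.

```python
def pupil_glint_vector(pupil, glint):
    """Difference between pupil center and glint, in camera-image pixels."""
    return (pupil[0] - glint[0], pupil[1] - glint[1])


def fit_axis(v_a, s_a, v_b, s_b):
    """Return (gain, offset) so that screen = gain * v + offset fits both pairs."""
    gain = (s_b - s_a) / (v_b - v_a)
    return gain, s_a - gain * v_a


def make_gaze_mapper(calib_a, calib_b):
    """Build a mapper from two hypothetical calibration pairs.

    Each pair is (pupil_glint_vector, known_screen_point), recorded while
    the user looks at a known point on the display.
    """
    (va, sa), (vb, sb) = calib_a, calib_b
    gain_x, off_x = fit_axis(va[0], sa[0], vb[0], sb[0])
    gain_y, off_y = fit_axis(va[1], sa[1], vb[1], sb[1])

    def mapper(pupil, glint):
        vx, vy = pupil_glint_vector(pupil, glint)
        return (gain_x * vx + off_x, gain_y * vy + off_y)

    return mapper
```

A mapper built this way is valid only for the user and head position at which the calibration pairs were recorded.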
A typical, and portable, eye tracking system similar to ones that are commercially available is shown at 109. System 109 is a laptop computer 111 to which two infrared illuminators and a video camera have been added. Shown on screen 117 is the picture of the eye made by camera 117 with crosshairs marking the positions of the pupil and the glint as determined by the digital video analysis of the picture of the eye. Further information about equipment for tracking eye movement may be found in the Eye Movement Equipment Database, available on the World Wide Web at ibs.derby.ac.uk/emed/.
It is becoming possible to build eye trackers whose prices are comparable to the price of a new personal computer. Most commercially available eye tracking systems (including the high-end ones) have two characteristics that make them less than ideal for many applications. These are:
the system has to be calibrated for each individual user;
even remote eye trackers have very low tolerance for head movements and require the viewer to hold the head unnaturally still, or to use external support like head- or chin rests.
The solution lies in the development of software that would be able to perform eye tracking data analysis in more natural viewing circumstances. A recent report by Quiang and Zhiwei, “Eye and Gaze Tracking for Interactive Graphic Display”, International Symposium on Smart Graphics, Jun. 11-13, 2002, Hawthorne, N.Y., (2002) seems to be a step in the right direction. Instead of using conventional approaches to gaze calibration, they introduce a procedure based on neural networks that incorporates natural head movements.
Pointers and Pointing Devices in Graphical User Interfaces
Pointers are essential components of modern graphical user interfaces (GUIs). A pointer is a graphic such as an arrowhead that indicates a current position on an interactive device's display. A pointing device is the device that a user of the interactive device uses to move the pointer and to interact with the objects in the display. A pointing device may be any device which translates a movement made by a user of the pointing device into a movement of the pointer and/or an indication of an operation to be performed on the object. The pointing device generally has two parts: tracking hardware which maps some movement of the user onto positions in the display and indicates the current position of a switch on the tracking hardware, and software which is particular to the application which is receiving the hardware inputs and interprets the current display and switch positions as required by the application. Pointing devices in current use include the mouse, the trackball, the stylus, a touch-sensitive area on the keyboard, the joystick, including a miniature joystick built into a keyboard, and a touch-sensitive surface over the display.
The mouse provides an example of how pointing devices generally work. Objects in the display include icons representing entities such as documents. To view a document, the user causes the pointing device to move the pointer until it is over the icon that represents the document. Then the user performs an action which indicates that the object represented by the icon is to be opened. In the case of the standard mouse, that action is a double click with the left-hand mouse button. In response to the double click, the interactive system causes a word processing program to be executed which opens the document.
As can be seen from the foregoing, the pointing device operates in two modes: a navigational mode, in which it moves the pointer to an object of interest, and an operational mode, in which it performs an action on the object of interest, in this case opening the document. With the mouse and with most other pointing devices, the user uses a button on the pointing device to switch between the navigational and operational modes.
Other common operations on objects are dragging the object, which is done by depressing the left-hand button of the mouse and moving the mouse, which causes the object to move as indicated by the mouse, and dropping the object, which is done by ceasing to depress the left-hand button when an object is being dragged. Other operations are of course possible. One example is the throwing operation described in U.S. Ser. No. 09/096,950, Milekic, User interface for removing an object from a display, filed Jun. 12, 1998. Throwing is an extension of the operation of dragging an object. As long as the speed of dragging remains within a certain limit, one can move an object anywhere on the screen and drop it at a desired location. However, if the speed of the motion increases above a threshold, the object flies off the display (most often, to be replaced by another object).
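The distinction between dropping and throwing described above may be sketched as a simple speed test. The following Python fragment is illustrative only: the function name, the sample format and the units (seconds and pixels) are assumptions made for the example and are not taken from the referenced application.

```python
import math


def classify_drag(samples, speed_limit):
    """Classify a drag as a 'throw' or a 'drop'.

    samples: list of (t_seconds, x, y) pointer positions recorded while the
    object is being dragged (hypothetical format).
    speed_limit: threshold speed in pixels per second.
    """
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > speed_limit:
            return "throw"  # the object would fly off the display
    return "drop"  # the speed stayed within the limit throughout
```

For example, a drag sampled every 0.1 seconds that never moves more than 10 pixels between samples stays at about 100 pixels per second and would be classified as a drop under a limit of 500 pixels per second.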
Pointing Devices that Employ Eye Movement to Move the Pointer
In the mid-1980's, researchers began experimenting with pointing devices that employed eye movement to control the pointer and interact with the display. The focus was mostly on users with special needs. Promoted by rapid technological advancements, this trend continued and in the past decade a substantial amount of effort and money has been devoted to the development of eye- and gaze-tracking mechanisms for human-computer interaction. Such pointing devices can be made using eye movement tracking devices such as the one shown at 109 in FIG. 1. When used with pointing devices, modern eye trackers map a current gaze direction that falls within a display to a cursor location in the display. Depending on the hardware, the tracker updates the current cursor location 30-200 times a second. The stream of current cursor locations is provided to software, which interprets the movements of the cursor. In the following, the generic term eye movement information is used to indicate the stream of cursor locations or any other information received from the eye tracker which can be used to determine eye movements.
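As an illustration of how software might interpret such a stream, the following Python sketch groups consecutive cursor locations into periods during which the eye is nearly still. The dispersion-based grouping, the sample format (timestamp in milliseconds, x, y) and both threshold values are assumptions chosen for the example; they are not prescribed by any of the systems discussed here.

```python
def _summarize(window, min_duration_ms):
    """Return (x, y, duration_ms) for a window of samples, or None if too short."""
    if not window:
        return None
    duration = window[-1][0] - window[0][0]
    if duration < min_duration_ms:
        return None
    cx = sum(s[1] for s in window) / len(window)
    cy = sum(s[2] for s in window) / len(window)
    return (cx, cy, duration)


def detect_fixations(samples, max_dispersion=30, min_duration_ms=100):
    """Group (t_ms, x, y) cursor locations into still periods.

    A window of samples is extended while its horizontal extent plus its
    vertical extent stays within max_dispersion pixels (assumed value).
    """
    fixations = []
    window = []
    for sample in samples:
        candidate = window + [sample]
        xs = [s[1] for s in candidate]
        ys = [s[2] for s in candidate]
        dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
        if dispersion <= max_dispersion:
            window = candidate
        else:
            # The eye has moved away: close the current window.
            fixation = _summarize(window, min_duration_ms)
            if fixation is not None:
                fixations.append(fixation)
            window = [sample]
    fixation = _summarize(window, min_duration_ms)
    if fixation is not None:
        fixations.append(fixation)
    return fixations
```

Each returned tuple gives the mean cursor location of a still period and its duration in milliseconds; windows shorter than the minimum duration are discarded.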
For details of the experiments with pointing devices that employ eye movement, see Vertegaal, R. “The GAZE groupware system: mediating joint attention in multiparty communication and collaboration”, in Proceedings of the ACM CHI'99 Human Factors in Computing Systems, ACM Press, New York, 1999, pp 294-301; Jacob, R. J. K. “Eye-movement-based human-computer interaction techniques: Toward non-command interfaces”, in H. R. Hartson & D. Hix, (eds.) Advances in Human-Computer Interaction, Vol. 4, pp 151-190, Ablex Publishing Corporation, Norwood, N.J., 1993; or Zhai, S., Morimoto, C., Ihde, S. “Manual and Gaze Input Cascaded (MAGIC) Pointing”, Proceedings of the CHI'99, ACM New York, 1999, pp. 246-253.
Problems with Using Eye Movements to Control a Pointing Device
The biggest problem with using eye movements to control a pointing device is switching between modes. One aspect of this problem is that the eye movements occur not only when the user wishes to control the pointing device, but also when the user is simply looking at the display. The pointing device must thus be able to distinguish between observational eye movements, which occur when the user is just looking at the display, and intentional eye movements, which occur when the user wants to perform an operation in the display. A pointing device that is controlled by eye movements must thus distinguish among three modes of operation: the observational mode, in which the user is simply observing the display, and the navigational and operational modes described above.
The mode problem is exacerbated by the fact that it is not immediately obvious how an eye movement can be interpreted as causing a shift from one to the other of the modes. Put another way, there are no buttons on a pointing device that is controlled by eye movements, and consequently, one can't indicate a mode switch by pushing a button. In the following, the problem of indicating a mode switch in a pointing device that is controlled by eye movements will be termed the “switch” problem. In the literature concerning pointing devices controlled by eye movements, the problem is known as the “Midas touch” or the “clutch” problem. The problem has been addressed numerous times in the literature and there are many proposed technical solutions. Only a few illustrative examples will be presented here.
One of the solutions to the switch problem, developed by Risø National Research Laboratory, was to separate a first area of the display in which the pointing device was always in operational mode from a second area of the display that contained the observed object. When the user looked at the first area, the operation performed was to put the pointing device into operational mode for the observed object, so that the device was in operational mode when the user next looked at that object. The first area thus served as a mode switch for the second area. In the following, an area of a display in which a pointing device may be switched into operational mode is termed a gaze sensitive area. An example of this technique is shown at 201 in FIG. 2. The area that is always gaze sensitive is button 205, termed an “EyeCon” button. The object that may become gaze sensitive is drawing 203. When the user focuses on button 205 (ordinarily for half a second), the button ‘acknowledges’ the viewer's intent to interact with object 203 by going through the animated sequence shown at 207. The completely closed eye indicates that the observed object is now gaze sensitive. For details on this approach, see Glenstrup, A. J., Engell-Nielsen, T. Eye Controlled Media: Present and Future State. Minor Subject Thesis, DIKU, University of Copenhagen, 1995, available at: http://www.diku.dk/~panic/eyegaze/article.html#contents
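The half-second focus on the button amounts to a dwell-time test. The following Python sketch is offered only as an illustration of that idea; the function name, the sample format and the rectangle representation of the button are assumptions made for the example and do not describe the actual EyeCon implementation.

```python
def dwell_activation_time(samples, button, dwell=0.5):
    """Return the time at which an uninterrupted dwell on the button completes.

    samples: list of (t_seconds, x, y) gaze locations in time order.
    button: (left, top, right, bottom) rectangle of the always gaze
    sensitive area (hypothetical representation).
    Returns None if the gaze never rests on the button for `dwell` seconds.
    """
    left, top, right, bottom = button
    entered = None  # time at which the gaze last entered the button
    for t, x, y in samples:
        inside = left <= x <= right and top <= y <= bottom
        if not inside:
            entered = None  # looking away restarts the dwell
            continue
        if entered is None:
            entered = t
        if t - entered >= dwell:
            return t  # the observed object would now become gaze sensitive
    return None
```

Note that any glance away from the button restarts the dwell period in this sketch, so brief observational glances at the button do not trigger the switch.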
One of the problems with this technique comes from the solution itself—the solution separates selection and action. In order to make object 203 gaze sensitive, one has to stop looking at the object and look at EyeCon button 205. Another problem is the interruption of the flow of interaction—in order to make object 203 gaze sensitive, the user has to focus on the action button for a period of time. This undermines the unique quality of gaze direction as the fastest and most natural way of pointing and selection. Another solution to the same problem (with very promising results) was to use inputs other than eye movements for the switch: the voice, as described in Glenn III, F. A., Iavecchia, H. P., Ross, L. V., Stokes, J. M., Weiland, W. J., Weiss, D., Zakland, A. L. Eye-voice-controlled interface, Proceedings of the Human Factors Society, 1986, pp. 322-326 or manual input, as described above in the Zhai reference. The difficulty here, of course, is that there may be applications in which such separate channels for the switch are not available to the user.
The second major problem with the use of eye movements to interact with objects in a display is the sheer volume of data collected during eye tracking and the effort involved in doing meaningful analysis of the data. Individual fixations of the eyes on an object carry very little meaning on their own. Consequently, a wide range of eye tracking metrics has been developed in the past 50 years. An excellent and very detailed overview of these metrics can be found in Jacob, R. J. K., Karn, K. S. “Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises (Section Commentary)”, in The Mind's Eyes: Cognitive and Applied Aspects of Eye Movements, J. Hyona, R. Radach, H. Deubel (Eds.), Oxford, Elsevier Science, 2003. Here, we will mention only a few metrics that may be used to infer the viewer's interest or intent:
number of fixations: a fixation is the brief period of time during which the eye does not move. A concentration of a large number of fixations in a certain area may be related to the user's interest in an object or detail presented in that area when viewing a scene (or a painting).
gaze duration: a gaze is defined as a number of consecutive fixations in an area of interest. Gaze duration is the total of fixation durations in a particular area.
number of gazes: this is probably a more meaningful metric than the number of fixations. Combined with gaze duration, it may be indicative of the viewer's interest.
scan path: a scan path is a line connecting consecutive fixations. It can be revealing of a viewer's visual exploration strategies and is often very different in experts and novices.
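The metrics listed above may be computed from a time-ordered list of fixations. The following Python sketch is illustrative only; the fixation format (x, y, duration in milliseconds), the rectangular area of interest and the function names are assumptions made for the example.

```python
def area_metrics(fixations, area):
    """Compute fixation and gaze metrics for one area of interest.

    fixations: time-ordered list of (x, y, duration_ms) tuples.
    area: (left, top, right, bottom) rectangle (hypothetical representation).
    """
    left, top, right, bottom = area
    count = 0          # number of fixations in the area
    total = 0          # gaze duration: sum of fixation durations in the area
    gazes = 0          # number of runs of consecutive fixations in the area
    prev_inside = False
    for x, y, duration in fixations:
        inside = left <= x <= right and top <= y <= bottom
        if inside:
            count += 1
            total += duration
            if not prev_inside:
                gazes += 1  # a new gaze starts when the eye enters the area
        prev_inside = inside
    return {"fixations": count, "gaze_duration": total, "gazes": gazes}


def scan_path(fixations):
    """Return the scan path as the ordered list of fixation points."""
    return [(x, y) for x, y, _ in fixations]
```

In this sketch, a viewer who fixates twice inside the area, looks elsewhere, and then returns for one more fixation produces three fixations but only two gazes.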
As is apparent from the foregoing, using eye movements to interact with an object requires good solutions to the switch problem and to the problem of what metrics to use in measuring and analyzing eye movement. It is an object of the invention disclosed herein to provide techniques for using eye movements to interact with objects which offer good solutions to those problems. It is further an object of the invention disclosed herein to provide improved techniques for solving problems similar to the switch problem in other pointing devices that do not include buttons.