Human-computer interaction has been revolutionized by the introduction of the graphical user interface (GUI), which provided an efficient means of presenting information to a user with a bandwidth that immensely exceeded any prior channel. Over the years, the speed at which information can be presented has increased further through colour screens, enlarged displays, intelligent graphical objects (e.g. pop-up windows), window tabs, menus, toolbars, etc. During this time, however, the input devices, i.e. the keyboard and the pointing device (e.g. the mouse, track ball or touch pad), have remained essentially unchanged. In recent years, handwriting devices have been introduced (e.g. in the form of a stylus or graphical pen). Nevertheless, while output bandwidth has multiplied several times, the input bandwidth has remained substantially unchanged. Consequently, a severe asymmetry in the communication bandwidth of human-computer interaction has developed.
In order to decrease this bandwidth asymmetry as well as to improve and facilitate the user interaction, various attempts have been made to use eye-tracking for such purposes. Monitoring or tracking eye movements and detecting a person's gaze point (as used herein, the point in space at which the person is looking) can be an important information source in analysing the behaviour or consciousness of the person. It can be used both for evaluating the object at which the person is looking and for evaluating the respective person. By implementing an eye tracking device in e.g. a laptop, the interaction possibilities between the user and the different software applications run on the computer can be significantly enhanced.
Hence, one interesting idea for improving and facilitating the user interaction and for removing the bandwidth asymmetry is to use eye gaze tracking instead of, or as a complement to, mouse input. Normally, the cursor is positioned on the display according to the calculated point of gaze of the user. A number of different techniques have been developed to select and activate a target object in these systems. In one example, the system activates an object upon detecting that the user fixates his or her gaze on the object for a certain period of time. Another approach is to activate an object when the user's eye blinks.
However, there are problems associated with these solutions using eye tracking. For example, humans use their eyes for perception rather than for control. Therefore, it may be stressful to carefully use eye movements to interact with a computer, for example, to activate and select an object presented on the display of the computer. It may also be difficult to control blinking or staring in order to interact with objects presented on a display.
Thus, there is a need within the art for improved techniques that enable user interaction with a computer provided with an eye tracking device, allowing the user to control, select and activate objects and parts of objects presented on a display of the computer using his or her eyes in a more intuitive and natural way. Furthermore, there is also a need within the art for techniques that take advantage, in a more efficient way, of the potential of eye tracking for improving and facilitating the user interaction with a computer.
One such attempt is presented in US Pat. Appl. No. 2005/0243054 to Beymer et al., which discloses a technology for selecting and activating a target object using a combination of eye gaze and key presses. More specifically, a user looks at a target object, for example, a button on a graphical user interface, and then presses a selection key of the keyboard. Once the selection key is pressed, a most probable target is determined using probability reasoning. The determined target object is then highlighted and the user can select it by pressing the selection key again. If the highlighted object is not the target object, the user can use additional keys to navigate to the intended target object.
However, this technology is limited to object selection and activation based on a combination of eye gaze and two sequential presses of one dedicated selection key.
Consequently, there still remains a need within the art for an improved technique that takes advantage, in a more efficient way, of the potential of eye tracking for improving and facilitating the user interaction with a computer, and in particular user interaction with graphical user interfaces.
An object of the present invention is to provide improved methods and systems for assisting a user when interacting with a graphical user interface by combining eye based input with other input, e.g. mechanical input, input from an IR-sensor, voice activated input, detection of body gestures or proximity based input, for selection and activation of areas of a screen or display, and of objects and object parts presented on the display, and for execution of contextual actions related to these areas, objects and object parts.
Another object of the present invention is to provide methods and systems for user friendly and intuitive interaction with graphical user interfaces.
In the context of the present invention, the term “GUI” (Graphical User Interface) refers to a graphics-based user interface with pictures or images and words (including e.g. signs and figures) on a display that incorporate, for example, movable windows and icons.
Further, in the context of the present invention, the terms “object of interest” and “object part of interest” refer to an interactive graphical object or GUI object, such as a button, a scroll bar or a hyperlink, or to a non-interactive object, such as a text or a word in a text, that the user desires to select or activate through an eye gaze.
The term “contextual action” refers, in the context of the present invention, to an action that can be executed with respect to an object or object part based on eye data input and input from e.g. mechanical input devices such as a mouse or keys or buttons, input from an IR-sensor, voice activated input, or detection of body gestures or proximity based input. For example, when the user gazes at a certain window displayed on the display and presses a certain key, the resulting contextual action may be that the window is maximized. Another example is that when a user gazes at a web-link in a window and makes a certain gesture with her hand, the linked web page is opened.
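By way of illustration only, the two examples above can be expressed as a lookup from a combination of gazed-at object and activation input to a contextual action. The following is a minimal sketch under stated assumptions: the object kinds, the input labels (such as "key:F9" and "gesture:hand_wave"), the action names and the function contextual_action are all hypothetical and are not prescribed by the invention.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (kind of gazed-at object, activation input) -> action name.
# All labels below are illustrative assumptions.
ActionTable = Dict[Tuple[str, str], str]

DEFAULT_ACTIONS: ActionTable = {
    ("window", "key:F9"): "maximize_window",
    ("web_link", "gesture:hand_wave"): "open_linked_page",
}


def contextual_action(
    object_kind: str,
    activation_input: str,
    table: ActionTable = DEFAULT_ACTIONS,
) -> Optional[str]:
    """Return the action configured for this combination, or None if there is none."""
    return table.get((object_kind, activation_input))


if __name__ == "__main__":
    print(contextual_action("window", "key:F9"))
    print(contextual_action("web_link", "gesture:hand_wave"))
```

Because the action depends on both the object and the input, the same key or gesture can produce different actions for different kinds of objects, and a user-specific table can be passed in place of the default one.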
According to an aspect of the present invention, there is provided a method for manipulating objects or parts of objects and performing contextual actions related to the objects presented on a display of a computer device associated with an eye tracking system. The method comprises displaying objects on the display of the computer device and providing an eye-tracking data signal describing a user's gaze point on the display and/or relative to the display. Activation input may be received from an input device (e.g. by pressing a key of a keyboard or pressing a joystick button associated with the computer device). Further, activation input may be received from an IR-sensor, or may be voice activated input, or may be detection of body gestures or proximity based input. Thereafter, an object or a part of an object on which the user is gazing is determined by using the determined gaze point and/or the activation input. The object or object part is determined to be an object or object part of interest if current gaze conditions fulfil predetermined gaze conditions and/or the activation input fulfils predetermined conditions. A specific contextual action is determined based on the received activation input and the object or object part of interest. Finally, the specific contextual action is executed.
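The sequence of steps in the method above may be sketched as follows. This is a non-limiting illustration only. It assumes rectangular objects, a minimum dwell time as the predetermined gaze condition, and a simple table of actions. The names GuiObject, GazeSample, object_of_interest and handle_activation, and the 0.2 second threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class GuiObject:
    name: str
    kind: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass
class GazeSample:
    t: float  # timestamp in seconds
    x: float  # gaze point on the display
    y: float


def object_at(objects: Sequence[GuiObject], x: float, y: float) -> Optional[GuiObject]:
    """Return the first object whose bounds contain the point, if any."""
    for obj in objects:
        if obj.contains(x, y):
            return obj
    return None


def object_of_interest(objects: Sequence[GuiObject],
                       samples: Sequence[GazeSample],
                       min_dwell_s: float = 0.2) -> Optional[GuiObject]:
    """Assumed gaze condition: the gaze has rested on one object for min_dwell_s."""
    if not samples:
        return None
    latest = samples[-1]
    target = object_at(objects, latest.x, latest.y)
    if target is None:
        return None
    dwell_start = latest.t
    # Walk backwards while the gaze stays on the same object.
    for sample in reversed(samples):
        if object_at(objects, sample.x, sample.y) is not target:
            break
        dwell_start = sample.t
    return target if latest.t - dwell_start >= min_dwell_s else None


def handle_activation(objects: Sequence[GuiObject],
                      samples: Sequence[GazeSample],
                      activation_input: str,
                      actions: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Determine the contextual action and return a description of its execution."""
    target = object_of_interest(objects, samples)
    if target is None:
        return None
    action = actions.get((target.kind, activation_input))
    if action is None:
        return None
    return f"{action}({target.name})"
```

In this sketch the activation input triggers the determination, and the gaze history collected up to that moment decides whether an object qualifies as the object of interest. Other gaze conditions could be substituted without changing the overall sequence.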
According to a second aspect of the present invention, there is provided a system for assisting a user in manipulating objects or parts of objects and performing contextual actions related to the objects presented on a display of a computer device associated with an eye tracking system. The system comprises a display adapted to display objects. An input module is adapted to receive activation input from, for example, at least one key of a keyboard associated with the computer device, or a foot pedal, mechanical switch, joystick button, gamepad etc. Alternatively, the activation input may be received from e.g. an IR-sensor, or may be voice activated input, or may be detection of body gestures or proximity based input. Further, an object identifier is adapted to receive an eye-tracking data signal describing a user's gaze point on the display and/or relative to the display, to identify an object or a part of an object on which the user is gazing using the determined gaze point and/or the activation input, and to determine the object to be an object of interest if current gaze conditions fulfil predetermined gaze conditions and/or the activation input fulfils predetermined conditions. An action determining module is adapted to determine a specific contextual action based on the received activation input and the object or object part of interest, and to provide instructions for execution of the specific contextual action.
The present invention offers several advantages over known techniques. For example, the user can select and activate objects and execute contextual actions related to these objects in a user friendly, reliable and accurate way owing to the intuitive manner in which the present invention functions. Commands and contextual actions that traditionally require a sequence of hand and/or finger manipulations can now be effected efficiently and effortlessly based on the user's eye activity and customized input. This is of great use and interest for ordinary computer users, for example, at work or at home. Furthermore, this is also desirable in a broad range of more specific applications such as, for example, support operators in a call-center environment (e.g. when entering/editing data in a customer relationship management application) and users of advanced computer aided design (CAD) tools. The invention may also be useful to improve ergonomics and reduce the risk of e.g. repetitive strain injuries.
Moreover, because, according to a preferred embodiment, the user can define or configure which specific actions should result from a specific combination of eye data input (e.g. selection of a certain object and detection of a dwell time of the gaze) and other input (e.g. a press of a specific key), a very user friendly and intuitive interaction environment based on that user's preferences and requirements can be created.
According to a further aspect of the present invention, eye gaze data and input are used in combination to enable a user to select, zoom and activate objects and object parts of interest. The user can magnify or enlarge an object or object part of interest, or an area around a gaze point, by gazing at the object or object part or at an area on a screen or display and delivering input, e.g. pressing a certain key of the keyboard. During a maintained mechanical input signal, e.g. maintained pressure on the key, the object or object part is gradually enlarged, and, thus, a zooming effect is achieved. By delivering a second mechanical input signal, e.g. by releasing the press of the key, the user may manipulate, click or activate the magnified object or object part. The user may adjust the gaze if necessary to compensate for e.g. inaccuracy of the eye tracker. The object or object part can be enlarged enough to cater for the average inaccuracy or offset error of the eye tracker.
In one example, the most probable object or object part can be zoomed or enlarged and centered on the determined gaze point. If it is the correct object or object part, the user may activate the object or object part by delivering the second input, e.g. by releasing the press on the key. To assist the user, a visual cue can be shown indicating which object or object part the user's gaze rests upon. Alternatively, the determined gaze point can be shown to indicate to the user where the contextual action will be performed after the enlargement (or zooming action), for example, where a click will be performed.
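The gradual enlargement during a maintained key press, followed by activation on release, may be sketched as below. This is merely one possible illustration and assumes a linear zoom rate, an upper limit on magnification, and a magnified view centered on the gaze point determined when the key was first pressed. The names ZoomState, update_zoom and to_unzoomed and the numeric defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ZoomState:
    center_x: float  # gaze point at key press; the magnified view is centered here
    center_y: float
    scale: float = 1.0  # 1.0 means no magnification


def update_zoom(state: ZoomState, held_dt_s: float,
                rate_per_s: float = 1.5, max_scale: float = 4.0) -> ZoomState:
    """Enlarge gradually while the key stays pressed, up to max_scale."""
    new_scale = min(max_scale, state.scale + rate_per_s * held_dt_s)
    return ZoomState(state.center_x, state.center_y, new_scale)


def to_unzoomed(state: ZoomState, gaze_x: float, gaze_y: float) -> Tuple[float, float]:
    """Map a gaze point on the magnified view back to original display coordinates.

    On key release, the click or other contextual action would be performed here.
    """
    return (state.center_x + (gaze_x - state.center_x) / state.scale,
            state.center_y + (gaze_y - state.center_y) / state.scale)
```

Dividing the gaze offset by the scale illustrates why enlargement helps with tracker inaccuracy: in this sketch, an offset error of a given size on the magnified view corresponds to a proportionally smaller error in original display coordinates.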
Further objects and advantages of the present invention will be discussed below by means of exemplifying embodiments.