The present invention relates to the field of human-computer interaction (HCI) focusing on using a pointer for selecting objects and, more particularly, to a method and system for implicitly resolving pointing ambiguities in human-computer interaction by implicitly analyzing user movements of a pointer toward a user targeted object located in an ambiguous multiple object domain and predicting the user targeted object. Implicitly analyzing the user movements of the pointer and predicting the user targeted object are done by using different categories of heuristic (statically and/or dynamically learned) measures, such as (i) implicit user pointing gesture measures, (ii) application context, and (iii) number of computer suggestions of each best predicted user targeted object.
In the continuously developing field of human-computer interaction (HCI), human-computer interfaces are constantly getting closer to the cognitive level of the user. Contemporary interfaces try to make the human-computer dialogue as natural and non-formal as possible, thus taking advantage of the user's natural perceptual and communicative abilities.
The last decades have witnessed dramatic changes in this respect, with the introduction of new graphical user interface (GUI) concepts and devices such as graphical object manipulation, ‘mouse’ pointing devices and associated pointers, windows, pull-down dynamic menus, and icons, all of which are now ubiquitous. These concepts were the foundations for the ‘WYSIWYG’ (what you see is what you get) paradigm originated during the 1970's at Xerox PARC laboratories, and were later used extensively in commercial systems such as the Xerox Star (1981), the Apple Lisa (1982), the Apple Macintosh (1984), and currently in Microsoft Windows. The theoretical principles and psychological foundation of these interfaces were eventually termed ‘Direct Manipulation’ by Shneiderman, as described by Shneiderman, B., in The Future of Interactive Systems and The Emergence of Direct Manipulation, Behavior and Information Technology 1, 1982, 237-256, and reviewed by Myers, B. A., in Brief History of Human Computer Interaction Technology, ACM Interactions, 5(2), March 1998, 44-54. The trend of approaching the user's cognitive domain in the field of human-computer interaction has continued since, and is now manifested in the form of the ‘Perceptual User Interfaces’ paradigm as described by Turk, M. and Robertson, G., in Perceptual User Interfaces, Communications of the ACM, March 2000, 43(3), 33-34.
Direct Manipulation (DM) interfaces are characterized by continuous graphic representation of the system objects, physical actions instead of complex syntax, and incremental operations whose impact on the object is immediately visible. DM interfaces are generally shown to improve users' performance and satisfaction, and were cognitively analyzed as encouraging ‘Direct Engagement’ and reducing ‘Cognitive Distance’, as described by Hutchins et al., in Direct Manipulation Interfaces, User Centered System Design: New Perspectives on Human-computer Interaction, Norman, D. A. and Draper, S. W. (eds.), Lawrence Erlbaum, New Jersey, USA, 1986, 87-124. Direct Engagement refers to the ability of users to operate directly on objects rather than conversing about objects using a computer language. The ‘Cognitive Distance’ factor describes the amount of mental effort needed to translate from the cognitive domain of the user to the level required by the user interface, as described by Jacob, R. J. K., in A Specification Language for Direct Manipulation User Interfaces, ACM Transactions on Graphics, 5(4), 1986, 283-317. The graphic nature of DM interfaces, with their rich visual representation of the system's objects, exploits the natural perceptual bandwidth of humans. It enables abundant and complex information to be presented to users and processed by them simultaneously and naturally.
The intuitiveness of DM may also be attributed to the fact that it incorporates natural dialogue elements. For instance, the fact that both user's input and computer's output are performed on the same objects, also referred to as ‘inter-referential I/O’, is reminiscent of natural language, which often includes references to objects from earlier utterances, as described by Draper, S. W., in Display Managers as the Basis for User-Machine Communication, User Centered System Design: New Perspectives on Human-computer Interaction, Norman, D. A. and Draper, S. W. (eds.), Lawrence Erlbaum, NJ, USA, 1986, 339-352.
DM has some limitations, as not all interaction requirements can be reduced to graphical representation. An example of such a limitation is that DM is tailored for a demonstrative form of interaction, at the expense of descriptive interaction. That is, DM is suited for actions such as “delete this”, but not for actions such as “delete all red objects that reside next to blue objects”. This is a significant limitation of DM, as many human-computer interaction scenarios may benefit from descriptive operations. However, such limitations did not prevent DM from becoming a de facto standard in the last decades, as described by Buxton, B., in HCI and the Inadequacies of Direct Manipulation Systems, SIGCHI Bulletin, 25(1), 1993, 21-22.
Despite the dramatic advances of the last decades, user interface is still considered a bottleneck limiting successful interaction between human cognitive abilities and the computer's continually evolving and developing computational abilities. The goal of natural interfaces is being further pursued in contemporary research under the framework of ‘Perceptual User Interfaces’ (PUI). PUI seek to craft user interfaces that are more natural and compelling by taking advantage of the ways in which people naturally interact with each other. Human-computer interfaces are required, by the PUI paradigm, to be as transparent as possible, and to incorporate several modalities of input and output. Possible input methods include speech, facial expressions, eye movement, manual gestures and touch. Output may be given as a multimedia mix of visual, auditory and tactile feedback, as described in the above Turk and Robertson reference.
While there is consensus that user interfaces should become more ‘natural’ to the user, different concepts of ‘natural interfaces’ are being pursued. Many researchers focus on natural language interfaces. The main goal of this discipline is the understanding and deployment of spoken language as a means of entering input. These interfaces are often augmented by additional modalities to improve the interface's robustness, such as physical pointing, as described by Bolt, R. A., in Put-That-There: Voice and Gesture at the Graphics Interface, Computer Graphics, 14(3), 1980, 262-270, and by Kobsa et al., in Combining Deictic Gestures and Natural Language for Referent Identification, Proceedings of the 11th International Conference on Computational Linguistics, Bonn, West Germany, 1986, 356-361, or lip movement recognition.
Other researchers try to incorporate natural dialogue principles into non-verbal dialogue. In the above Buxton reference, several drawbacks of using spoken language as a human-computer interface are presented. There, it is claimed that spoken languages are not actually natural (but rather learned), are not universal (profound differences exist between languages), and are “single-threaded” (only one stream of words can be parsed at a time). Consequently, Buxton's approach is to rely on elements that are arguably more fundamental to natural dialogue, such as fluidity, continuity, and phrasing, and to integrate them into traditional graphical user interfaces. Similar approaches try to augment the traditional GUI, which is said to consist of syntactic, semantic, and lexical levels, as described by Foley, J. D. et al. in Computer Graphics: Principles and Practice, Addison-Wesley, Reading, Mass., USA, 1990, with an additional discourse level. The discourse level refers to the ability to interpret each user action in the context of previous actions, rather than as an independent utterance. The interpretation is performed according to human dialogue properties such as conversational flow and focus, as described by Perez, M. A. and Sibert, J. L., in Focus in Graphical User Interfaces, Proceedings of the ACM International Workshop on Intelligent User Interfaces, Addison-Wesley/ACM Press, FL, USA, 1993, and by Jacob, R. J. K., in Natural Dialogue in Modes Other than Natural Language, Dialogue and Instruction, Beun, R. J., Baker, M., and Reiner, M. (eds.), Springer-Verlag, Berlin, 1995, 289-301.
Another aspect of the cognitive gap between humans and computers is the incompatible object representation. While computer programs usually maintain a rigid definition of their interaction objects, users are often interested in objects that emerge dynamically while working. This disparity is a major obstacle on the way to natural dialogue. The gap can be somewhat bridged using ‘Gestalt’ Perceptual Grouping principles that may identify emergent perceptual objects that are of interest to the user. Such an effort was made in the “Per Sketch” Perceptually Supported Sketch Editor, as described by Saund, E. and Moran, T. P., in A Perceptually Supported Sketch Editor, Proceedings of the ACM Symposium on UI software and Technology, UIST, CA, USA, 1994, and in Perceptual Organization in an Interactive Sketch Editing Application, ICCV, 1995.
Perceptual Grouping principles may also be used in order to automatically understand references to multiple objects without the user needing to enumerate each of them. For example, following a user's reference to multiple objects, either with a special mouse action or verbal utterance, a ‘Gestalt’ based algorithm may be used for identifying the most salient group of objects as the target of the pointing action, as described by Thorisson, K. R., in Simulated Perceptual Grouping: An Application to Human-Computer Interaction, Proceedings of the Sixteenth annual Conference of the Cognitive Science Society, Atlanta, Ga., USA, Aug. 13-16, 1994, 876-881.
One profound difference between Perceptual User Interfaces and standard Graphical User Interfaces is that while input to GUIs is atomic and certain, input to PUIs is often uncertain and ambiguous, and hence its interpretation is probabilistic. This presents the challenge of creating robust mechanisms for dealing with the uncertainties involved in human-computer interaction. Typically, the strategy for meeting this challenge is to integrate information from several sources, as described by Oviatt, S. and Cohen, P., in Multimodal Interfaces That Process What Comes Naturally, Communications of the ACM, 43(3), March, 2000, 45-53.
The present invention for implicitly resolving pointing ambiguities in human-computer interaction, described below, falls within the Perceptual Interface paradigm with respect to both objective and methodology. An important underlying objective of the present invention is to enable transparent human-computer interaction within the Direct Manipulation (DM) paradigm. The methodology of the present invention is based on heuristically dealing with uncertain input, as is often the case with Perceptual Interfaces.
Pointing. A fundamental element of any linguistic interaction is an agreement on the subject to which a sentence refers. Natural conversation is often accompanied by deictic (pointing) gestures that help identify the object of interest by showing its location, as described by Streit, M., in Interaction of Speech, Deixis and Graphical Interface, Proceedings of the Workshop on Deixis, Demonstration and Deictic Belief, held on occasion of ESSLLI XI, August, 1999. Deictic gestures performed in natural dialogue carry very limited information by themselves, as they only indicate a general area of attention. However, these gestures are accurately understood in the conversation context, by integrating information from the spoken language and the gesture.
User interfaces need to incorporate an equivalent mechanism for determining the subject of user operations. DM interfaces implement this conversational element with the concept of the Current Object of Interest (COI), which is the designated object for the next user's actions. Many of the operations available to users of DM are designed to act upon the COI. The common method of designating the COI is by using a deictic gesture to point at the COI, that is, by clicking on the object. A typical interaction scenario consists of selecting the COI by pointing at it, and performing different actions on it, also referred to as the noun-verb or object-action paradigm.
The COI has a unique role in DM interfaces. One of the characteristics of DM interfaces is that they are modeless, whereby, each user's action is interpreted in the same manner, rather than according to a varying application ‘mode’. However, the COI mechanism is in fact a way of achieving modes in DM applications, as it reduces the acceptable inputs and determines the way inputs are interpreted, as described in the above Jacob, 1995, reference.
Pointing Ambiguities. Like their natural counterparts, user interface pointing gestures are limited in the information they convey. In scenarios where graphical representations are complex, pointing gestures are typically ambiguous. Simply clicking on the desired object is usually the most intuitive and common selection method. This method of selection is very precise in specifying the exact location of interest, but lacks any other information. In particular, in scenarios featuring complex graphical representations, the exact location information is not sufficient to determine the user targeted object, as several interaction objects may share the same location. In order to overcome this and other types of pointing ambiguities, currently used pointing techniques and mechanisms are extended in various ‘explicit’ ways, summarized herein below, each of which is at the expense of the desired invisibility of the interface.
Composite Objects Ambiguity. One of the problematic scenarios for target selection is dealing with hierarchical object situations in which some selectable objects are compounds of lower level selectable objects. In such cases, pointing is inherently ambiguous, since several interaction objects share any given click position of the pointing device (‘mouse’). Furthermore, no area is unique to any one of the objects and can therefore serve as an unambiguous selection area. As shown in FIG. 1, a schematic diagram illustrating an example of the commonly occurring composite object type of pointing ambiguity, when a user clicks inside the circle A, the user may want to select either the inner slice B or the entire circle C. This particular type of pointing ambiguity is sometimes referred to as the “pars-pro-toto” ambiguity, that is, mistaking the part for the whole, and vice versa, as described in the above Streit, 1999, reference.
The composite object ambiguity problem exists under the surface of many common human-computer interaction scenarios. For example, in a common word processor, system objects typically consist of letters, words, lines, and paragraphs, each being a possible current object of interest (COI). However, clicking the pointing device (mouse) when the pointer is in the area of a letter can be interpreted as pointing to any of the above system objects. As shown in FIG. 2, a schematic diagram illustrating an example of the commonly occurring composite object type of pointing ambiguity with respect to text objects, when a user clicks inside the area of the letter ‘U’ indicated by the pointer, the user may want to select the letter ‘U’, the word ‘User’, or the entire sentence including the indicated ‘U’. Frames shown in FIG. 2 represent the imaginary or potential selection area of each text object.
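The text example above can be illustrated with a minimal sketch. It assumes a single sentence stored as a string, with hypothetical `TextObject`, `build_objects`, and `hit_test` names that do not come from any particular word processor. It only shows that a click position by itself yields several nested candidates, not how any real application resolves them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextObject:
    kind: str   # "letter", "word", or "sentence" (illustrative levels only)
    start: int  # index of the first character (inclusive)
    end: int    # index one past the last character (exclusive)


def build_objects(text: str) -> list[TextObject]:
    """Build sentence, word, and letter objects for one sentence of text."""
    objects = [TextObject("sentence", 0, len(text))]
    pos = 0
    for word in text.split(" "):
        if word:
            objects.append(TextObject("word", pos, pos + len(word)))
        pos += len(word) + 1  # skip the word and the following space
    for i, ch in enumerate(text):
        if not ch.isspace():
            objects.append(TextObject("letter", i, i + 1))
    return objects


def hit_test(objects: list[TextObject], index: int) -> list[TextObject]:
    """Return every object whose span contains the clicked character index."""
    return [o for o in objects if o.start <= index < o.end]


text = "User interfaces are ambiguous."
candidates = hit_test(build_objects(text), 0)  # a click on the letter 'U'
for c in candidates:
    print(c.kind, repr(text[c.start:c.end]))
```

The click on 'U' returns three candidates (the sentence, the word 'User', and the letter 'U'), and nothing in the click position distinguishes among them.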
Currently used pointing techniques and mechanisms incorporate different explicit techniques to overcome or resolve composite object ambiguities. One currently used explicit technique is to ‘avoid the hierarchy’ by allowing access to only one level of hierarchy at a time. This solution is typical of vector graphics drawing software, whereby users may place graphical elements on a ‘canvas’ and manipulate them. Objects may be grouped together into a composite object, thus creating an object hierarchy. However, once grouped together, the elemental objects are not accessible unless explicitly ungrouped, in which case the composite object ceases to exist. In terms of the selection mechanism, only one type of object exists, and its selection is very straightforward. In such scenarios, additional, often undesirable, actions of grouping and/or ungrouping need to be performed. Moreover, a related significant limitation of the grouping/ungrouping technique is that the deeper or more extensive the grouping hierarchy becomes, the harder it is for users to select the elemental objects, because more ungrouping actions are required and the hierarchy needs to be reconstructed after the ungrouping and selecting actions are completed. An example of this approach is the MS Word drawing objects grouping mechanism, shown in FIG. 3, a schematic diagram illustrating one currently used technique of ‘avoiding the hierarchy’ for explicitly overcoming or resolving composite object types of ambiguities shown in FIGS. 1 and 2. In FIG. 3, objects A and B are each individual or elemental selectable objects. Once grouped together, objects A and B are part of a single compound or composite object C. A user clicking anywhere within compound object C selects entire object C. The only way of selecting individual or elemental objects A and/or B is by first ungrouping compound or composite object C.
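The grouping behavior described for FIG. 3 can be sketched as follows. This is an illustrative model only: the `Shape` class, the bounding-box representation, and the `group`, `ungroup`, and `select` helpers are assumptions made for the example and are not taken from any drawing application.

```python
from dataclasses import dataclass, field


@dataclass
class Shape:
    name: str
    box: tuple  # (left, top, right, bottom), an assumed representation
    children: list = field(default_factory=list)

    def contains(self, x, y):
        left, top, right, bottom = self.box
        return left <= x <= right and top <= y <= bottom


def group(name, shapes):
    """Wrap shapes in a composite whose box is the union of the member boxes."""
    left = min(s.box[0] for s in shapes)
    top = min(s.box[1] for s in shapes)
    right = max(s.box[2] for s in shapes)
    bottom = max(s.box[3] for s in shapes)
    return Shape(name, (left, top, right, bottom), list(shapes))


def ungroup(composite):
    """Dissolve a composite and return its members as top-level objects."""
    return list(composite.children)


def select(canvas, x, y):
    """Only top-level canvas objects are selectable; children are never reached."""
    for shape in reversed(canvas):  # topmost (last added) object first
        if shape.contains(x, y):
            return shape
    return None


a = Shape("A", (0, 0, 10, 10))
b = Shape("B", (20, 0, 30, 10))
canvas = [group("C", [a, b])]
print(select(canvas, 5, 5).name)  # the click lands on A, but C is selected
canvas = ungroup(canvas[0])       # C ceases to exist
print(select(canvas, 5, 5).name)  # now A is selectable
```

Because selection only ever sees one level of the hierarchy, the click is unambiguous, at the cost of the extra ungrouping step and the loss of the composite object.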
A second currently used explicit technique for overcoming or resolving composite object ambiguities is based on ‘having different modes of selection’. This technique enables a user to have direct access to any level of hierarchy, after setting the corresponding selection mode. This explicit technique is only applicable in cases where there is a limited number of hierarchy levels and the hierarchy levels have an a priori meaning to the user. This technique is widely used in CAD applications, where users often need to define and use groups of objects. Again, the meaning of the grouping is that selecting one member of the group results in selecting all members of the entire group. The user may set the selection mechanism to operate in either a group mode or a single object mode.
A third currently used explicit technique for overcoming or resolving composite object ambiguities is based on using different procedures or protocols for performing the selection action itself, referred to as ‘extended selection’ procedures. For instance, MS Word designates a double click action for selecting a word and a triple click action for selecting an entire paragraph. Another extended selection procedure is ‘click and drag’, whereby a user specifies an area of interest rather than a single point, either giving more information on a targeted object or enabling the user to simultaneously select a plurality of targeted objects.
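A click-count based ‘extended selection’ procedure can be sketched as a simple mapping from the number of clicks to the level of the selected text object. The function name `select_by_click_count`, the use of a newline as the paragraph delimiter, and the treatment of a single click as selecting the character under the pointer are all assumptions made for this illustration and do not describe the behavior of any specific word processor.

```python
import re


def select_by_click_count(text: str, index: int, clicks: int) -> str:
    """Return the text selected at a character index for 1, 2, or 3+ clicks."""
    if clicks <= 1:
        return text[index]  # single click: the character under the pointer
    if clicks == 2:
        # Double click: the run of non-whitespace characters around the index.
        for match in re.finditer(r"\S+", text):
            if match.start() <= index < match.end():
                return match.group()
        return text[index]  # the click landed on whitespace
    # Triple click: the enclosing paragraph (newline-delimited, by assumption).
    start = text.rfind("\n", 0, index) + 1
    end = text.find("\n", index)
    return text[start:end if end != -1 else len(text)]


sample = "User input is ambiguous.\nSecond paragraph."
for clicks in (1, 2, 3):
    print(clicks, repr(select_by_click_count(sample, 0, clicks)))
```

The procedure resolves the ambiguity only because the user has learned which click count corresponds to which level, which is the kind of explicit knowledge the implicit approach aims to avoid.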
Two additional currently used explicit types of object selection techniques are ‘Pose Matching’ and ‘Path Tracing’, presented as part of the above mentioned “Per Sketch” Perceptually Supported Sketch Editor, which were developed specifically for disambiguating object selection in an environment with rich object interpretations, as described in the above Saund and Moran references. In the Pose Matching technique, a user performs a quick gesture that indicates the approximate location, orientation, and elongation of an intended or targeted object, while in the Path Tracing technique, a user traces an approximate path over the intended or targeted object's curve.
While the above described explicit techniques disambiguate the selection process, each one requires a user to have explicit knowledge of either the particular application modes or the particular selection procedures. Additionally, explicit selection techniques necessarily involve indirect user access to objects, thereby making the human-computer interaction dialogue less conversation-like and undesirably decreasing the invisibility and/or smoothness of the human-computer interface.
There are other commonly occurring scenarios in the field of human-computer interaction, focusing on object selection and pattern recognition, in which a single click of a pointing device is not enough to determine the object intended or targeted by the user. Such scenarios feature one or more of the following types of pointing ambiguities: overlapping objects; inaccurate pointing due to demanding and/or dynamic conditions such as small targets, moving targets, and a sub-optimal working environment; and limited pointing devices such as mobile pointing devices, gesture recognition interfaces, and eye-movement interfaces.
To one of ordinary skill in the art, there is thus a need for, and it would be highly advantageous to have a method and system for implicitly resolving pointing ambiguities in human-computer interaction (HCI) by implicitly analyzing user movements of a pointer toward a user targeted object located in an ambiguous multiple object domain and predicting the user targeted object. Moreover, there is a need for such an invention whereby implicitly analyzing the user movements of the pointer and predicting the user targeted object are done by using different categories of heuristic measures, which are widely applicable and extendable to resolving a variety of different types of pointing ambiguities such as composite object types of pointing ambiguities, involving different types of pointing devices besides the commonly used ‘mouse’ pointing device, and which are widely applicable to essentially any type of software and/or hardware methodology involving using a pointer, in general, and involving object selection, in particular.