1. Technical Field
Our invention relates to a system, method, and program product where a user interaction is interpreted and used by a computer system to interactively access assistance and information, where the assistance and information are provided according to input ambiguity. The input ambiguity is determined by one or more of current conditions, requested operations, the location of a tactile impulse or cursor, or the contents of an audible input. The ambiguity may be resolved by initiating the resolved requested action, presenting help screens or menus to the user, warning the user, or reminding the user to take an action.
2. Background Art
Multimodal interfaces are becoming increasingly popular in many areas. Multimodal interfaces are finding utility in global positioning satellite navigation systems, diagnostic systems for automobiles, distance learning, web browsing, voting, and machine and process control. One characteristic of multimodal interfaces is the recognition and interpretation of multimodal user inputs, such as audible inputs and tactile inputs.
One problem that arises with touchscreens is the absence of any tactile feedback, such as the “click” felt as the user presses a “virtual button.” Heretofore, visual representations have been provided as a response to pressing a “virtual button.” These responses have included changes in color, changes in size or shape, changes in brightness, and blinking and flashing.
Another problem of multimodal interfaces is the proliferation of ubiquitous “Help Screens” and especially anthropomorphic figures frequently associated with “Help Screens.”
A further problem with tactile multimodal interfaces is how to deal with ambiguous user actions and recognition of complex (and potentially ambiguous) user actions. One solution, described, for example, in U.S. Pat. No. 6,587,818 to Kanevsky et al. for System And Method For Resolving Decoding Ambiguity Via Dialog, interrupts the process and asks the user questions. However, this method is not altogether acceptable in multimodal applications. This is because many features can be simultaneously confusing, and many questions may be necessary to resolve the ambiguity. Thus, in this context, there is a need for an intuitive, non-intrusive and natural feedback to the user, and an equally user friendly method for the user to implement the computer's resolution of the ambiguity.
Similarly, in a situation such as arises in a Global Positioning Satellite System (GPSS) interface, where a user touches a touchscreen or clicks a mouse and asks for directions from “Point A” to “Point B,” ambiguity can arise because the system may not be able to determine exactly what the user pointed to. The same ambiguity can also arise in a voice recognition system, especially if the system is unsure what the user said. The problem can become more acute if the user must identify a plurality of points, as in the query “What is the distance from Point A to Point B via Point C?”, or where the user is potentially distracted, as in programming or interrogating a Global Positioning Satellite navigation system while driving. Thus, a clear need also exists to provide user-friendly feedback to an ambiguous entry by a distracted user.
The existing graphical user interfaces are “tri modal” since they are designed to be controlled by a keyboard and mouse, and the output is piped through the monitor. From a user's perspective they are “bi modal.” This is because the user typically interacts with the computer by using hands and vision. Users with vision disabilities may use a text-to-speech or Braille system as a way to receive the computer output.
The usual set of GUI controls (static text, edit box, list box, check box, scrollbar, table, and views) has evolved over the years within the context of systems such as Motif, Mac OS, Windows, Linux and Web browsers such as NetScape, Mozilla and Internet Explorer. Most of these controls were adjusted to “tri modal” user interaction. The keyboard and mouse actions are transferred into GUI actions, and the user can often see how they are being reinterpreted into the output stream, that is, highlights, selections, visual appearance of the windows, changed text and so on.
This gives rise to a continuous loop of user action and changed output. Correction, if needed, is a natural part of the process. In this context the computer is constantly informing the user about its changed state, implicitly exposing the state to the user. The user in this situation has a naturally developed feeling about how well he or she was entering the new data, or, in some cases, how well the computer “understood” the user's intention.
This situation changes dramatically with the introduction of speech. It has been reported that speech is a likely candidate to become a commonly used input modality. Keyboard and mouse input of the information is synchronous because in both cases it is generated by the hands using devices designed to create digital streams of key and mouse clicks. Adding speech immediately poses a few problems:
    Speech recognition is inaccurate, and it is necessary to resolve recognition errors.
    Speech and keyboard/mouse input (speech and hands input from the user's perspective) are asynchronous by nature, and the issue is how to provide reasonable behavior in situations when the combination of speech and hand input:
        can be interpreted,
        can only be interpreted with ambiguity,
        cannot be interpreted yet (the information is not complete, and the system is waiting for an additional input), or
        cannot be interpreted at all (contradiction in the input data, the input data are incomplete, or expiration of the original data or request).
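By way of illustration only, the four situations listed above may be sketched as a small classifier over one speech input and one hand input. The sketch is not taken from the described system: the names Interpretation, ModalInput and classify, the representation of each input as a set of candidate meanings, and the five-second expiration window are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence


class Interpretation(Enum):
    """The four situations for combined speech and hand input."""
    INTERPRETED = auto()      # exactly one consistent meaning
    AMBIGUOUS = auto()        # several consistent meanings remain
    PENDING = auto()          # incomplete, waiting for additional input
    UNINTERPRETABLE = auto()  # contradiction or expired input


@dataclass
class ModalInput:
    """Hypothetical container for one modality's recognized input."""
    candidates: Sequence[str]  # possible meanings reported by the recognizer
    timestamp: float           # arrival time, in seconds


def classify(speech: Optional[ModalInput],
             touch: Optional[ModalInput],
             now: float,
             timeout: float = 5.0) -> Interpretation:
    """Classify asynchronous speech and touch input (assumed policy)."""
    present = [m for m in (speech, touch) if m is not None]
    # An input older than the assumed window is treated as expired.
    if any(now - m.timestamp > timeout for m in present):
        return Interpretation.UNINTERPRETABLE
    # One or both modalities are still missing: wait for more input.
    if speech is None or touch is None:
        return Interpretation.PENDING
    # Meanings supported by both modalities.
    common = set(speech.candidates) & set(touch.candidates)
    if not common:
        return Interpretation.UNINTERPRETABLE  # the inputs contradict
    if len(common) == 1:
        return Interpretation.INTERPRETED
    return Interpretation.AMBIGUOUS


if __name__ == "__main__":
    said = ModalInput(["Point A", "Point B"], timestamp=9.0)
    touched = ModalInput(["Point B", "Point C"], timestamp=9.5)
    print(classify(said, touched, now=10.0).name)
```

In this sketch, requiring a meaning to be supported by both modalities is only one possible policy; a practical system might instead weight candidates by recognition confidence before deciding whether ambiguity remains.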