1. Field of the Invention
The invention relates to user-machine interfaces, and more particularly, to techniques for suggesting contextually relevant follow-up hints to improve the effectiveness of natural language user interaction with a back end application.
2. Related Art and Summary of the Invention
Mobile devices are becoming extremely popular and capable, yet they suffer from at least two user-interface-related problems that are holding back further deployment and simplicity of use.
First, because of the relatively small form factor and entry limitations of mobile devices, simply resizing a Graphical User Interface (GUI) designed for a desktop experience has not been sufficient. Entirely new interfaces have been designed, which lack the luxury of multiple windows, taskbars, quick launch pads and other conveniences and otherwise limit the amount of information and the number of user-selectable choices present on the screen. As a result, multiple interactions have become necessary in many cases for the user to reach a desired point in a desired application. But mobile devices can also suffer from lengthy delays between successive interactions with a back-end application, rendering a solution of multiple interactions sub-optimal. The recent introduction of natural language interfaces for mobile devices has helped, since they enable a user to go directly to a desired menu item or application screen without multiple interactions with a back-end application and without having to know menu structures or application organizations in advance. However, they still require the user to enter information affirmatively. It would be desirable if a user interface for a mobile device could offer the advantages of both user-selectable on-screen choices and natural language interaction.
Second, while numerous applications are available for use on mobile devices (e.g., Location Based Services, infotainment, enterprise applications), many have not yet become popular or widely used. This is partly due to a lack of integration with other more important applications (i.e., contacts, calendar, email, phone). Integration here generally means having access to an appropriate function in one application from a certain point in another. For example, while reading an email on a RIM Blackberry, a user is able to click on the sender to look the sender up in the contact book. For a map application to be integrated into the RIM Blackberry application set, one would expect to be able to easily get a map for a contact while viewing the contact information. Historically, many of the most successful mobile device operating systems have been the ones that integrate more applications and services better: contacts and calendar in the case of early Palm devices, and contacts, calendar, email, and phone in the case of RIM Blackberries, for example.
In the past, integration of multiple applications has often required cooperative development between the different vendors or development teams, or development of inter-application standards to which the different applications must subscribe. It would be highly desirable if effective integration could be accomplished in a user interface for a mobile device rather than requiring cooperation by different development teams.
According to the invention, roughly described, the above problems are addressed by the use of a context reactive user interface which offers user-selectable on-screen choices or hints to help the user follow up in the context of his or her previous interactions. Alternatively or additionally, the system can offer certain on-screen choices which, when selected by the user, can invoke one or more back-end applications with entry fields pre-filled from the user's previous interactions or from other contextual information.
In an embodiment, user input can be either by choosing user-selectable on-screen choices or by entering natural language input, whichever the user prefers at a given point in the interaction. The natural language input is interpreted by an agent network such as that described in U.S. Pat. No. 6,144,989, incorporated by reference herein. In such a network, sometimes referred to generally herein as an AAOSA agent network, user input is provided to the natural language interpreter in a predefined format, such as a sequence of tokens, often in the form of text words and other indicators. The interpreter parses the input and attempts to discern from it the user's intent relative to the back-end application(s). The agent network is organized as a hierarchy of semantic domains, with each agent responsible for recognizing only references within its own domain. Each agent processes requests either directly or by combining its processing with results produced by other agents. The network structure defines the communication paths between agents, which in turn determine the way agents receive requests and provide responses.
The agent network operates by passing requests from agent to agent. A request begins at the root of the hierarchy and flows down (downchain) to other agents. Agents examine the request and decide for themselves whether they have anything to contribute. Responses flow back upchain using the same message paths as the request. Since one agent can have more than one upchain connection, a downchain agent can receive the same request from every agent above it. It will only process the request once, however, and will send the same response to all of its upchain agents.
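By way of illustration only, the request flow described above can be sketched as follows. The class name `Agent`, the keyword-matching rule, the per-request `cache` and the example agents are all assumptions introduced for this sketch; they are not taken from the referenced patent or from any particular implementation.

```python
class Agent:
    """Hypothetical agent that responds with the tokens matching its keywords."""

    def __init__(self, name, keywords=()):
        self.name = name
        self.keywords = set(keywords)
        self.downchain = []
        self.process_count = 0  # times the request was actually processed

    def handle(self, tokens, cache):
        # An agent with several upchain agents receives the same request once
        # per upchain agent; it processes the request once and replays the
        # stored response on every later arrival.
        if self.name in cache:
            return cache[self.name]
        self.process_count += 1
        response = [(self.name, t) for t in tokens if t in self.keywords]
        for agent in self.downchain:
            response.extend(agent.handle(tokens, cache))
        cache[self.name] = response
        return response


# Assumed example network: Email is downchain of both Find and Send.
top, find, send = Agent("Top"), Agent("Find", ["find"]), Agent("Send", ["send"])
email = Agent("Email", ["email"])
top.downchain = [find, send]
find.downchain = [email]
send.downchain = [email]

response = top.handle(["find", "email"], cache={})
```

In this sketch the Email agent is reached through two upchain agents, so its response appears twice in what the top agent collects, yet the agent itself does the work only once.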
The network processes a natural language request in two phases. Phase one relates to interpretation of the request, that is, the determination of the user's intent. Phase two is the actuation phase, in which the network uses its understanding of the request to generate a command to a back-end application. Phase one begins when the top-level agent receives the request from the user. It passes the request to its downchain agents, which pass it along to their downchain agents, and so on until every agent has seen the request. Each agent examines the request, deciding whether it recognizes anything in the request that it knows how to process. If the agent recognizes anything, it makes a claim on whatever part of the request it thinks it understands.
An agent may make multiple claims on multiple parts of the request, including claims on overlapping parts of the request. If an agent sees nothing of interest in the request, it sends an explicit “no claim” message upchain. An upchain agent examines the claims it receives and may make its own claim based on the downchain agent claims; it may reject those claims based on its own, better understanding of the request and make a claim unrelated to those it received; or it may decide that neither it nor its downchain agents have anything to contribute and send a “no claim” message to its upchain agents. In this way claims and “no claim” responses travel up the network tree until they reach the top-level agent.
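A minimal sketch of this claim-making step follows, under simplifying assumptions: a claim is represented as a set of token indices, an agent recognizes tokens by simple keyword match, and an upchain agent always builds on the claims it receives. The case in which an upchain agent rejects downchain claims is not modeled. All names (`ClaimingAgent`, `NO_CLAIM`, the example agents) are hypothetical.

```python
NO_CLAIM = None  # stands in for the explicit "no claim" message


class ClaimingAgent:
    def __init__(self, name, keywords=(), downchain=()):
        self.name = name
        self.keywords = set(keywords)
        self.downchain = list(downchain)

    def interpret(self, tokens):
        """Return the set of claimed token indices, or NO_CLAIM."""
        # Claim whatever this agent recognizes on its own.
        claimed = {i for i, t in enumerate(tokens) if t in self.keywords}
        # Build on any claims returned by downchain agents.
        for agent in self.downchain:
            downchain_claim = agent.interpret(tokens)
            if downchain_claim is not NO_CLAIM:
                claimed |= downchain_claim
        return claimed if claimed else NO_CLAIM


recipient = ClaimingAgent("Recipient", ["john"])
email = ClaimingAgent("Email", ["emails"], [recipient])
calendar = ClaimingAgent("Calendar", ["meeting"])

tokens = ["find", "emails", "to", "john"]
email_claim = email.interpret(tokens)        # claims "emails" and "john"
calendar_claim = calendar.interpret(tokens)  # nothing recognized
```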
Often an agent will receive multiple claims returned from the agents below it. A set of rules is used to determine the relative strength of each claim. It is up to the upchain agent to decide whether to pass along multiple claims or to send only the strongest. The top-level agent makes the final selection among competing claims, selecting a set of one or more “best” claims. The set of winning claims can include more than one claim, so long as they do not conflict with each other. For example, user input such as, “Find emails to John and forward them to Jane” might generate a set of two winning claims: “Find emails to John” and “Forward selected emails to Jane”. Each claim identifies the agents that contributed to it, and therefore represents an “interpretation path” through the agent network.
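The selection of a set of non-conflicting winning claims can be sketched as shown below. The text above does not specify the strength rules or what constitutes a conflict, so this sketch assumes a numeric `strength` value and treats two claims as conflicting when their token ranges overlap; the greedy strategy, the `Claim` fields and the example values are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    path: tuple      # contributing agents, top-level agent first
    start: int       # index of the first claimed token
    end: int         # index one past the last claimed token
    strength: float  # assumed numeric ranking of the claim


def conflicts(a, b):
    # Assumption: claims conflict when their token ranges overlap.
    return a.start < b.end and b.start < a.end


def select_winners(claims):
    """Greedily keep the strongest claims that do not conflict."""
    winners = []
    for claim in sorted(claims, key=lambda c: c.strength, reverse=True):
        if not any(conflicts(claim, w) for w in winners):
            winners.append(claim)
    return sorted(winners, key=lambda c: c.start)


# Tokens: find(0) emails(1) to(2) John(3) and(4) forward(5) them(6) to(7) Jane(8)
claims = [
    Claim(("Top", "Find", "Email", "To"), 0, 4, 0.9),
    Claim(("Top", "Forward", "Email", "To"), 5, 9, 0.8),
    Claim(("Top", "Find", "Contact"), 3, 4, 0.4),  # overlaps the first claim
]
winners = select_winners(claims)
```

Each surviving claim carries its `path`, which plays the role of the interpretation path referred to above.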
Once the top-level agent has selected a set of winning claims, it begins the second phase: the generation of the action response (e.g., a command to an application). This time, the request is passed only to those agents that are included in one of the winning interpretation paths. Each included agent has its chance to contribute to some part of the command.
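The actuation phase can be illustrated with the following sketch, in which only the agents on a winning interpretation path are visited and each contributes a fragment of the command. The command syntax, the `CONTRIBUTORS` table and the agent names are invented for illustration and do not correspond to any actual back-end application.

```python
# Hypothetical per-agent contributions to the command.
CONTRIBUTORS = {
    "Find": lambda tokens: "FIND",
    "Email": lambda tokens: "EMAIL",
    "To": lambda tokens: "WHERE to='%s'" % tokens[tokens.index("to") + 1],
    "Calendar": lambda tokens: "CALENDAR",  # not on the winning path below
}


def actuate(path, tokens):
    """Visit only the agents on the winning path and join their fragments."""
    fragments = []
    for agent in path:
        contribute = CONTRIBUTORS.get(agent)
        if contribute is not None:
            fragments.append(contribute(tokens))
    return " ".join(fragments)


command = actuate(("Top", "Find", "Email", "To"), ["find", "emails", "to", "John"])
```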
AAOSA is one example of a natural language interpreter; another type that can be used is Nuance Communications' Nuance Version 8 (“Say Anything”) product, described in Nuance Communications, “Developing Flexible Say Anything Grammars, Nuance Speech University Student Guide” (2001), incorporated herein by reference. AAOSA is preferred, however, because the semantic relationships relevant to the back-end applications are already embodied in the structure of the agent network. These semantic relationships can be used to develop context-sensitive follow-up choices in which the user might be interested as described hereinafter. The agent network can be thought of as including a “database” of semantic relationships, where the term “database” as used herein does not necessarily imply any unity of structure. For example, two or more separate databases, when considered together, still constitute a “database” as that term is used herein. If another type of natural language interpreter supports hierarchies of semantic relationships similarly to AAOSA, or if semantic relationships are maintained elsewhere in a separate database, then other types of natural language interpreters can be used.
Follow-up choices (also referred to herein as “hints”) can be developed as pieces of information that have an association with the action previously taken by the user. For example, if the user searches for a contact, then “sending an email to the contact” and “setting an appointment with the contact” may be associated with the user's action and may be provided as hints for follow-up. A hint has value in that when it is presented to the user in an appropriate context, it helps the user clarify a command or carry out related commands. Hints can also be used to help the user learn about the back-end application. Generally, hints can be presented as natural language sentences, as icons, or as menus.
Hints in an AAOSA-based embodiment can be derived from the inter-agent relationships in the agent network. In particular, if a winning interpretation path includes a chain of one or more agents in the network, and if the agents are organized in the network according to appropriate semantic relationships, then alternative paths which differ from the interpretation paths in limited ways likely will represent reasonable follow-up choices in the current context of the user interaction.
For example, in one embodiment, agents are of specific categories or “types”, depending on the semantic function of the agent's domain in natural language user input. Preferably but not necessarily, three main semantic categories are used: commands, objects and fields. These categorizations are chosen because they tend to correspond to the command structures used in a wide variety of back-end applications. That is, commands in many applications often involve a command (an action that the user desires to be performed), an object (the structure on which the action should be performed), and fields (within the specified object). An advantageous organization for an AAOSA agent network therefore places command agents (agents whose function is to recognize the command part of user input) at a first level in the hierarchy, just below the root or top agent, object agents at a second level in the hierarchy, and field agents at a third level in the hierarchy. All command agents are immediately downchain of the top agent, and all the object agents in the second level are immediately downchain of at least one command agent. All the field agents in the third level are immediately downchain of at least one object agent in the second level. In an embodiment, multiple object levels precede the field level. Variations of this organization are also possible, and some of them are described below. Other very different organizations are also possible.
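As an illustration of the three-level organization only (not of the variation with multiple object levels), the following sketch records a small network and checks the structural rules stated above. The agent names, the dictionaries and the helper functions are assumptions made for this example.

```python
# Hypothetical agents and their semantic categories.
AGENT_TYPE = {
    "Top": "top",
    "Find": "command", "Send": "command",
    "Contact": "object", "Email": "object",
    "Name": "field", "Subject": "field", "Recipient": "field",
}

# Downchain connections: upchain agent -> agents immediately below it.
DOWNCHAIN = {
    "Top": ["Find", "Send"],
    "Find": ["Contact", "Email"],
    "Send": ["Email"],
    "Contact": ["Name"],
    "Email": ["Subject", "Recipient"],
}

# Each category must sit immediately below at least one agent of this category.
REQUIRED_UPCHAIN = {"command": "top", "object": "command", "field": "object"}


def upchain_of(agent):
    return [up for up, downs in DOWNCHAIN.items() if agent in downs]


def violations():
    """Return the agents that break the command/object/field layering."""
    bad = []
    for agent, kind in AGENT_TYPE.items():
        required = REQUIRED_UPCHAIN.get(kind)
        if required is None:
            continue  # the top agent has no upchain requirement
        if not any(AGENT_TYPE[up] == required for up in upchain_of(agent)):
            bad.append(agent)
    return bad
```

Here the Email object agent is downchain of both the Find and Send command agents, which is permitted because each agent need only have at least one upchain agent of the required category.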
Using the command, object, field agent organization, user input generally results in winning interpretation paths that include either only a command agent, or a command agent and one or more object agents, or a command agent, one or more object agents, and one or more field agents. At least four kinds of hints can be generated based on this organization. These include “General” hints, “Applicable Objects” hints, “Relevant Fields” hints and “Relevant Commands” hints. All are described in more detail below, but all involve suggesting either a next agent downchain from the deepest agent in an interpretation path, or an alternative agent that is a sibling of, upchain of, or in some other predefined relationship with an agent that does exist in an interpretation path.
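The two structural sources of hints mentioned above, namely agents downchain of the deepest agent in an interpretation path and siblings of agents on the path, can be sketched as follows. This passage does not define how each candidate maps onto the four named kinds of hints, so the sketch labels candidates only by their structural relationship; the network and function names are hypothetical.

```python
# Hypothetical network: upchain agent -> agents immediately below it.
DOWNCHAIN = {
    "Top": ["Find", "Send"],
    "Find": ["Contact", "Email"],
    "Send": ["Email"],
    "Contact": ["Name"],
    "Email": ["Subject", "Recipient"],
}


def downchain_hints(path):
    """Agents immediately downchain of the deepest agent on the path."""
    return list(DOWNCHAIN.get(path[-1], []))


def sibling_hints(path):
    """(agent on path, alternative sibling) pairs sharing an upchain agent."""
    hints = []
    for upchain, agent in zip(path, path[1:]):
        for sibling in DOWNCHAIN.get(upchain, []):
            if sibling != agent:
                hints.append((agent, sibling))
    return hints


# Winning path for input such as "find contact".
path = ("Top", "Find", "Contact")
next_agents = downchain_hints(path)
alternatives = sibling_hints(path)
```

For this path the sketch proposes the Name field as a next step, and the Send command and Email object as alternatives, which is in the spirit of offering to send an email after a contact has been found.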