Artificial intelligence systems play an important role in commerce by allowing users to receive information or assistance from a human-like avatar, without requiring a business to hire an actual human representative to communicate with the user.
Voice-based interfaces exist for accessing and controlling underlying software, allowing the user to issue natural language commands instead of using an input device to manipulate user interface elements such as menus, buttons, or other widgets. For example, many mobile smartphones provide a function for speaking to the phone in order to perform queries (“Siri, what is the capital of France?”) or commands (“Call Jim.”).
Other interfaces allow a user to input text and exchange instant messages in a chat with an AI system. For example, an online merchant or web hosting service may reduce load on customer service representatives by providing a first tier of support in which a customer describes his or her problem to a virtual helper (“I'm having trouble finding products X98 and X99 on your website.”).
When a user inputs a command or query, the software underlying most existing systems only parses the input statement in order to determine a set of named entities within the statement. In the previous examples, the sets of named entities might be {“capital”, “France”}, {“call”, “Jim”}, and {“X98”, “X99”, “website”}.
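The flat extraction described above can be sketched as follows. This is a minimal, hypothetical illustration, not any particular system's implementation; the entity list and the function name `extract_entities` are invented for the example.

```python
# Hypothetical sketch of flat named-entity extraction: the parser
# matches known entity strings and returns an unordered set,
# discarding all other structure in the statement.

KNOWN_ENTITIES = {
    "capital", "France", "call", "Jim", "X98", "X99", "website",
}

def extract_entities(statement: str) -> set[str]:
    """Return the set of known entities mentioned in the statement."""
    found = set()
    for entity in KNOWN_ENTITIES:
        if entity.lower() in statement.lower():
            found.add(entity)
    return found

print(extract_entities("Siri, what is the capital of France?"))
# -> a set containing "capital" and "France", and nothing else
```

Note that the result is a bare set: word order, grammatical roles, and any logical connectives in the original statement are all discarded.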
Users may wish to express a complex query or command—for example, one that implies Boolean algebra, such as “I want to watch a film starring Robert Downey Jr., but not Ben Stiller.” Parsing this input into a set of named entities results in the set {“film”, “Robert Downey Jr.”, “Ben Stiller”}. A system using only this set of entities to perform a query on an existing database will be unable to return results consistent with the user's intent, because the distinction between the actors in the user's original input has been lost.
Users may find an interface more natural and human-like if they are able to carry on a dialog through the interface, asking a series of questions or issuing a series of commands and receiving responses from the system in the same format. Dialogs present a unique set of challenges for an artificial intelligence program interpreting the input, as the current topic of the dialog can repeatedly change, or the meaning of a statement may be ambiguous without knowledge of the current context or consideration of a previous statement.
Existing systems are unable to dynamically track the state of a dialog containing many statements and replies. Each statement is examined in isolation, and existing systems cache or delete previous statements in the dialog if the user makes a statement that begins a new query or command. If a user states “I want to watch a movie with Robert Downey Jr.”, a system might reply “Would you like to watch Tropic Thunder?” The user might then ask “Who directed Tropic Thunder?” This second query becomes the only active query, and the system ceases responding to the original query. After the second query is performed and answered, the user must repeat the original query or issue some other command to go back to learning about Downey's filmography.
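The single-slot behavior described above can be sketched as follows. The class `SingleSlotDialog` is a hypothetical stand-in for an existing system, not any particular product's design.

```python
# Hypothetical sketch of single-slot query handling: each new query
# overwrites the previous one, so earlier context is discarded rather
# than tracked as part of an ongoing dialog.

class SingleSlotDialog:
    def __init__(self):
        self.active_query = None

    def handle(self, query: str) -> str:
        # The previous query, if any, is simply discarded.
        self.active_query = query
        return f"(answering: {self.active_query})"

dialog = SingleSlotDialog()
dialog.handle("I want to watch a movie with Robert Downey Jr.")
dialog.handle("Who directed Tropic Thunder?")
print(dialog.active_query)  # only the second query survives
```

After the second call, nothing in the system's state records that the user was originally searching for Robert Downey Jr. films, so the user must restate that query to continue.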
Existing systems require the user to submit each query in a single statement, and this statement must generally conform to some expected pattern. The user query “I want to watch a movie with Robert Downey Jr.” might return the response “Would you like to watch Tropic Thunder?” The user might then reply “One without Ben Stiller.” Existing systems cannot process such a second, fragmentary statement, which, when parsed in isolation, is neither a query nor a command. The fragment “One without Ben Stiller” might return an error, a list of all movies not starring Ben Stiller, or even a list of all movies starring Ben Stiller, because the system considers only the entities named in that statement and ignores possible logical or semantic relationships between those entities and entities named earlier in the dialog. In short, existing systems are unable to connect new entities to entities from previous statements through logical or semantic relationships.
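The failure mode with fragmentary statements can be illustrated concretely. The movie data below is invented for the example; the contrast is between executing the fragment in isolation and carrying the prior statement's constraint forward.

```python
# Invented sample data: movie titles mapped to their casts.
MOVIES = {
    "Tropic Thunder": {"Robert Downey Jr.", "Ben Stiller"},
    "Iron Man": {"Robert Downey Jr."},
    "Zoolander": {"Ben Stiller"},
    "Forrest Gump": {"Tom Hanks"},
}

fragment = {"exclude": {"Ben Stiller"}}           # from "One without Ben Stiller"
prior_query = {"include": {"Robert Downey Jr."}}  # from the first statement

# Executing the fragment alone yields "all movies not starring
# Ben Stiller," dropping the earlier Robert Downey Jr. constraint:
alone = [title for title, cast in MOVIES.items()
         if not (fragment["exclude"] & cast)]
print(alone)  # ['Iron Man', 'Forrest Gump']

# Carrying the prior constraint forward recovers the user's intent:
merged = [title for title, cast in MOVIES.items()
          if prior_query["include"] <= cast
          and not (fragment["exclude"] & cast)]
print(merged)  # ['Iron Man']
```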