U.S. Pat. No. 6,144,989, incorporated by reference herein, describes an adaptive agent oriented software architecture (AAOSA), in which an agent network is developed for the purpose of interpreting user input as commands and inquiries for a back-end application, such as an audiovisual system or a financial reporting system. User input is provided to the natural language interpreter in a predefined format, such as a sequence of tokens, often in the form of text words and other indicators. The interpreter sometimes needs to interact with the user in order to make an accurate interpretation, and it can do so by outputting to the user an inquiry or request for clarification. In addition, the back-end application also needs to be able to provide output to the user, such as responses to the user's commands, or other output initiated by the application. AAOSA is one example of a natural language interpreter; another example is Nuance Communications' Nuance Version 8 (“Say Anything”) product, described in Nuance Communications, “Developing Flexible Say Anything Grammars, Nuance Speech University Student Guide” (2001), incorporated herein by reference.
Of the target applications that can be made to work with a natural language interface, it has been found that many solve similar problems. There are multiple word processors, multiple banking web-sites, multiple stock researching applications. Each of these applications is similar in its domain or area of expertise, but quite different in its publicly available programming interface. The API (Application Programming Interface) that each banking web-site presents has many different variables, representing the same concepts, but named differently. Likewise, the commands are similar, but are performed via different URLs, with subtle differences in behavior.
This relates to natural language interpretation engines because the concepts they need to interpret for a given domain can be similar while the mechanics they need to use to control different back-end applications might be vastly different. That is, the language that humans use to relate concepts about banking needs to be understood by the natural language interpreter no matter which banking application is being used at the back end, while each banking application has its own API which must be used to receive commands and output results.
In the past, natural language interpretation (NLI) engines were designed to communicate with the back-end application via the specific mechanism required by the application. Different NLIs were required for different back-end applications, even in the same application domain. The different communication mechanisms could differ in their transport mechanisms (e.g. web vs. email vs. interprocess communication on the local machine), in their command structures, paradigms and syntax, and/or in the formats with which they provide results. The communication mechanism for each particular back-end application typically had to be built into the NLI engine.
Built-in communication mechanisms were problematical because, among other things, it was often necessary to re-program parts of the NLI engine whenever it was desired to support a new or different back-end application. In addition, as the NLI was reprogrammed to support new applications, it was difficult to maintain a consistent user feel for all the applications supported in the domain. A consistent user feel for the entire application domain would afford a comfort level to the user, with both the application and the natural interaction platform, thereby increasing productivity and shortening the user learning curve.
Built-in back-end communication mechanisms were problematical also for another reason related specifically to the nature and purpose of natural language interfaces. An important goal of natural language interfaces is to enable a user to communicate with application programs in a natural way, as nearly as possible to the way the user would communicate with another human being. But sometimes the task that a user wants done is not limited to a single command in the parlance of the application. For example, a user might ask an assistant to “Pay the $200 utility bill on the fifteenth and transfer the money from savings.” For a typical banking application, that might require two separate commands: one creating an electronic check payment to the utility company for $200 to be paid on the fifteenth, and a second for transferring $200 from savings into checking on the fifteenth. Sometimes, in fact, the task is not even limited to a single application domain. This might be the case for a request like “Pay the $200 utility bill on the fifteenth and send an email to John telling him it is coming,” which might call for commands to both a banking application and an email application.
Roughly described, the invention addresses the above problems by separating an actuation subsystem from the natural language interpretation system. The NLI develops “interpretation result commands” in response to user input, and transmits them to the actuation subsystem using a predefined interpretation result command format (such as XML strings that obey a predefined Document Type Definition (DTD)) which is independent of the requirements of the particular back-end application. The actuation subsystem, which is the only component that is specific to the back-end application, converts the interpretation result command into one or more “application commands” and communicates it (them) to the back-end application in the form required by the specific back-end application. In some embodiments the actuation subsystem also can take results from a back-end application in application-dependent form, and convert them to a common predefined internal format that is application-independent and may be defined, for example, in another DTD. In this way all of the development for generating a natural language interpretation system optimized for a particular back-end application can be re-used for different applications in the same application domain simply by substituting in a different actuation subsystem. Similarly, all of the complications of interacting with particular back-end applications also can be concentrated in one module that is separate from the module(s) performing the natural language interpretation tasks.
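The separation described above can be illustrated with a minimal sketch. It assumes a hypothetical interpretation result command format: the element and attribute names, the two actuator classes, the example URL, and the `schedule_payment` call are all invented for illustration and are not taken from any published DTD or real banking API. The same application-independent XML string is handed to two different actuation subsystems, each of which converts it into the form its own back-end application would require.

```python
import xml.etree.ElementTree as ET

# Hypothetical interpretation result command produced by the NLI.
# Element and attribute names are assumptions made for this sketch.
INTERPRETATION_RESULT = """
<interpretation>
  <command name="payBill" payee="utility" amount="200" day="15"/>
</interpretation>
""".strip()


def parse_commands(xml_text):
    """Return the attributes of each <command> element as a plain dict."""
    root = ET.fromstring(xml_text)
    return [dict(cmd.attrib) for cmd in root.findall("command")]


class UrlBankActuator:
    """Hypothetical actuator for a back end driven by URLs."""

    def actuate(self, xml_text):
        application_commands = []
        for cmd in parse_commands(xml_text):
            if cmd["name"] == "payBill":
                application_commands.append(
                    "https://bank-a.example/pay?to={payee}&amt={amount}&day={day}"
                    .format(**cmd)
                )
        return application_commands


class RpcBankActuator:
    """Hypothetical actuator for a back end with differently named parameters."""

    def actuate(self, xml_text):
        application_commands = []
        for cmd in parse_commands(xml_text):
            if cmd["name"] == "payBill":
                application_commands.append((
                    "schedule_payment",
                    {
                        "recipient": cmd["payee"],
                        "cents": int(cmd["amount"]) * 100,  # this back end wants cents
                        "date_of_month": int(cmd["day"]),
                    },
                ))
        return application_commands
```

In this sketch, swapping `UrlBankActuator` for `RpcBankActuator` changes only the application commands that are emitted; the interpretation result command itself is unchanged.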
In an embodiment, the system is designed so that various target applications can be “plugged” into the system without altering any components related to user interaction or interpretation. Creating the “plug” is a matter of creating the mapping from the interpretation system to the target application API for commands, and the mapping from the target application API to the user interaction subsystem for command results.
To map the interpretation system's output to the target API, one can take many approaches. The Command Design Pattern, however, is preferred. By using command objects, the target API can be decomposed into efficient modules, clarifying the mapping process. Each command object has a specific task to be performed in the target application's space. For one application, that might be a matter of a simple method call, while another application might require a few method calls and several calculations in between. The command pattern hides those details from other developers, exposing only the behavior/responsibilities for that specific command.
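As a rough illustration of the Command Design Pattern in this setting, the sketch below defines one command interface and two implementations of the same task. The back-end classes `SimpleBank` and `LedgerBank` and their methods are hypothetical stand-ins for real target APIs: the first needs a single method call, while the second needs several calls with calculations in between. Callers see only `execute()`.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """A single task to be performed in the target application's space."""

    @abstractmethod
    def execute(self):
        ...


class SimpleBank:
    """Hypothetical back end exposing a one-call transfer API."""

    def __init__(self):
        self.calls = []

    def transfer(self, source, target, amount):
        self.calls.append((source, target, amount))


class LedgerBank:
    """Hypothetical back end exposing only balance reads and writes."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def get_balance(self, account):
        return self.balances[account]

    def set_balance(self, account, value):
        self.balances[account] = value


class SimpleBankTransfer(Command):
    def __init__(self, bank, source, target, amount):
        self.bank, self.source, self.target, self.amount = bank, source, target, amount

    def execute(self):
        # One method call is enough for this back end.
        self.bank.transfer(self.source, self.target, self.amount)


class LedgerBankTransfer(Command):
    def __init__(self, bank, source, target, amount):
        self.bank, self.source, self.target, self.amount = bank, source, target, amount

    def execute(self):
        # Several calls, with the new balances computed in between.
        new_source = self.bank.get_balance(self.source) - self.amount
        new_target = self.bank.get_balance(self.target) + self.amount
        self.bank.set_balance(self.source, new_source)
        self.bank.set_balance(self.target, new_target)
```

Either command can be handed to code that knows nothing about the back end beyond the `Command` interface.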
Also, commands can be combined to generate more complex behavior, allowing a natural language front-end to present features that might not be accessible through the application's standard interface.
The banking example above, though quite natural when expressed in language, is complex when approached with the bank's API. However, the mapping from the NLI's interpretation to command objects on the actuation side breaks the complex command down into manageable commands to be executed in a particular sequence.
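One way to picture this decomposition is a composite ("macro") command that runs simpler commands in order. The sketch below is an assumption-laden illustration: the shape of the `interpretation` dictionary, the class names, and the log-based "execution" are all hypothetical, standing in for whatever the NLI actually emits and whatever the bank's API actually requires.

```python
class PayBill:
    """Hypothetical command: schedule an electronic check payment."""

    def __init__(self, payee, amount, day):
        self.payee, self.amount, self.day = payee, amount, day

    def execute(self, log):
        log.append(f"pay {self.payee} ${self.amount} on day {self.day}")


class Transfer:
    """Hypothetical command: move money between accounts."""

    def __init__(self, source, target, amount, day):
        self.source, self.target = source, target
        self.amount, self.day = amount, day

    def execute(self, log):
        log.append(
            f"transfer ${self.amount} from {self.source} "
            f"to {self.target} on day {self.day}"
        )


class MacroCommand:
    """Runs its child commands in the order they were given."""

    def __init__(self, commands):
        self.commands = list(commands)

    def execute(self, log):
        for command in self.commands:
            command.execute(log)


def build_commands(interpretation):
    """Map one (hypothetical) interpretation to a sequence of commands."""
    amount, day = interpretation["amount"], interpretation["day"]
    return MacroCommand([
        PayBill(interpretation["payee"], amount, day),
        Transfer(interpretation["fund_from"], "checking", amount, day),
    ])


# "Pay the $200 utility bill on the fifteenth and transfer the money from savings."
request = {"payee": "utility", "amount": 200, "day": 15, "fund_from": "savings"}
```

Calling `build_commands(request).execute(log)` with an empty list appends one entry per application command, in sequence.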
Another example of this would be an opportunity management system, which actuates SalesForce.com for some commands related to sales efforts and other applications for commands relating to inventory or contact information. Thus the framework not only provides the ability to swap back-end applications easily (switch from SalesForce.com to FrontRange's GoldMine sales force automation application, for example), but it also allows a single NLI to drive collections of back-end applications in different domains as well. For example, the suite of SalesForce.com and Microsoft Exchange could be replaced by Siebel's much larger application, which handles both the opportunity management and contact management features found in the combination of SalesForce.com and Exchange.
The Command design pattern also provides a convenient model for implementing undo functionality. Each Command, as it is built, is able to build its undo equivalent. All the current state information can be stored alongside the new state information, recording enough information to revert from the new state back to the original state if need be. Although not all systems must implement undo, this framework provides a clean solution if/when the need arises.
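A minimal sketch of this idea follows, assuming a hypothetical `SetBalance` command that operates on a plain dictionary of account balances. The command records the current value when it is built, stores it alongside the new value, and can then produce an equivalent command that reverts the change. The class and method names are illustrative only.

```python
class SetBalance:
    """Hypothetical command that sets an account balance and can be undone."""

    def __init__(self, accounts, name, new_value):
        self.accounts = accounts
        self.name = name
        self.new_value = new_value
        # Current state is captured at build time, alongside the new state.
        self.old_value = accounts[name]

    def execute(self):
        self.accounts[self.name] = self.new_value

    def build_undo(self):
        """Return a command that restores the state seen at build time."""
        undo = SetBalance(self.accounts, self.name, self.old_value)
        # The undo command's own "old" state is the value it will overwrite.
        undo.old_value = self.new_value
        return undo


accounts = {"savings": 500}
command = SetBalance(accounts, "savings", 300)
undo_command = command.build_undo()

command.execute()       # savings is now 300
undo_command.execute()  # savings is back to 500
```

Because the undo equivalent is itself a command, it can be queued, logged, or undone again in the same way as any other command.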
The following description, the drawings, and the claims further set forth these and other aspects, objects, features, and advantages of the invention.