U.S. Pat. No. 6,144,989, incorporated by reference herein, describes an adaptive agent oriented software architecture (AAOSA), in which an agent network is developed for the purpose of interpreting user input as commands and inquiries for a back-end application, such as an audiovisual system or a financial reporting system. User input is provided to the natural language interpreter in a predefined format, such as a sequence of tokens, often in the form of text words and other indicators. The interpreter sometimes needs to interact with the user in order to make an accurate interpretation, and it can do so by outputting to the user an inquiry or request for clarification. In addition, the back-end application also needs to be able to provide output to the user, such as responses to the user's commands, or other output initiated by the application. AAOSA is one example of a natural language interpreter; another example is Nuance Communications' Nuance Version 8 (“Say Anything”) product, described in Nuance Communications, “Developing Flexible Say Anything Grammars, Nuance Speech University Student Guide” (2001), incorporated herein by reference.
In the past, many natural language interpretation engines were designed to communicate with the user via a specific, predefined I/O modality. For example, some systems were designed to receive user input via a computer keyboard and to output results and clarification requests via a display monitor attached to the same computer as the keyboard. Other systems were designed to receive user input via a microphone and speech recognition software, and to provide output back to the user via text-to-speech software. Still other systems were designed to communicate with a user bidirectionally via the Web. Some systems were designed to support more than one I/O modality, but even then, the communication modalities were designed into the interaction system as a single unit.
Built-in communication mechanisms were problematic because, among other things, it was often necessary to reprogram parts of the natural language interface whenever it was desired to support a new or different I/O modality. In addition, as the interface was reprogrammed to support new I/O modalities, it was difficult to maintain a consistent user feel for the application. A consistent user feel for a back-end application would afford a comfort level to the user, with both the application and the natural interaction platform, thereby increasing productivity and shortening the user learning curve.
Roughly described, the invention addresses the above problems by separating a user interaction subsystem from the natural language interpretation system. A user interaction subsystem can include an interaction block that is specific to a particular I/O modality and user device, and which converts user input received from that device into a device-independent form for providing to the natural language interpretation system. The user interaction subsystem also can take results from a back-end application in a device-independent form, as well as clarification requests and other dialoguing from the natural language interpretation system, and convert them to the format appropriate to the particular I/O modality and device. In this way, all of the development for generating a natural language interpretation system optimized for a particular back-end application can be re-used for different I/O modalities and devices simply by substituting a different interaction block into the user interaction subsystem. Similarly, all of the complications of interacting with particular modalities can be concentrated in one module that is separate from the module(s) performing the natural language interpretation tasks.
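By way of illustration only, the separation described above might be sketched as follows. The sketch is not taken from the referenced patents; all class and function names (KeyboardBlock, SpeechBlock, interpret, UserInteractionSubsystem) are hypothetical, the interpreter is a trivial stand-in, and the "|"-separated recognizer output and the `<speak>` wrapper are assumptions chosen only to make the two blocks visibly different.

```python
from typing import List


class KeyboardBlock:
    """Hypothetical interaction block for typed text on a display."""

    def to_tokens(self, raw_input: str) -> List[str]:
        # Device-specific input -> device-independent token sequence.
        return raw_input.lower().split()

    def render(self, message: str) -> str:
        # Plain text is shown on the monitor unchanged.
        return message


class SpeechBlock:
    """Hypothetical interaction block for a speech modality.

    Assumes the recognizer delivers words separated by '|'.
    """

    def to_tokens(self, raw_input: str) -> List[str]:
        return [word.lower() for word in raw_input.split("|") if word]

    def render(self, message: str) -> str:
        # Wrap the device-independent message for a text-to-speech engine.
        return f"<speak>{message}</speak>"


def interpret(tokens: List[str]) -> str:
    """Stand-in for the natural language interpreter (not AAOSA itself)."""
    return "PLAY" if "play" in tokens else "CLARIFY"


class UserInteractionSubsystem:
    """Routes input and output through whichever block is plugged in."""

    def __init__(self, block) -> None:
        self.block = block

    def handle(self, raw_input: str) -> str:
        tokens = self.block.to_tokens(raw_input)
        result = interpret(tokens)  # the interpreter never sees the device
        return self.block.render(f"result: {result}")
```

In this sketch, supporting a new modality means writing a new block; `interpret` and `UserInteractionSubsystem` are left unchanged, for example `UserInteractionSubsystem(SpeechBlock()).handle("play|the|movie")`.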
In an embodiment, an interaction block includes an I/O mode object that is specific to a particular I/O modality, and an I/O formatting object that is specific to the layout requirements of a particular I/O device. In an embodiment, the user interaction subsystem simultaneously supports more than one I/O device, such as by running several simultaneous instantiations of the I/O mode class for a particular modality, each referencing its own respective I/O formatting object.
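The division between an I/O mode object and an I/O formatting object might be sketched, under stated assumptions, as shown below. The names TextMode and WidthFormatter are hypothetical, and line width is used here as the only "layout requirement" purely for illustration; an actual embodiment may format output quite differently.

```python
import textwrap
from typing import List


class WidthFormatter:
    """Hypothetical I/O formatting object: knows one device's line width."""

    def __init__(self, width: int) -> None:
        self.width = width

    def layout(self, text: str) -> List[str]:
        # Break the device-independent text into lines that fit the device.
        return textwrap.wrap(text, self.width)


class TextMode:
    """Hypothetical I/O mode object for a text modality."""

    def __init__(self, formatter: WidthFormatter) -> None:
        self.formatter = formatter

    def to_tokens(self, raw_input: str) -> List[str]:
        return raw_input.split()

    def present(self, text: str) -> List[str]:
        # Modality logic is shared; layout is delegated to the formatter.
        return self.formatter.layout(text)


# Two instantiations of the same mode class, each with its own formatter.
desktop = TextMode(WidthFormatter(80))
handheld = TextMode(WidthFormatter(12))
```

Here `desktop.present("Recording scheduled for tonight")` yields a single line, while `handheld.present(...)` with the same text yields several shorter lines, although both instances share the same modality code.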
The following description, the drawings, and the claims further set forth these and other aspects, objects, features, and advantages of the invention.