The present invention generally relates to user interfaces with computing systems and, more particularly, to a spoken language user interface employing a dialog management system.
Computer user interfaces are, historically, oriented toward paper and typewriter interaction metaphors. Pointing devices allow multiple sheets (graphical user interface or GUI windows) of specialized virtual "paper" to be addressed on a computer display. Recently, the state of the art in computer decoding of speech and encoding text into speech has progressed to the point that it is relatively easy to create text documents by dictating through a speech decoder into a text editing program and to have that or other documents read back aloud by encoding the document text into speech. An example of a commercial system supporting these capabilities is the IBM ViaVoice (a trademark of IBM Corporation of Armonk, N.Y.) product line. Since such decoders are capable of decoding a large spoken vocabulary, it is obvious that such decoding can be used for command language as well as text dictation.
Dialog management broadly refers to the sequence of exchanges between an application user and a software application in which the user is guided in providing information the application requires in order to accomplish some work desired by the user or some work required to further the program's operation.
Dialog management has been a routine part of graphical user interface (GUI) programming. Specific support for directed dialogs is an integral part of the Windows (a trademark of Microsoft Corporation of Redmond, Wash.) graphic shell. Dialogs are, as a rule, presented as forms into which the user types information or from whose lists the user makes selections. Given the space available on computer screens, the controls and data entry fields needed by an application or the computer operating system cannot all be displayed at one time. The dialog in the context of graphical user interfaces is thus primarily a screen space conserving mechanism.
Spoken Language Interfaces also conduct dialogs in order to further the interaction between the user and the application. They provide great conservation of screen space since they do not require screen presentation. Unlike GUI Dialogs, however, there exists only the most limited and rudimentary support for creating and managing the Spoken Language Dialog. Spoken Language Dialog management has largely been performed by the program logic of each application or by a global navigation program with features oriented toward selection of the active application and its presentation on screen.
It is important to distinguish Dialog Management from provision of Application Programmer Interfaces for the "engines" (such as a spoken command decoding engine) providing language related services. APIs such as Microsoft's Speech Application Programmer's Interface and the JAVA consortium JSAPI interfaces only provide an abstraction of the engines' interfaces in order to allow application programs to operate regardless of the identity of the provider of the particular engines installed on a given user's system. This provides a common low-level interface for providing and accessing the services of engines, but leaves the creation and management of dialog to the individual applications accessing these low-level interfaces.
The present invention provides an architecture for a spoken language dialog manager which can, with minimum resource requirements, support a conversational, task-oriented spoken dialog between one or more software applications and an application user. Further, the invention preferably provides that architecture as an easily portable and easily scalable architecture. The invention supports the easy addition of new capabilities and behavioral complexity to the basic dialog management services.
As such, one significant distinction from the prior art is found in the small size of the dialog management system. This size is consistent with the resources of modern embedded computing systems which are found in devices other than conventional "Personal" Computers or other purely data-processing systems. This invention may be applied equally easily to a computer used to operate a video cassette recorder (VCR) or a light switch or a Personal Digital Assistant (PDA). Given the teachings provided herein, one of ordinary skill in the art will realize various other applications.
In one illustrative embodiment of the invention, apparatus for providing a spoken language interface between a user and at least one application or system, wherein the apparatus operates in accordance with a computer processing system including a processor, an audio input system for receiving speech data provided by the user, an audio output system for outputting speech data to the user, a speech decoding engine and a speech synthesizing engine, comprises: a dialog manager operatively coupled to the application or system, the audio input system, the audio output system, the speech decoding engine and the speech synthesizing engine; and at least one user interface data set operatively coupled to the dialog manager, the user interface data set representing spoken language interface elements and data recognizable by the application; wherein: (i) the dialog manager enables connection between the audio input system and the speech decoding engine such that a spoken utterance provided by the user is provided from the audio input system to the speech decoding engine; (ii) the speech decoding engine decodes the spoken utterance to generate a decoded output which is returned to the dialog manager; (iii) the dialog manager uses the decoded output to search the user interface data set for a corresponding spoken language interface element and data, which are returned to the dialog manager when found; (iv) the dialog manager provides the data associated with the spoken language interface element to the application or system for processing in accordance therewith; (v) the application, on processing that element, provides a reference to an interface element to be spoken; (vi) the dialog manager enables connection between the audio output system and the speech synthesizing engine such that the speech synthesizing engine, accepting data from that element, generates a synthesized output that expresses that element; and (vii) the audio output system audibly presents the synthesized output to the user.
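By way of illustration only, the flow of steps (i) through (vii) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names DialogManager, InterfaceElement, DATA_SET and light_switch_app are hypothetical, and the speech decoding engine, speech synthesizing engine and audio output system are replaced by trivial stand-in functions that treat text bytes as audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InterfaceElement:
    """One entry of a hypothetical user interface data set."""
    phrase: str    # spoken form of the element
    app_data: str  # data recognizable by the application


class DialogManager:
    """Hypothetical dialog manager coupling the engines, data set and application."""

    def __init__(
        self,
        decode: Callable[[bytes], str],      # stand-in speech decoding engine
        synthesize: Callable[[str], bytes],  # stand-in speech synthesizing engine
        play: Callable[[bytes], None],       # stand-in audio output system
        application: Callable[[str], str],   # maps element data to a reference
        data_set: Dict[str, InterfaceElement],
    ) -> None:
        self.decode = decode
        self.synthesize = synthesize
        self.play = play
        self.application = application
        self.data_set = data_set

    def find_element(self, decoded: str) -> Optional[InterfaceElement]:
        # (iii) search the data set for the element matching the decoded output
        wanted = decoded.strip().lower()
        for element in self.data_set.values():
            if element.phrase.lower() == wanted:
                return element
        return None

    def handle_utterance(self, audio: bytes) -> Optional[str]:
        decoded = self.decode(audio)                     # (i), (ii)
        element = self.find_element(decoded)             # (iii)
        if element is None:
            return None
        reference = self.application(element.app_data)  # (iv), (v)
        reply = self.data_set.get(reference)
        if reply is None:
            return None
        self.play(self.synthesize(reply.phrase))         # (vi), (vii)
        return reply.phrase


# Assumed example data set for a light switch application.
DATA_SET = {
    "cmd.light_on": InterfaceElement("turn on the light", "LIGHT_ON"),
    "say.light_is_on": InterfaceElement("the light is on", ""),
}


def light_switch_app(app_data: str) -> str:
    # Returns a reference to the interface element to be spoken.
    return "say.light_is_on" if app_data == "LIGHT_ON" else ""


played: List[bytes] = []  # records what the stand-in audio output received
manager = DialogManager(
    decode=lambda audio: audio.decode("utf-8"),
    synthesize=lambda text: text.encode("utf-8"),
    play=played.append,
    application=light_switch_app,
    data_set=DATA_SET,
)
result = manager.handle_utterance(b"Turn on the light")
```

In this sketch the application never touches the engines directly; it receives only the data associated with a matched element and answers with a reference into the same data set, which is the division of labor the embodiment describes.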
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.