The present invention generally relates to computing apparatus and, more particularly, to methods and apparatus for providing a spoken language interface in association with such computing apparatus.
Speech technology has progressed to the point, in recent years, that command and control functions and transcription functions may be performed reliably using speech decoders such as, for example, the IBM Via Voice product line, a trademark of IBM Corporation of Armonk, N.Y. Technology for the encoding of text into audible speech is also widely available. Thus, it is reasonable to expect that products using these and other spoken language technologies will have been developed and brought to market. These products fall into two critical areas: enablers and tools. The personal speech assistant of the present invention is an enabler in the sense that it works in conjunction with a tool to enable access to the tool's capabilities through a spoken language interface.
Typical tools employing voice are best exemplified by portable voice recorders. These include devices such as the "Voice It" mobile digital recorder, and the Dragon Systems, Inc. "Naturally Speaking Mobile Organizer." The first is merely a digital recorder which can be used to take notes which can be transcribed by a speech recognition program. The transcribed notes are not returned to the device. The device, as a simple recorder, does not accept voice commands. The user is required to push buttons to control the recording functions. In the second case, the Dragon Systems Mobile Organizer does allow the user to speak commands, but these commands are acted upon as part of the transcription process, at whatever future time the user chooses to download the recordings. There is no general capability to offer voice control to any device other than the transcription software in a personal computer. The hardware and software capabilities, for example, for text-to-speech encoding are not provided because the only data type such a device need manage is digitized audio, not encoded text. Thus, so-called "mobile digital recorders" and voice input "mobile organizers" do not have the immediate connection or the ability to speak to the user, which are needed to assist a user by supporting conversational control and information supplying dialogs.
Another example of speech-enabled tools may be found in "palm top" computers. Devices such as the Casio E100 using the WinCE (a trademark of Microsoft Corporation of Redmond, Wash.) operating system allow individual applications to operate through spoken language related services. Access to these services is provided through an Application Programmer's Interface such as SMAPI, an "enabler" of speech interfaces. Here, each individual application on the computer must contain all of its dialog management data and software. The role of SMAPI is only to provide a common interface to the services of spoken language engines, not to provide the more abstract means to provide dialog or the hardware architecture to support dialog. Each application is thus a unique speech "device." Dedicating a palm top computer to the task of an interface tool would not be cost effective.
Another example of an enabler is the Philips "Speech Mike." This device is a dedicated interface device which provides a microphone and a speaker, but can only operate in the context of a personal computer since it carries only enough on-board intelligence to service the coding and communications needs of the built-in track ball.
In accordance with the present invention, a Personal Speech Assistant (PSA) is a computing apparatus which provides a spoken language interface to another apparatus to which it is attached. It is to be understood that the attachment may be made through physical means such as wires, radio waves or light, or by mixtures of logical and physical means such as computer networks or telephone networks. In order to provide a spoken language interface, a Personal Speech Assistant is designed to support execution of a conversational dialog manager and its supporting service engines. An example of such a dialog manager is described in detail in the concurrently filed U.S. patent application Ser. No. 09/460,961, in the name of L. Comerford et al., and entitled: "A Scalable Low Resource Dialog Manager," the disclosure of which is incorporated herein by reference. A preferred implementation is described in the detailed description below.
In operation, a PSA is connected to a device which provides some service to a user. Any "appliance" is a candidate for enhancement with the PSA. Devices such as, for example, video cassette recorders (VCRs) or Personal Digital Assistants (PDAs), which offer rich, but frequently difficult interfaces, may be made more useful by the integration of a PSA according to the invention. A PSA need not be permanently attached to the device for which it provides an interface. In a car, for example, a PSA may take on some of the responsibilities of the car key, in the sense that the PSA may be taken away by the owner when the car is parked. In this case, the owner may command the door to open, and the PSA, through a wireless connection and protocol, may translate that instruction into one accepted by the car. Once in the car, the owner may place the PSA in a "cradle" which offers a wired connection to the car electronics so that the radio, navigation, or environmental systems may be instructed concerning the owner's wishes.
It is a preferred feature of a dialog manager used by the PSA that the user interface properties, in terms of the vocabulary the device understands, the informative prompts it provides, and other aspects of its conversational behavior, are all easily modified to correspond to the preferences or limitations of the user. If a word is not reliably recognized, a synonym can be substituted for it. If a prompt is not to the user's liking, it is easily changed.
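By way of illustration only, the kind of customization described above may be sketched as edits to a small vocabulary and prompt table. The table layout, the phrase and prompt names, and the helper functions `add_synonym` and `set_prompt` are hypothetical and are not taken from the referenced dialog manager application.

```python
from typing import Dict

# Hypothetical user interface data: spoken phrases map to application data,
# and prompt references map to the text that is spoken to the user.
commands: Dict[str, str] = {"play the tape": "VCR_PLAY", "stop": "VCR_STOP"}
prompts: Dict[str, str] = {"playing": "The tape is playing."}


def add_synonym(table: Dict[str, str], existing: str, synonym: str) -> None:
    """Let a new phrase trigger the same application data as an existing one."""
    table[synonym.strip().lower()] = table[existing.strip().lower()]


def set_prompt(table: Dict[str, str], reference: str, text: str) -> None:
    """Replace the wording of an existing prompt; unknown references are rejected."""
    if reference not in table:
        raise KeyError(f"unknown prompt reference: {reference}")
    table[reference] = text


# A user whose "stop" is often misrecognized adds a synonym,
# and rewords a prompt to suit personal preference.
add_synonym(commands, "stop", "halt")
set_prompt(prompts, "playing", "Now playing.")
```

Because only the tables change, the speech engines and the attached device's application are left untouched in this sketch.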
In an illustrative embodiment of the present invention, apparatus for providing a portable spoken language interface for a user to a device in communication with the apparatus, wherein the device has at least one application associated therewith, comprises: an audio input system for receiving speech data provided by the user; an audio output system for outputting speech data to the user; a speech decoding engine for generating a decoded output in response to spoken utterances; a speech synthesizing engine for generating a synthesized speech output in response to text data; a dialog manager operatively coupled to the device, the audio input system, the audio output system, the speech decoding engine and the speech synthesizing engine; and at least one user interface data set operatively coupled to the dialog manager, the user interface data set representing spoken language interface elements and data recognizable by the application of the device; wherein: (i) the dialog manager enables connection between the audio input system and the speech decoding engine such that the spoken utterance provided by the user is provided from the audio input system to the speech decoding engine; (ii) the speech decoding engine decodes the spoken utterance to generate a decoded output which is returned to the dialog manager; (iii) the dialog manager uses the decoded output to search the user interface data set for a corresponding spoken language interface element and data, which are returned to the dialog manager when found; (iv) the dialog manager provides the data associated with the spoken language interface element to the application of the device for processing in accordance therewith; (v) the application of the device, on processing that element, provides a reference to an interface element to be spoken; (vi) the dialog manager enables connection between the audio output system and the speech synthesizing engine such that the speech synthesizing engine, accepting data from that element, generates a synthesized output that expresses that element; and (vii) the audio output system audibly presents the synthesized output to the user.
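By way of illustration only, the flow of steps (i) through (vii) may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the referenced dialog manager: the class names, the stand-in engines `fake_decode` and `fake_synthesize`, the `vcr_application` stub, and all phrases and data values are hypothetical. In particular, the stand-in decoder simply treats the audio bytes as text so that the sketch can run without a real speech engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class UserInterfaceDataSet:
    commands: Dict[str, str]  # decoded phrase -> data recognizable by the application
    prompts: Dict[str, str]   # prompt reference -> text to be spoken

    def find_command(self, decoded: str) -> Optional[str]:
        return self.commands.get(decoded.strip().lower())


@dataclass
class DialogManager:
    decode: Callable[[bytes], str]       # speech decoding engine
    synthesize: Callable[[str], bytes]   # speech synthesizing engine
    application: Callable[[str], str]    # application of the attached device
    ui_data: UserInterfaceDataSet
    play: Callable[[bytes], None]        # audio output system

    def handle_utterance(self, audio: bytes) -> bool:
        decoded = self.decode(audio)                # (i), (ii) route audio, decode it
        data = self.ui_data.find_command(decoded)   # (iii) search the data set
        if data is None:
            return False                            # nothing matched the utterance
        prompt_ref = self.application(data)         # (iv), (v) application replies
        text = self.ui_data.prompts.get(prompt_ref)
        if text is None:
            return False                            # no prompt for that reference
        self.play(self.synthesize(text))            # (vi), (vii) synthesize and play
        return True


def fake_decode(audio: bytes) -> str:
    """Stand-in decoder (assumption): the 'audio' is already UTF-8 text."""
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    """Stand-in synthesizer (assumption): wraps the text instead of making audio."""
    return ("<speech>" + text + "</speech>").encode("utf-8")


def vcr_application(data: str) -> str:
    """Hypothetical device application: returns a reference to a prompt."""
    return {"VCR_PLAY": "playing", "VCR_STOP": "stopped"}.get(data, "unknown")


played: List[bytes] = []  # records what the audio output system was asked to play
manager = DialogManager(
    decode=fake_decode,
    synthesize=fake_synthesize,
    application=vcr_application,
    ui_data=UserInterfaceDataSet(
        commands={"play the tape": "VCR_PLAY", "stop": "VCR_STOP"},
        prompts={"playing": "The tape is playing.",
                 "stopped": "The tape has stopped."},
    ),
    play=played.append,
)
manager.handle_utterance(b"Play the tape")
```

In this sketch the dialog manager holds no device-specific knowledge; everything particular to the attached device lives in the user interface data set and the application callback, which is what allows the same apparatus to be moved between devices.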