1. Field of the Invention
The present invention relates generally to interactive voice response systems, and more particularly to voice browsing with a mobile terminal that uses a dual-mode wireless connection.
2. Related Art
As society becomes increasingly mobile, the need for immediate communication, instant access to data, and the ability to act on that data has become critical. Far more people today have access to a telephone than have access to a computer with an Internet connection. In addition, sales of cellular telephones are booming, so many people already have, or soon will have, a phone within reach wherever they go. Voice browsers offer the promise of allowing everyone to access packet data network-based services from any phone, making it practical to access the packet data network anytime and anywhere, whether at home, on the move, or at work.
Voice browsers allow people to access the Internet using speech synthesis, pre-recorded audio, and speech recognition. This can be supplemented by keypads and small displays. Voice may also be offered as an adjunct to conventional desktop browsers with high-resolution graphical displays, providing an accessible alternative to the keyboard or screen, for instance in automobiles, where hands-free and eyes-free operation is essential, or for people with visual impairments. As mobile devices become ever smaller, voice interaction can escape the physical limitations of keypads and displays.
Packet data networks offer the potential to vastly expand the opportunities for voice-based applications. Pages in packet data networks define the scope of the dialog with the user, limiting interaction to navigating the page, traversing links, and filling in forms. In some cases, voice browsing may involve transforming packet data network content into formats better suited to the needs of voice browsing. In others, it may prove effective to author content directly for voice browsers.
Modern interactive voice response (IVR) services provide users with direct access to information stored in databases, saving companies time and money. For example, users can use IVR services to access voice mail and e-mail, keep track of appointments and contacts, and obtain stock quotes and news. Voice browsers are a good fit for the next generation of call centers, which will become packet data network portals to a company's services and related packet data network sites, whether accessed via the telephone network or via the Internet. Users will be able to choose whether to respond with a key press or a spoken command. Voice interaction holds the promise of natural dialog with packet data network-based services.
Many companies today provide commercial IVR servers, including Brite Voice Systems, Syntellect Inc., and InterVoice Inc. Others provide voice browsers, such as the Audio Web Research Team, Productivity Works, and General Magic, Inc.
Successful speech recognition is key to the success of IVR services. Speech recognition processing is typically performed at the server using dedicated software and hardware, because most mobile terminals don't have the processing capability and memory resources to perform natural language recognition effectively on their own. Performing speech recognition at the terminal is advantageous because it relieves IVR service providers of the burden of providing speech recognition capabilities. This is particularly important where the service supports a large number of concurrent users. However, natural language recognition requires a large grammar to achieve acceptable results. Even the more limited grammars associated with particular applications can be relatively large. Performing speech recognition using these grammars is beyond the modest capabilities of many low-cost mobile terminals. The problem is more acute where multiple grammars are stored within a mobile terminal, for example where the mobile terminal accesses multiple voice applications.
In European Patent Application No. 0854417A2 entitled "Voice Activated Control Unit" (published Jul. 22, 1998; applicant: Texas Instruments Inc.), a wireless voice-controlled device is described that permits a user to browse a hypermedia network, such as the World Wide Web, with voice commands. This reference discloses the use of grammar files stored within the mobile terminal for speech recognition, but does not describe using limited-size grammars to achieve more accurate speech recognition.
A need therefore exists for an improved system and method for providing IVR services in which accurate speech recognition is achieved on a mobile terminal having modest processing capability and memory resources, using grammars of limited size.
The present invention is directed to a system and method for voice browsing IVR services using a mobile terminal. A voice application provided by the IVR service is accessible via a server connected to a network. A call connection is established between the mobile terminal and the server using a dual-mode connection, i.e., the call connection includes a voice mode and a data mode for alternately transmitting voice and data via the network. Using the data mode, the voice application sends the mobile terminal a grammar that defines the speech recognition results the voice application is ready to accept as input or commands at its present state of execution. Using the voice mode, the voice application also sends the mobile terminal speech content corresponding to the present state of execution, such as audio prompts and instructions. The user responds orally to the speech content. The mobile terminal processes this voice input using speech recognition facilities and extracts valid input from it based on the current grammar. The mobile terminal sends the valid input to the voice application using the data mode, and the voice application continues execution based on that input.
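The exchange summarized above can be sketched as a small state machine. This is a minimal, hypothetical illustration, not the claimed implementation: the names `State`, `APP`, `VoiceApplication`, `recognize`, and `run_call` are invented here, user speech is represented as text, and `recognize` is only a stand-in (text normalization plus an exact grammar lookup) for a real terminal-side speech recognizer. The transcript tags each step with the mode of the call connection it would use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class State:
    """One state of a hypothetical voice application."""
    prompt: str               # speech content, delivered in voice mode
    grammar: Dict[str, str]   # valid phrase -> next state, delivered in data mode


# Hypothetical three-state voice-mail menu, used only for illustration.
APP: Dict[str, State] = {
    "main": State("Say messages or settings.",
                  {"messages": "messages", "settings": "settings"}),
    "messages": State("You have no new messages. Say main menu.",
                      {"main menu": "main"}),
    "settings": State("Say main menu to return.", {"main menu": "main"}),
}


class VoiceApplication:
    """Server side: tracks the present state of execution."""

    def __init__(self, app: Dict[str, State], start: str = "main") -> None:
        self.app = app
        self.current = start

    def grammar(self) -> Set[str]:
        # Only the phrases valid in the present state are exposed.
        return set(self.app[self.current].grammar)

    def prompt(self) -> str:
        return self.app[self.current].prompt

    def accept(self, valid_input: str) -> None:
        # Continue execution based on the valid input.
        self.current = self.app[self.current].grammar[valid_input]


def recognize(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Terminal-side stand-in for a recognizer (NOT real speech recognition):
    normalize the text, then accept it only if the current grammar allows it."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in grammar else None


def run_call(server: VoiceApplication,
             utterances: List[str]) -> List[Tuple[str, str]]:
    transcript: List[Tuple[str, str]] = []
    for utterance in utterances:
        grammar = server.grammar()                     # data mode: grammar download
        transcript.append(("voice", server.prompt()))  # voice mode: prompt playback
        result = recognize(utterance, grammar)         # local recognition
        if result is None:
            transcript.append(("reject", utterance))   # nothing sent; state unchanged
            continue
        transcript.append(("data", result))            # data mode: valid input upload
        server.accept(result)
    return transcript
```

For example, `run_call(VoiceApplication(APP), ["Messages", "what?", "main menu"])` moves the application to `"messages"`, rejects `"what?"` locally without contacting the server, and then returns the application to `"main"`. Only the grammar and the recognized phrase ever cross the data mode; the server never receives audio to recognize.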
The present invention exploits a feature offered by some communications networks that allows for dual-mode call connections having a voice mode and a data mode. Using this dual-mode connection, the mobile terminal and the server can alternately exchange voice and data during a single call connection. State-dependent binary data can therefore be downloaded to the mobile terminal interspersed with voice communications.
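The alternation of voice and data on one call connection can be modeled as a channel that is in exactly one mode at a time. The following is a minimal sketch under stated assumptions: `DualModeCall`, `encode_grammar`, and `decode_grammar` are hypothetical names, the real network signaling for switching modes is not modeled, and JSON is used only as an illustrative binary encoding for a grammar; the source does not specify a grammar format.

```python
import json
from enum import Enum
from typing import Iterable, List, Set, Tuple


class Mode(Enum):
    VOICE = "voice"
    DATA = "data"


class DualModeCall:
    """Hypothetical single call connection carrying either voice or data at any moment."""

    def __init__(self) -> None:
        self.mode = Mode.VOICE   # assume the call starts in voice mode
        self.switches = 0        # number of voice<->data transitions
        self.frames: List[Tuple[Mode, bytes]] = []

    def _ensure(self, mode: Mode) -> None:
        # Switch modes only when needed; the call itself is never torn down.
        if self.mode is not mode:
            self.mode = mode
            self.switches += 1

    def send_voice(self, audio: bytes) -> None:
        self._ensure(Mode.VOICE)
        self.frames.append((Mode.VOICE, audio))

    def send_data(self, payload: bytes) -> None:
        self._ensure(Mode.DATA)
        self.frames.append((Mode.DATA, payload))


def encode_grammar(phrases: Iterable[str]) -> bytes:
    # Illustrative encoding only: a sorted JSON list of valid phrases.
    return json.dumps(sorted(phrases)).encode("utf-8")


def decode_grammar(payload: bytes) -> Set[str]:
    return set(json.loads(payload.decode("utf-8")))
```

A typical turn downloads a grammar in data mode, plays a prompt in voice mode, and returns the recognized result in data mode, all within the same `DualModeCall` instance; that turn involves three mode switches and no new call setup.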
An advantage of the present invention is that accurate terminal-side speech recognition is achieved on mobile terminals with modest memory and processing power. State-dependent grammars are downloaded to the mobile terminal using the data mode of the call connection. Because a grammar need only define the valid speech recognition results for the voice application at its present state of execution, a state-dependent grammar can be relatively small compared to a natural language grammar. Smaller grammars reduce the processing capability and memory resources required at the mobile terminal and allow accurate speech recognition results to be obtained using conventional statistical algorithms.
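One way to see why a small state-dependent grammar helps is to look at the final decision step of a recognizer. The sketch below assumes, as a hypothetical, that the terminal's recognizer produces an n-best list of text hypotheses with log-likelihood scores (higher is better); the source does not describe the recognizer's internals, and `constrained_result` is an invented name. The grammar simply discards every hypothesis the voice application cannot accept in its present state, so an acoustically close but invalid phrase cannot win.

```python
from typing import Iterable, Optional, Set, Tuple


def constrained_result(nbest: Iterable[Tuple[str, float]],
                       grammar: Set[str]) -> Optional[str]:
    """Return the highest-scoring hypothesis that the current grammar allows,
    or None if no hypothesis is valid in the present state."""
    allowed = [(score, text) for text, score in nbest if text in grammar]
    if not allowed:
        return None
    # Tuples compare by score first (ties broken by text).
    return max(allowed)[1]


# Hypothetical n-best output for one utterance: (text, log-likelihood).
NBEST = [("ballots", -4.0), ("call mom", -4.1), ("balance", -4.3)]
# Hypothetical grammar for a banking menu state.
GRAMMAR = {"balance", "transfer", "main menu"}
```

Unconstrained, the top hypothesis here would be `"ballots"`; constrained by the three-phrase grammar, `constrained_result(NBEST, GRAMMAR)` returns `"balance"`. The smaller the grammar, the fewer candidates the terminal has to score and store, which is the memory and processing saving described above.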
Another advantage of the present invention is that IVR service providers are not required to invest in and maintain dedicated resources for speech recognition, since speech recognition is accomplished within the mobile terminal. This is particularly important where the IVR service supports a large number of concurrent users.
Another advantage of the present invention is that speaker-dependent characteristics can be stored locally in the mobile terminal and used to improve the accuracy of terminal-side speech recognition. Speaker-dependent speech recognition is therefore achieved without requiring that the speaker-dependent characteristics be stored by the IVR service providers.
Another advantage of the present invention is that IVR service providers can update and maintain their voice applications at the server without requiring modification to the terminal software.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings.