Radio communication devices, such as cellular phones, have ever-expanding processing capabilities and, consequently, an expanding range of software applications to run on them. However, the small size of such devices makes it difficult to attach the user interface hardware normally available on, for example, a computer. Cellular phones have only small keyboards and displays. To compensate, techniques have been developed to take advantage of the basic voice communication ability inherent in the cellular phone. Speech recognition technology is now commonly used in radio communication devices, and voice-activated dialing is readily available. With the advent of data services, including use of the Internet, it has become apparent that speech-enabled services can greatly enhance the functionality of communication devices. Towards this end, the Voice Extensible Markup Language (VoiceXML) has been developed to facilitate speech-enabled services for wireless communication devices. However, as speech-enabled services become available to consumers, serious problems arise for portable communication devices.
Speech-enabled services present difficult challenges when used in conjunction with multimodal services. In multimodal dialogs, input can come from speech, a keyboard, a mouse, and other input modalities, while output can go to speakers, displays, and other output modalities. A standard web browser implements keyboard and mouse inputs and a display output. A standard voice browser implements speech input and audio output. A multimodal system requires that the two browsers (and possibly others) be combined in some fashion. Typically, this requires various techniques to properly synchronize applications having different modes. Some of these techniques are described in 3GPP TR 22.977, “3rd Generation Partnership Project; Technical Specification Group Services and Systems Aspects; Feasibility study for speech enabled services; (Release 6),” v2.0.0 (2002–09).
The first approach, the “fat client with local speech resources,” puts the web (visual) browser, the voice browser, and the underlying speech recognition and speech synthesis (text-to-speech) engines on the same device (computer, mobile phone, set-top box, etc.). This approach would be impossible to implement on small wireless communication devices because of the large amount of software and processing power needed. The second approach is the “fat client with server-based speech resources,” where the speech engines reside on the network but the visual browser and voice browser still reside on the device. This is somewhat more practical on small devices than the first approach, but it is still very difficult to implement on small devices such as mobile phones. The third approach is the “thin client,” where the device has only the visual browser, which must be coordinated with a voice browser and speech engines located on the network. This approach fits on devices such as mobile phones, but the synchronization needed to keep the two browsers coordinated makes the overall system fairly complex.
All of these approaches share a problem: the solutions are either impractical to put on smaller devices or require complex synchronization.
Therefore, there is a need to alleviate the problems of incorporating voice browser technology and multimodal technology into a wireless communication device. It would also be beneficial to solve these problems without requiring expanded processing capability in the communication device. It would further be advantageous to avoid this complexity without significant additional hardware or cost in the communication device.