1. Field of the Invention
The present invention relates to the field of computer software and, more particularly, to presenting multimodal Web page content on sequential multimodal devices.
2. Description of the Related Art
Web-based applications can be accessed through a variety of interfaces that fall into two categories: a graphical user interface (GUI) category and a voice category. The GUI category can include input and output interface modes for interacting with a GUI of a conventional desktop computer. For example, modes within the GUI category can include, but are not limited to, a keyboard mode, a mouse mode, a trackball mode, a joystick mode, a display mode, and the like. In contrast, the voice category can include interface modes that are accessible using a telephone or similar telephony device. Voice modes can include, for example, an automatic speech recognition (ASR) mode (or a speech input mode), a speech-synthesis mode (or a speech output mode), and a Dual Tone Multi-Frequency (DTMF) mode.
A multimodal Web device can be a device that can access Web content using interface modes from both the GUI category and the voice category. For example, a multimodal device can receive speech input and provide visual output via a display. In another example, a multimodal device can receive keyboard input and provide speech output. The use of multiple interface modes to access Web content can be especially important when the content is accessed using a mobile computing device because the mobile computing device can have limited input and output capabilities.
Many mobile computing devices with GUI and voice capabilities lack the ability to utilize the voice and GUI capabilities simultaneously. For example, many Wireless Application Protocol (WAP) enabled devices can support only one communication connection at a time, which can be either a voice connection (providing voice-only interface capabilities) or a data connection (providing GUI-only interface capabilities). When such a device establishes a voice connection, it can receive and transmit acoustic signals (speech) but cannot simultaneously convey binary data. Similarly, when the device establishes a data connection, binary data can be exchanged but acoustic signals (speech) cannot be conveyed.
One technique allows a mobile computing device that cannot interact in GUI and voice modes simultaneously to function in a multimodal fashion by switching between a voice connection and a data connection, thereby permitting a user of the device to switch between a GUI interface and a voice interface. Devices capable of switching between GUI interface modes and voice interface modes can be called sequential multimodal devices.
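The single-connection constraint and the switching behavior described above can be illustrated with a minimal sketch. The class and method names (`Connection`, `SequentialMultimodalDevice`, `switch_to`) are hypothetical and chosen only for illustration; they do not correspond to any particular device API.

```python
from enum import Enum


class Connection(Enum):
    VOICE = "voice"  # conveys acoustic signals (speech) only
    DATA = "data"    # conveys binary data only


class SequentialMultimodalDevice:
    """Hypothetical model of a device limited to one connection at a time."""

    def __init__(self, initial: Connection = Connection.DATA) -> None:
        self.connection = initial

    def can_convey_speech(self) -> bool:
        return self.connection is Connection.VOICE

    def can_convey_binary_data(self) -> bool:
        return self.connection is Connection.DATA

    def switch_to(self, target: Connection) -> None:
        # Assumption: establishing the new connection replaces the old one,
        # so the two interface categories are never available together.
        self.connection = target
```

In this sketch the device never holds both connections, so a user reaches both interface categories only by switching from one to the other in sequence.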
When switching between a GUI interface mode and a voice interface mode, a mobile computing device can access different source Web pages. For example, when in a GUI interface mode, the mobile computing device can access and render a Wireless Markup Language (WML) Web page that specifies textual data. When in a voice interface mode, the mobile computing device can request a Voice Extensible Markup Language (VoiceXML) Web page from a Web server. The VoiceXML Web page can be conveyed from the Web server to a voice server. The voice server can perform text-to-speech conversions upon the Web page data and perform speech-to-text conversions upon received speech input in accordance with the VoiceXML Web page specification. Accordingly, the content of the VoiceXML Web page can be presented to the mobile computing device as a sequence of acoustic signals (speech) conveyed across a voice connection.
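The mode-dependent page selection described above can be summarized in a short sketch. The names `PageRequest` and `plan_request`, and the string labels used for modes and renderers, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    markup: str       # format of the Web page requested from the Web server
    rendered_by: str  # component that turns the page into user-facing output


def plan_request(interface_mode: str) -> PageRequest:
    """Pick the page format for the current interface mode (illustrative)."""
    if interface_mode == "gui":
        # The device itself renders the WML page as text on its display.
        return PageRequest(markup="WML", rendered_by="device")
    if interface_mode == "voice":
        # The VoiceXML page is conveyed to a voice server, which performs
        # the text-to-speech and speech-to-text conversions.
        return PageRequest(markup="VoiceXML", rendered_by="voice server")
    raise ValueError(f"unknown interface mode: {interface_mode!r}")
```

The sketch makes the underlying dependency visible: each interface mode maps to a separately authored page format that the Web server must be able to supply.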
One problem with relying on differently formatted Web pages in order to function in a sequential multimodal fashion is that the technique requires Web servers to provide Web pages in multiple formats. The creation and maintenance of differently formatted Web pages can involve significant overhead; often the overhead of developing and maintaining multiple Web pages is greater than many Web content providers are willing to bear.
The overhead of the Web content provider would be substantially lessened if a Web page specified in a multimodal fashion could be decomposed into a GUI format and a voice format for presentation upon sequential multimodal devices. At present, a multimodal Web page cannot be converted into one or more Web pages having a single modality so that the content of the multimodal Web page can be accessed using a sequential multimodal device.