1. Technical Field
This invention relates to browsing network-based electronic content and more particularly to a method and apparatus for coordinating the operation of a visual browser and a voice browser.
2. Description of the Related Art
Visual browsers are applications that facilitate visual access to network-based electronic content provided in a computer communications network. For example, one type of visual browser, the Web browser, is useful for locating and displaying network-based electronic content formatted using HyperText Markup Language (“HTML”). The term “visual browser” denotes that the browser can display graphics, text, or a combination of graphics and text. In addition, most visual browsers can present multimedia information, including sound and video, although some visual browsers require plug-ins in order to support particular multimedia information formats.
Unlike a visual browser, a voice browser typically operates in conjunction with a speech recognition engine and a speech synthesis engine and permits the user to interact with network-based electronic content audibly. That is, the user can provide voice commands to navigate from one network-based electronic document to another. Likewise, network-based electronic content can be presented to the user audibly, typically in the form of synthesized speech. Thus, voice browsers can provide voice access and interactive voice response to network-based electronic content and applications, for instance by telephone, personal digital assistant, or desktop computer.
Voice browsers can be configured to interact with network-based electronic content encoded in Voice Extensible Markup Language (VoiceXML). VoiceXML is a markup language for distributed voice applications and is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and Dual Tone Multifrequency (“DTMF”) key input, recording of spoken input, telephony, and mixed-initiative conversations.
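For illustration, an audio dialog of the kind described above might be encoded in VoiceXML roughly as follows. This fragment is a minimal hypothetical sketch (the form, field, and prompt contents are invented for illustration, not drawn from any particular application); it prompts the caller, listens for one of three spoken responses constrained by an inline grammar, and then speaks the recognized value back using synthesized speech:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- A form is the basic dialog unit: it collects one or more fields. -->
  <form id="colorForm">
    <!-- A field pairs a synthesized prompt with a recognition grammar. -->
    <field name="color">
      <prompt>Please say your favorite color.</prompt>
      <grammar>red | green | blue</grammar>
    </field>
    <!-- Once the field is filled, play back the recognized value. -->
    <block>
      <prompt>You said <value expr="color"/>.</prompt>
    </block>
  </form>
</vxml>
```

A voice browser interpreting this document would synthesize the prompt, pass the caller's utterance to the speech recognition engine against the grammar, and store the result in the `color` field, with no visual rendering involved.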
In an effort to provide users with the ability to interact with network-based visual content in a visual browser while also interacting with network-based audio content in a voice browser, some have proposed and developed multi-modal languages. The W3C, for example, has proposed a single authoring language known as DialogML which includes much of the capability provided by both visual and voice markup languages. Although multi-modal languages integrate the capabilities of both visual and voice browsers, in order to benefit from multi-modal languages, presently available applications written for single-modal operation must first be completely rewritten in a multi-modal language.
Another proposed solution for integrating the functionality of a voice browser and a visual browser has been to code speech synthesis functionality into an existing visual browser to produce a speech-aware visual browser. Similarly, new speech-related markup tags for visual browsers have been proposed in order to provide speech functionality to a visual browser. Still, this solution requires the development of a speech-aware function set for handling network-based speech content and the integration of the same directly in the source code of the visual browser. In consequence, the development of speech-related functionality is tightly linked to the development of the remaining functionality of the visual browser. This tight integration between the visual browser and the speech-aware functionality precludes the user from using a separate, more robust and efficient voice browser having a set of functions useful for interacting with network-based speech content.
Another proposed solution has been to provide multi-modal functionality by coupling the operation of a visual browser and a voice browser such that a user of both can interact with network-based electronic content concurrently. The browsers can be coupled by defining new attributes for tags in both the voice markup language and the visual markup language. This solution allows developers to author both visual and voice-based applications for the browser of their choice. Although this solution enables presently available single-modal applications to be transformed for use with such a coupling mechanism with less effort than completely recoding an application, it requires that both the visual browser and the voice browser be reprogrammed and reconfigured to interpret the new tag structures.