VoiceXML is a Web-based, industry-standard markup language that came out of a consortium of AT&T, IBM, Lucent and Motorola. It is used for building distributed, Internet-based voice applications, and it lets Web authors and designers work with tags much as they do in HTML. Whereas HTML assumes a graphical Web browser that appears on a display and is controlled with a keyboard and a mouse, VoiceXML assumes a voice browser with an audio input, which may comprise voice or keypad tones, and an audio output, which may be computer-synthesized or recorded. VoiceXML is designed to create audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications, and to simplify these tasks.
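By way of illustration only, the tag-based character of VoiceXML can be sketched with a short Python program that assembles a minimal VoiceXML 2.0 document. The function name build_greeting_dialog, the field name "account" and the prompt wording are hypothetical and are not taken from any particular system; the built-in "digits" field type is assumed to be supported by the voice browser.

```python
import xml.etree.ElementTree as ET

VXML_NAMESPACE = "http://www.w3.org/2001/vxml"


def build_greeting_dialog(greeting: str, question: str) -> str:
    """Return a minimal VoiceXML 2.0 document as a string (illustrative only)."""
    # Root element, analogous to <html> in an HTML page.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NAMESPACE})
    # A <form> is a dialog; the id "main" is a hypothetical name.
    form = ET.SubElement(vxml, "form", {"id": "main"})
    # A <block> holds output that is played without waiting for input.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "prompt").text = greeting
    # A <field> collects input; type "digits" requests spoken or keyed digits
    # (assumed to be available on the target voice browser).
    field = ET.SubElement(form, "field", {"name": "account", "type": "digits"})
    ET.SubElement(field, "prompt").text = question
    return ET.tostring(vxml, encoding="unicode")


document = build_greeting_dialog(
    "Welcome.", "Please say or key in your account number."
)
print(document)
```

A voice browser that fetched such a document would be expected to speak the greeting and then wait for digits entered by voice or by keypad, in the same way that a graphical browser renders an HTML form and waits for keyboard input.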
Telephones have been important to the development of VoiceXML, although VoiceXML's appeal is not limited to use with telephones. FIG. 1 shows a conventional VoiceXML system that includes a VoiceXML browser 103 that runs on a specialized voice gateway node 102 that may be connected to a public switched telephone network (PSTN) 104 and to the Internet 105. VoiceXML browser 103 may include a VoiceXML interpreter context that may detect a call from a user of telephone 101, acquire a VoiceXML document and answer the call. Voice gateway nodes extend the power of the Web to the world's more than one billion telephones, from antique black candlestick telephones up to the latest mobile telephones. VoiceXML takes advantage of several trends: the growth of the Web and its capabilities, improvements in computer-based speech recognition and text-to-speech synthesis, and the accessibility of the Web from locations other than desktop computers.
Although advances have been made in converting HTML documents to VoiceXML form, such attempts have at best yielded limited semi-automated voice browsers. Even with current state-of-the-art voice browsers and voice portals, effective provisions for intelligently and dynamically converting HTML documents to VoiceXML form are lacking.
The first attempts to convert text information on a computer screen to speech relied on screen readers. However, the techniques used by these screen readers failed to convey the structure of the document rendered on the screen. Other voice browsers, such as pwWebSpeak®, are suitable mainly for technically savvy users because of the browsers' complexity. Although such browsers are an improvement on screen readers, they support telephone access by only one user at a time and use proprietary speech recognition technology that does not conform to any industry-wide voice standard such as VoiceXML.
In recent years, various Web-related services have made improvements to voice browsers by providing voice portals that allow a user to dial in by telephone and access the Web. These improvements have been limited mainly to certain Web sites, such as Web search engines or sites with content related to, for example, finance, sports, weather and traffic. The services that provide these improvements include the Web-On-Call®, WebGalaxy®, Tellme®, and BeVocal® portals. However, these portals provide access only to Web sites that have been manually pre-converted or re-authored into a voice-enabled form such as VoiceXML. As a result of manual conversion, there are two versions of the same information: the Web site in HTML form and the VoiceXML document. If any information on the Web site changes after the manual conversion, that information will not be updated in the corresponding VoiceXML document. Thus, manual conversion suffers from problems of synchronization between the Web site and the VoiceXML document. What is needed is a VoiceXML-based solution that dynamically converts HTML into VoiceXML without the problems associated with existing services, and that is applicable to any Web site or Web page.
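The synchronization problem can be made concrete with a deliberately naive Python sketch that generates VoiceXML from HTML at request time, so that no separately maintained copy can go stale. This is an illustration only and is not the conversion technique of any of the services named above or of any particular solution; the names TextCollector and html_to_vxml, the set of tags treated as speakable, and the sample pages are all hypothetical. The sketch assumes well-formed HTML in which every speakable element is explicitly closed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Collect the text of headings, paragraphs and list items (hypothetical helper)."""

    SPOKEN_TAGS = {"h1", "h2", "h3", "p", "li"}  # assumed set of speakable tags

    def __init__(self):
        super().__init__()
        self.chunks = []    # one entry per outermost speakable element
        self._depth = 0     # nesting depth inside speakable elements
        self._buffer = []   # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SPOKEN_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SPOKEN_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace before storing the text.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.chunks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def html_to_vxml(html: str) -> str:
    """Convert the current HTML text into a single-form VoiceXML document."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    for chunk in collector.chunks:
        ET.SubElement(block, "prompt").text = chunk
    return ET.tostring(vxml, encoding="unicode")


# Two versions of the same hypothetical page, before and after an update.
page_before = "<html><body><h1>Weather</h1><p>Sunny, 20 degrees.</p></body></html>"
page_after = "<html><body><h1>Weather</h1><p>Rain, 12 degrees.</p></body></html>"

print(html_to_vxml(page_before))
print(html_to_vxml(page_after))
```

Because the VoiceXML is derived from the HTML each time it is requested, a change to the page is reflected in the next conversion. A manually re-authored VoiceXML document would instead continue to announce the old forecast until someone edited it.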