1. Technical Field
This invention relates to the field of markup language processing, and more particularly, to processing data formatted using one markup language into data usable by another markup language.
2. Description of the Related Art
Markup languages aid computers in interpreting how data can be presented through a user interface. Typically, presentation information provided by a markup language in the form of tags can be inserted in a document around particular data to be formatted. For example, Hypertext Markup Language (HTML), the predominant markup language used on the Internet, provides information to a browser specifying how to display the data contained within an HTML formatted document. Other examples of markup languages can include eXtensible Markup Language (XML), Standard Generalized Markup Language (SGML), from which both HTML and XML are derived, Wireless Markup Language (WML), and Handheld Device Markup Language (HDML). Generally, however, markup languages can include any set of data specifications which can define the presentation of data contained in a document.
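The tag-wrapping model described above can be sketched briefly. In the following hypothetical illustration (the fragment and the `TagCollector` helper are assumptions for this example, not part of any specification), presentation tags such as `<b>` and `<i>` surround the data they format, and a parser can recover those tags from the document:

```python
from html.parser import HTMLParser

# A fragment in which HTML tags wrap the data they format:
# <b> marks its contents for bold display, <i> for italics.
fragment = "<p>The <b>quick</b> brown <i>fox</i></p>"

class TagCollector(HTMLParser):
    """Record each presentation tag encountered in a fragment."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(fragment)
print(collector.tags)  # → ['p', 'b', 'i']
```

The data itself ("The quick brown fox") is untouched; the tags carry only the presentation information a browser uses to display it.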
As computer communications networks become more advanced, new services are regularly being introduced to end users. One such service is providing data from the Internet, referred to as content, to an end user through a speech interface. For example, the user can listen to content processed through a speech interface and delivered to a cellular telephone in the form of audio, rather than viewing the content through a browser implemented on a personal digital assistant (PDA) or a cellular telephone. Presentation of data in this manner can be advantageous for mobile applications. Particularly, voice interfaces offer users an intuitive, hands-free and eyes-free method of obtaining Internet content.
Voice eXtensible Markup Language (VoiceXML) is a markup language which can be used to format data for presentation through a speech interface. Version 1.0 of the VoiceXML specification has been published by the VoiceXML Forum in the document by Linda Boyer, Peter Danielsen, Jim Ferrans, Gerald Karam, David Ladd, Bruce Lucas, and Kenneth Rehor, Voice eXtensible Markup Language (VoiceXML™) version 1.0 (W3C, May 2000). Additionally, version 1.0 of the VoiceXML specification has been accepted by the World Wide Web Consortium (W3C) as a proposed industry standard.
The vast amount of content presently available on the Internet has not been formatted using VoiceXML or another audio directed markup language format. Rather, most content has been formatted using HTML. For speech interface driven systems to process existing Internet content which has been formatted in HTML, the HTML formatted content must first be converted to VoiceXML formatted content. Alternatively, the HTML content can be reformatted using another suitable audio directed markup language.
Presently, a process referred to as “transcoding” can be used to translate a document formatted in one markup language into a document formatted using a second markup language. Essentially, transcoding involves identifying the tags of the first markup language and replacing them with corresponding tags of the second markup language. For example, in transcoding a document from HTML to VoiceXML, each HTML tag can be replaced with a corresponding VoiceXML tag. The resulting transcoded document then can be presented through a speech interface. In this manner, a transcoder can translate a document formatted in one markup language into a document formatted in another markup language.
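The tag-for-tag substitution described above can be sketched as follows. This is a deliberately naive illustration: the `TAG_MAP` table is hypothetical (a real transcoder would use a far larger mapping drawn from the actual HTML and VoiceXML specifications, and would handle attributes), and the target tag names are assumptions for the example rather than tags guaranteed by VoiceXML 1.0:

```python
import re

# Hypothetical one-to-one tag mapping for illustration only; a real
# transcoder's table would be much larger and attribute-aware.
TAG_MAP = {
    "html": "vxml",
    "p": "block",
    "b": "emphasis",
}

def transcode(document: str) -> str:
    """Naively replace each mapped source tag with its target-language tag."""
    def swap(match):
        slash, tag = match.group(1), match.group(2).lower()
        mapped = TAG_MAP.get(tag, tag)  # unmapped tags pass through unchanged
        return f"<{slash}{mapped}>"
    return re.sub(r"<(/?)(\w+)>", swap, document)

html_doc = "<html><p>Hello <b>world</b></p></html>"
print(transcode(html_doc))
# → <vxml><block>Hello <emphasis>world</emphasis></block></vxml>
```

Note that the substitution preserves document order and nesting exactly; it is precisely this literal-mindedness that gives rise to the modality problems discussed below.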
Still, there can be disadvantages to transcoding between markup languages of different modalities, where modality refers to the human sense to which the presentation of data is directed. For example, HTML is directed toward visual presentation of data, while VoiceXML is directed toward speech, or audio, presentation of data. One such disadvantage is that a change of modality in the presentation of content, from text to speech, can result in nonsensical sounding speech produced by a speech interface. Specifically, mere substitution of visually directed HTML tags with speech directed VoiceXML tags can result in documents that, when read by a speech interface, sound confusing to a listener. For example, tabular data formatted in HTML can be clearly viewed by end users. Although an HTML table can be recognized and retagged using VoiceXML for processing by a speech interface, the speech interface typically does not know a suitable way to audibly present the table in a comprehensible and user friendly manner. Specifically, the speech interface can present the table entries randomly, by row, or by column, each of which can be confusing to a listener. Thus, mere substitution of tags does not account for differing user interfaces. Moreover, transcoding necessitates tailoring user interactions to the interface, rather than tailoring the interface to the data presentation medium. For example, a user may wish to obtain a single portion of information or entry from a table formatted in HTML. After transcoding the HTML formatted document into a VoiceXML document, however, the user can be forced to listen to the entire poorly ordered table being audibly produced by a speech interface. Such situations can cause listener fatigue, thereby defeating the advantages of a speech interface. Presentation of data in a structure suitable for interpretation by a speech interface can overcome listener fatigue, providing a more user friendly solution.
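The table problem described above can be made concrete with a small sketch. The table contents and both rendering strategies below are hypothetical examples, not drawn from either specification; the point is only the contrast between cell-order reading and a speech-aware rendering:

```python
# A small table, shown already parsed into header and rows for brevity.
header = ["City", "High", "Low"]
rows = [["Boston", "70", "55"], ["Austin", "90", "72"]]

# Naive tag substitution reads the cells in document order, discarding
# the visual alignment that makes the table meaningful to a sighted reader:
naive = " ".join(header + [cell for row in rows for cell in row])
print(naive)   # → City High Low Boston 70 55 Austin 90 72

# A speech-aware rendering instead pairs each cell with its column header,
# so a listener can follow the data without seeing the grid:
spoken = ". ".join(
    ", ".join(f"{h} {c}" for h, c in zip(header, row)) for row in rows
)
print(spoken)  # → City Boston, High 70, Low 55. City Austin, High 90, Low 72
```

The second rendering also lends itself to selective playback (for example, speaking only the row for one city), which mere tag substitution cannot provide.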
Another disadvantage of transcoding can be poor structuring of transcoded documents. For example, the organizational structure of a VoiceXML document can differ significantly from the structure of an HTML document due to the different modalities of each markup language. Moreover, replacing tags without regard to data placement within the document can result in fragmented data throughout the transcoded document. Accordingly, problems still exist with regard to transcoding markup languages of different modalities.