1. Statement of the Technical Field
The present invention relates to markup transcoding and more particularly to transcoding visual markup into voice markup.
2. Description of the Related Art
The graphical user interface (GUI) transformed modern computing by providing a man-machine interface which could facilitate computer-human interactions regardless of the expertise of the end user. In consequence, visually accessible applications, including Web browsers, have provided a vehicle through which vast quantities of data can be presented and randomly digested by end users. Vocally accessible applications, by comparison, have not experienced the same accelerated growth. Specifically, the physical limitations of the audio user interface (AUI) inhibit the comprehension of data which has not been presented in sequence. As a result, most voice applications are limited to the serial presentation of data.
Traditional voice applications have incorporated an AUI based upon a menu structure. These traditional voice applications more often than not provide static data from a fixed hierarchical menu format. Though difficult to program, once implemented the traditional voice application can be quite effective, though limited merely to static data. To enjoy the same advantages as visually accessible applications, however, voice applications ought to capitalize on data which can be captured from a variety of dynamically changing data sources, including those data sources disposed about the Internet.
For voice applications which incorporate dynamic data, however, unlike those which incorporate strictly static data, the traditional fixed menu structure can prove problematic. Moreover, even when dynamic data is incorporated in a menu-based scheme, the dynamic data typically is authored directly from the data source into voice application markup, for instance using VoiceXML. Clearly, the cost of ownership of such an application is proportional to the cost of maintaining the link between the data source and the voice markup.
To facilitate the maintenance of dynamically changing data source links, transcoding processes have been both proposed and implemented, as is described in Michael K. Brown, Stephen C. Glinski, Brian C. Schmult, Web Page Analysis for Voice Browsing (2000). In a conventional transcoding process, a set of rules can be applied to a source document, each rule facilitating the transformation of markup from one format to another. For example, in a conventional transcoding process, hypertext markup language (HTML) can be converted to VoiceXML. In particular, as described both in United States Patent Application Publication No. US 2001/0037405 A1 and also in United States Patent Application Publication No. US 2002/0007379 A1, elements in an HTML document can be matched to corresponding elements in the target wireless markup language (WML) document.
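By way of illustration only, the rule-based transcoding described above can be sketched as follows. The sketch is not taken from any of the cited references. The rule table RULES, the class HtmlToVoiceXml and the function transcode are hypothetical names chosen for this example, and the mapping of HTML headings and paragraphs to VoiceXML prompts is an assumed rule set rather than a standard one. The output has not been validated against the VoiceXML schema.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical rule table (an assumption for this sketch):
# HTML tag -> (VoiceXML text emitted on open, VoiceXML text emitted on close).
RULES = {
    "h1": ("<prompt><emphasis>", "</emphasis></prompt>"),
    "p": ("<prompt>", "</prompt>"),
}


class HtmlToVoiceXml(HTMLParser):
    """Applies RULES to each HTML element; unmatched elements are dropped."""

    def __init__(self):
        super().__init__()
        self.out = []    # fragments of the target markup, in document order
        self.stack = []  # currently open source tags that matched a rule

    def handle_starttag(self, tag, attrs):
        if tag in RULES:
            self.stack.append(tag)
            self.out.append(RULES[tag][0])

    def handle_endtag(self, tag):
        # Close only the most recently opened rule-matched tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            self.out.append(RULES[tag][1])

    def handle_data(self, data):
        text = data.strip()
        # Text outside any rule-matched element is not carried over.
        if self.stack and text:
            self.out.append(escape(text))


def transcode(html):
    """Return an illustrative VoiceXML document for the given HTML string."""
    parser = HtmlToVoiceXml()
    parser.feed(html)
    parser.close()
    body = "".join(parser.out)
    return '<vxml version="2.0"><form><block>' + body + "</block></form></vxml>"


if __name__ == "__main__":
    page = "<html><body><h1>News</h1><p>Rain &amp; wind</p></body></html>"
    print(transcode(page))
```

It is to be noted that the sketch emits the prompts in source document order, so the resulting voice markup preserves whatever ordering the visual markup happened to have.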
Though transcoding can be an effective technology for routinely transforming ordinary content from one type of markup formatting to another, transcoding in and of itself cannot resolve the problem of effectively presenting randomly positioned content in a visual application within the menu-based structure of an AUI in a vocally accessible application. More particularly, Web pages typically are two-dimensional and graphically oriented. Web pages capitalize on the ability of the human eye to access data randomly in a visual document using graphical cues such as image, color and tabular layout to attract attention.
The random placement of content in an AUI, however, does not lend itself well to the listener who must digest data sequentially as it is read, not randomly as the eye perceives the content. In particular, the relatively short attention span of the average end user, when combined with the inability of the end user to quickly re-scan input in a voice application menu structure, can inhibit the retention of audibly comprehensible content. In consequence, what is needed is an improved system and methodology for transcoding visual content into voice content so that the listener can easily navigate to the most pertinent information.