1. Field of the Invention
The present invention relates to command, control, and content navigation with respect to multimodal applications.
2. Description of the Related Art
Visual browsers are complex application programs that can render graphic markup languages such as Hypertext Markup Language (HTML) or Extensible HTML (XHTML). Being directed to graphic content, visual browsers lack the ability to process audible input and/or output. Still, visual browsers enjoy a significant user base.
Voice browsers are the audio counterparts of visual browsers. More particularly, voice browsers can render voice markup languages such as Voice Extensible Markup Language (VXML), thereby allowing users to interact with a voice browser using speech. Voice browsers, however, are unable to process or render graphic markup languages.
Recent developments in Web-based applications have led to the development of multimodal interfaces. Multimodal interfaces allow users to access multimodal content, or content having both graphical and audible cues. Through a multimodal interface, the user can choose to interact with or access content using graphic input such as a keyboard or pointer entry, using an audible cue such as a speech input, or using a combination of both. For example, one variety of multimodal interface is a multimodal browser that can render XHTML and Voice markup language, also referred to as X+V markup language.
To provide both graphic and voice functionality, developers are left with the option of developing a new multimodal browser/application or, alternatively, redesigning an existing visual browser/application to provide voice functionality. The complexity of visual browsers, and browsers in general, however, makes such efforts both time-consuming and costly.
Further complicating the process of voice-enabling an application program, operations such as rendering content, command and control, and content navigation typically are distinct functions. Voice-enabling content refers to generating or playing an audible rendition of an electronic document such as a markup language document. Command and control pertains to graphical user interface (GUI) features such as commands that are accessible through menus and dialog boxes of an application. Content navigation pertains to the ability of a user to select hyperlinks presented within a rendered electronic document using voice, thereby causing a browser, for example, to load the document represented by the hyperlink. Thus, to speech-enable an application program, efforts must be directed not only to voice-enabling the content, but also to voice-enabling the command and control and content navigation functions of the application program.
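By way of illustration only, the distinction between command and control and content navigation can be sketched as follows. The sketch assumes that a speech recognizer has already converted the user's utterance to text. The names LinkCollector, COMMANDS, and dispatch, and the action names in the command table, are hypothetical and do not correspond to any particular browser or product.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hyperlinks from a markup document, keyed by their visible text."""

    def __init__(self):
        super().__init__()
        self.links = {}      # lowercased link text -> href
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case so spoken phrases can be compared.
            label = " ".join("".join(self._text).split()).lower()
            if label:
                self.links[label] = self._href
            self._href = None


# Hypothetical command-and-control table: spoken phrase -> GUI action name.
COMMANDS = {"go back": "history_back", "reload page": "reload"}


def dispatch(utterance, markup):
    """Map recognized text to a GUI command or to a hyperlink target."""
    phrase = " ".join(utterance.split()).lower()
    # Command and control: phrases tied to the application's own GUI features.
    if phrase in COMMANDS:
        return ("command", COMMANDS[phrase])
    # Content navigation: phrases tied to hyperlinks in the rendered document.
    collector = LinkCollector()
    collector.feed(markup)
    if phrase in collector.links:
        return ("navigate", collector.links[phrase])
    return ("unrecognized", None)
```

In this sketch the command vocabulary is fixed by the application, whereas the navigation vocabulary changes with each loaded document, which is one reason the two functions typically must be voice-enabled separately.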