1. Statement of the Technical Field
The present invention relates to a user interface and more particularly to voice enabling a multimodal markup language defined user interface.
2. Description of the Related Art
The user interface of a computer program serves the function of receiving input from an end user for underlying program logic, and for providing output produced by the program logic. Initially a mere command prompt, the conventional user interface has evolved over time into the complex, graphical user interface familiar to most computing end users today. More recently, the graphical user interface has been rendered both portable and dynamic through the utilization of markup language and server page technologies, including the extensible hypertext markup language (XHTML).
Notwithstanding the tremendous advances in the visual user interface, the visual aspect can be inappropriate in many circumstances. For instance, some applications are deployed in environments not conducive to the use of a keyboard and monitor. Examples include telephonic applications such as interactive voice response systems, and hands-free applications such as those deployed in an automobile, to name only a few. To accommodate these non-traditional environments, extensive use has been made of the audible user interface. In fact, whole technologies, including the voice extensible markup language (VoiceXML), have been developed to address this unique market segment.
Not all applications operate in an environment dominated by a particular modality of interaction. In fact, in some multimodal environments, both audio and visual interface cues can be appropriate. Previously, multimodal environments required a separately specified user interface for each modality of interaction, including for instance an audio user interface and a graphical user interface. Generating a separate user interface for each specified modality of interaction, however, can be costly in terms of development time, expertise and maintenance.
Multimodal applications are computing applications which provide multiple interface types to accommodate the needs of prospective end users. Importantly, multimodal applications do not require separate user interfaces to accommodate each separate modality of interaction. Rather, the content of a multimodal application can specify the presentations and interactions in both visual and voice modalities. In most cases, the end user can choose the most desirable and efficient input method for interacting with the underlying logic of the application.
Notably, the XHTML+Voice (X+V) markup language represents one technical effort to produce a multimodal application development environment. In X+V, XHTML and VoiceXML can be mixed in a single document. The XHTML portion of the document can manage visual interactions with an end user, while the VoiceXML portion of the document can manage voice interactions with the end user. The Multimodal Toolkit for WebSphere® Studio manufactured by IBM Corporation of Armonk, N.Y., United States incorporates X+V support in developing multimodal applications.
In X+V, command, control and content navigation (C3N) can be enabled while multimodal content is being rendered. The X+V profile specifies how to compute grammars based upon the visual hyperlinks present in a page. Nevertheless, in practice it can be difficult for the user to determine unambiguously which vocabulary has been activated to enable the voice hyperlinks. Accordingly, it would be desirable to provide a simplified methodology for computing a grammar with which to navigate hyperlinks, and to activate elements that accept mouse input, by voice.