1. Field of the Invention
The present invention relates to the field of computer software and, more particularly, to multimodal applications.
2. Description of the Related Art
Currently, multimodal applications include a variety of interface types called modes. Input modes can include, for example, keyboard, pointing device, speech recognition, handwriting recognition, Dual-Tone Multi-Frequency (DTMF) input, and the like. Output modes can include speech synthesis, visual displays, and the like. Multimodal applications permit users to interact with the application using a combination of graphical user interface (GUI) and speech modes.
When an application developer includes speech modes within an application, the developer can be required to implement highly complex algorithms. This is true even though the developer may only desire to speech-enable a few GUI elements, such as a toolbar or menu option. Often the overhead required to speech-enable one or more GUI elements is too costly to justify. Accordingly, it would be advantageous to provide a simpler means of speech-enabling application operations than the methods conventionally used by software programmers.
Further, when a multimodal application renders a multimodal Web page, the multimodal application may not process speech input for both application operations and the Web page content in a coordinated fashion. For example, a single speech input can be interpreted one way by the application and can be simultaneously interpreted in a different way by a voice server that interprets voice-enabled markup of the Web page. More specifically, a speech input of “Next” can be interpreted by a speech-enabled Web browsing application as initiating an application operation that advances the application to another Web page. At the same time, the presently rendered Web page can display a plurality of records, where the speech input of “Next” can be recognized as a command to display the next set of records. Accordingly, the Web browser can behave in an unpredictable manner when the speech input of “Next” is received. It would be desirable to implement the application operations and the Web page content in a more unified manner.
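By way of illustration only, the conflict described above can be sketched as two independent command grammars that both match the same utterance. The sketch below is a simplified assumption and does not represent any particular browser or voice server; the grammar contents, the action names, and the functions `interpret` and `is_ambiguous` are all hypothetical.

```python
# Hypothetical grammars; every phrase and action name here is an
# illustrative assumption, not part of any real browser or voice server.
APPLICATION_GRAMMAR = {
    "next": "browser.go_forward",
    "back": "browser.go_back",
}
PAGE_GRAMMAR = {
    "next": "page.show_next_records",
    "previous": "page.show_previous_records",
}


def interpret(utterance):
    """Return every action that uncoordinated recognizers would trigger."""
    phrase = utterance.strip().lower()
    actions = []
    # Each grammar is consulted independently, with no arbitration between them.
    for grammar in (APPLICATION_GRAMMAR, PAGE_GRAMMAR):
        if phrase in grammar:
            actions.append(grammar[phrase])
    return actions


def is_ambiguous(utterance):
    """An utterance is ambiguous when more than one grammar matches it."""
    return len(interpret(utterance)) > 1
```

In this sketch, an utterance of "Next" matches both grammars and therefore yields two competing actions, whereas "Back" matches only the application grammar. Without some form of coordination between the two interpreters, nothing determines which of the competing actions is carried out.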