1. Field of the Invention
The present invention relates to the field of computer software and, more particularly, to multimodal applications.
2. Description of the Related Art
A multimodal application is an application that permits user interactions through more than one input mode. Examples of input modes include speech, digital pen (handwriting recognition), and the graphical user interface (GUI). A multimodal application may, for example, accept and process speech input as well as keyboard or mouse input. Similarly, a multimodal application may provide speech output as well as visual output, which can be displayed upon a screen. Multimodal applications can be particularly useful for small computing devices possessing a form-factor that makes keyboard data entry more difficult than speech data entry. Further, environmental conditions can cause one interface modality available in a multimodal application to be preferred over another. For example, if an environment is noisy, keypad and/or handwritten input can be preferred to speech input. Likewise, when visual conditions of an environment, such as darkness or excessive glare, make a screen associated with a computing device difficult to read, speech output can be preferred to visual output.
Although users of small computing devices can greatly benefit from multimodal capabilities, small computing devices can be resource constrained. That is, the memory and processing power available to a small computing device can be too limited to support the local execution of more than one mode of interaction at a time. To overcome resource constraints, multimodal processing can be distributed across one or more remote computing devices. For example, if one mode of interaction is speech, speech recognition and synthesis processing for the speech mode can be performed upon a speech-processing server that is communicatively linked to the multimodal computing device. Software developers face a significant challenge in managing distributed multimodal interactions, some of which can be executed locally upon a computing device, while other interactions can be executed remotely.
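By way of illustration only, the division of work described above can be sketched as a table that maps each interaction mode to either a local handler or a remote one. This is a minimal sketch under stated assumptions, not an implementation drawn from any particular system: the names ModeHandler, HANDLERS, dispatch, recognize_keypad_locally, and recognize_speech_remotely are hypothetical, and the remote speech call is replaced by a stand-in function so that the example runs without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModeHandler:
    """Where and how one interaction mode is processed (hypothetical type)."""
    location: str                    # "local" or "remote"
    process: Callable[[bytes], str]  # turns raw input into text


def recognize_keypad_locally(raw: bytes) -> str:
    # Keypad input is cheap to decode, so it stays on the device.
    return raw.decode("utf-8")


def recognize_speech_remotely(raw: bytes) -> str:
    # Stand-in for a request to a speech-processing server (assumption).
    # A real client would send the audio and wait for a transcript.
    return f"<transcript of {len(raw)} audio bytes>"


# Hypothetical routing table: one entry per supported interaction mode.
HANDLERS: Dict[str, ModeHandler] = {
    "keypad": ModeHandler("local", recognize_keypad_locally),
    "speech": ModeHandler("remote", recognize_speech_remotely),
}


def dispatch(mode: str, raw: bytes) -> str:
    """Route input for one mode to its configured handler."""
    handler = HANDLERS.get(mode)
    if handler is None:
        raise ValueError(f"unsupported interaction mode: {mode}")
    return handler.process(raw)
```

In this sketch, dispatch("keypad", b"42") is handled on the device, while dispatch("speech", audio_bytes) would be forwarded to the remote speech-processing server. The caller does not need to know which of the two occurred.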
Conventional solutions to distributed multimodal interaction management have typically been application-specific solutions that have been designed into an application during the application's software development cycle. Accordingly, the features available for each modality, such as speech recognition features, are typically tightly integrated within the software solution so that future enhancements and additional features can require extensive software rewrites. Because hardware and software capabilities are constantly evolving in the field of information technology, customized solutions can rapidly become outdated and can be costly to implement. A more flexible, application-independent solution is needed.