1. Technical Field
The present invention relates to the field of speech recognition and, more particularly, to voice interactions within multimodal interfaces.
2. Description of the Related Art
Computing devices containing multimodal interfaces have been proliferating. A multimodal interface as used herein refers to an interface that includes both voice processing and visual presentation capabilities. For example, numerous cellular telephones can include a graphical user interface and be capable of responding to speech commands and other speech input. Other multimodal devices can include personal data assistants, notebook computers, video telephones, teleconferencing devices, vehicle navigation devices, and the like.
Traditional methods for vocally interacting with multimodal devices typically involve first audibly prompting a user for speech input. Responsive to this prompting, the device receives the requested speech input. Next, an audible confirmation of the speech input can be presented to the user. Such vocal interactions can be slow because each message must be relayed serially between the user and the multimodal device. The inefficiency of audible prompting and confirmation can result in considerable user frustration and dissatisfaction.
For example, a user of a multimodal device can be audibly prompted to “speak the name of a departure city.” The user can then speak a city name, after which the device audibly presents a confirmation prompt, such as “You entered Boston, is that correct?” The user then responds to the confirmation, and another speech input prompt is audibly presented to the user. Such interactions, typical of conventional systems, fail to utilize the visual capabilities of the multimodal device.