In multimodal applications, users can interact through input modalities other than the keypad alone. For example, commands that are traditionally given by scrolling and clicking can be speech-enabled in the application, so that the user can speak the commands, which are then recognized by an automatic speech recognition engine. Adding speech interaction to visual applications is receiving growing interest as the enabling technologies mature, since in many mobile scenarios, such as driving or walking, using the keypad is difficult.
Various multimodal browsing architectures have already been proposed. For example, the document U.S. Pat. No. 6,101,473 describes a method in which voice browsing is realized by the synchronous operation of a telephone network service and an internet service. This approach is prohibitively wasteful of network resources, as it requires two different communication links. Furthermore, this service requires an interconnection between the telephone service and the internet service. Another hurdle for user satisfaction is that the over-the-air co-browser synchronization required in a distributed browser architecture may cause latencies in browser operation that degrade the user experience.
The document U.S. Pat. No. 6,188,985 describes a method in which a wireless control unit provides voice browsing capabilities to a host computer. Along similar lines, a number of multimodal browser architectures have been proposed in which these operations are placed on a network server.
The patent U.S. Pat. No. 6,374,226 describes a system that is capable of changing the speech recognition grammar dynamically. For example, when an E-mail program enters the composition mode, a new grammar set-up is dynamically activated. On the one hand, this improves the use of device resources; on the other hand, it carries the severe disadvantage that the device changes its “passive vocabulary.” This may lead to frustrating experiences, as a user who has learned that the device understands a certain expression may be faced with a device that feigns deafness to that input when running another application.
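The mechanism described above can be illustrated with a minimal sketch. This is not the implementation from U.S. Pat. No. 6,374,226; the class and grammar names are hypothetical and chosen only to show how replacing the active grammar on a mode change causes previously learned commands to stop being recognized:

```python
class Recognizer:
    """Toy recognizer that only understands its currently active grammar."""

    def __init__(self):
        self.active_grammar = set()

    def set_grammar(self, commands):
        # Replacing (rather than extending) the grammar is what shrinks the
        # device's "passive vocabulary" when the application changes mode.
        self.active_grammar = set(commands)

    def recognize(self, utterance):
        # Return the command if it is in the active grammar, otherwise None.
        return utterance if utterance in self.active_grammar else None


# Hypothetical grammars for the e-mail example in the text.
BROWSE_GRAMMAR = {"open", "delete", "next message"}
COMPOSE_GRAMMAR = {"send", "attach file", "cancel"}

recognizer = Recognizer()
recognizer.set_grammar(BROWSE_GRAMMAR)
assert recognizer.recognize("delete") == "delete"

# Entering composition mode dynamically activates a new grammar set-up...
recognizer.set_grammar(COMPOSE_GRAMMAR)
# ...and a command the user learned earlier is no longer understood.
assert recognizer.recognize("delete") is None
assert recognizer.recognize("send") == "send"
```

The sketch makes the drawback concrete: the same utterance, "delete", is understood in one mode and silently ignored in another, which is the "feigned deafness" problem noted above.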
The known systems suffer from the fact that users are reluctant to take the speech-enabled features into use. Another problem arising from the state of the art is that users may not always be aware of the operation status of speech-enabled browsing systems.
While standards are being developed for how to write multimodal applications, there are no standards as to how the application interface should be built so that the user can become aware, as easily as possible, that speech input can be used.
In particular, it would be desirable for a user to know which voice inputs are allowed at different times or under certain conditions.
Once a user has successfully put a speech recognition system into use, the user is likely to continue using it. In other words, the main hurdle lies in starting to use speech control.
This problem has previously been addressed with audio prompts and the like, but these quickly become annoying, which degrades the usability experience.
Moreover, due to system load or the behavior of applications, not all speech control options may be available at all times, which is very difficult to convey to the user with prior-art techniques.
All the above approaches to a multimodal browsing architecture have in common that they are not suitable for use in mobile electronic devices or terminals, such as mobile phones or handheld computers, due to low computing power, restricted resources, or low battery capacity.
It would therefore be desirable to have a multimodal browsing system that is speech-enabled and provides superior user-friendliness.