Computerized systems increasingly process natural language utterances made by humans. Web search engines, for example, accept natural language textual input, process it, and provide visual results. Such systems generally present multiple results, such as 10, simultaneously in a browser window. The results can include pictures and text. Such systems might also display advertisements. Visual human-machine interfaces can provide rich and varied results containing much information that users can consume relatively quickly.
Speech-enabled systems are ones in which users interact with machines by speaking natural language utterances as input. Such machines generally use automatic speech recognition and natural language processing techniques to interpret utterances. Many speech-enabled systems also output generated speech, though some do not. The rate of information transfer through speech is much lower than through visual displays. Speaking all of the results that a browser-based search engine provides in response to a single natural language expression would take so long that it would be impractical for users to interact with such machines by speech alone.
Many visual systems, in response to expressions with ambiguous meaning, display results appropriate to each of multiple reasonable interpretations of the expression. It is, in most cases, impractical for speech-enabled systems to speak results for every reasonable interpretation of an ambiguous utterance. Conventional speech-enabled systems, when faced with an ambiguous utterance, instead guess at the best interpretation in order to form their result. Frequently, the guessed interpretation is not the one that the user intended. This is a common cause of frustration for users of conventional speech-enabled systems.
In addition, using a visual display to disambiguate the meaning of a spoken utterance is not practical in many situations, such as with a device that has no display or a device that must be operated without the user looking at it.