Speech recognition systems have been successfully deployed to implement voice browsers, in which a user utters a spoken input to the system and the system recognizes the spoken input by selecting a return value associated with that input.
The selection of the return value is based on a correlation between a digitized waveform of the spoken input and a waveform of a word or string in a grammar provisioned in the system. Once the system selects the return value, the system presents the selected return value to the user for confirmation. For example, in a travel context, a voice recognition system may have the following dialogue or conversation with a user:
System: “Please state your destination city.”
User: “Boston.”
System: “I heard Boston, is that right?”
User: “Yes.”
Since the user confirmed the return value of “Boston” in the above example, the system can continue the dialogue with the user.
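By way of illustration only, the selection step described above might be sketched as follows. The sketch assumes that each grammar entry is stored with a short reference waveform and that a normalized correlation score is used for the comparison; the sample values, the names `normalized_correlation`, `select_return_value`, and `GRAMMAR`, and the choice of score are hypothetical and are not taken from any particular system.

```python
import math


def normalized_correlation(a, b):
    """Normalized correlation (cosine similarity) of two equal-length sample lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_return_value(spoken, grammar):
    """Return the grammar entry whose reference waveform best matches the input."""
    return max(grammar, key=lambda word: normalized_correlation(spoken, grammar[word]))


# Hypothetical toy "waveforms"; a real system would use far longer sample sequences.
GRAMMAR = {
    "Boston": [0.9, 0.1, 0.8, 0.2],
    "Austin": [0.1, 0.9, 0.2, 0.8],
    "Portland": [0.5, 0.5, 0.9, 0.1],
}

spoken_input = [0.8, 0.2, 0.7, 0.3]  # assumed digitized utterance of "Boston"
choice = select_return_value(spoken_input, GRAMMAR)
print(f"I heard {choice}, is that right?")
```

The confirmation prompt is built from whichever entry scores highest, so a poor match between the input and the reference waveforms leads directly to an incorrect prompt.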
If the user does not confirm the selected return value, then in one system known to the inventor, the system may simply repeat the dialogue, or parts of it, in the hope that the user will speak more clearly and thereby facilitate correct selection of the return value. The following conversation exemplifies this approach:
System: “Please state your destination city.”
User: “Boston.”
System: “I heard Austin, is that correct?” (A)
User: “No, Boston.” (B)
System: “I heard Austin, is that correct?” (C)
In the above conversation, the parts labeled A, B, and C may be repeated until the user hangs up through sheer frustration or the system crashes.
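The repetition described above might be sketched as follows, purely for illustration. The sketch assumes a recognizer that consistently selects the same incorrect return value for a given utterance; `recognize`, `repeat_until_confirmed`, and the `max_turns` cap are hypothetical and are introduced only so that the example terminates.

```python
def recognize(utterance):
    """Hypothetical recognizer that consistently mishears "Boston" as "Austin"."""
    return "Austin" if utterance == "Boston" else utterance


def repeat_until_confirmed(utterance, intended, max_turns=5):
    """Re-present the recognizer's single selection each turn; return the prompts issued."""
    prompts = []
    for _ in range(max_turns):
        heard = recognize(utterance)
        prompts.append(f"I heard {heard}, is that correct?")
        if heard == intended:
            break  # the user would confirm, so the dialogue can continue
    return prompts


prompts = repeat_until_confirmed("Boston", "Boston")
for line in prompts:
    print(line)
```

Because nothing changes between turns, every prompt offers the same incorrect value, and only the arbitrary turn limit ends the exchange.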
In another system known to the inventor, in order to arrive at the correct return value, the system sequentially presents every possible return value provisioned in the system to the user for confirmation. With this approach, a sample conversation may take the following form:
System: “Please state your destination city.”
User: “Boston.”
System: “I heard Austin, is that correct?”
User: “No, Boston.”
System: “I heard Portland, is that correct?”
User: “No, Boston.”
System: “I heard Baltimore, is that correct?”
User: “No, Boston,” etc.
As a list of destination cities may comprise hundreds of return values, each corresponding to a city on the list, the user may have to reject a large number of candidates before the correct one is offered. It will therefore be appreciated that the above technique for selecting the correct return value (also known as the N-best approach) is also inadequate.
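The cost of sequential confirmation might be illustrated with the following sketch. It assumes that the provisioned return values are offered in some fixed order and that the user rejects each one until the intended value is reached; `confirm_sequentially` and the candidate list are hypothetical.

```python
def confirm_sequentially(candidates, intended):
    """Offer candidates one at a time; return the prompts issued until one is confirmed."""
    prompts = []
    for candidate in candidates:
        prompts.append(f"I heard {candidate}, is that correct?")
        if candidate == intended:
            break  # the user confirms this candidate
    return prompts


# Hypothetical ordering of provisioned return values.
candidates = ["Austin", "Portland", "Baltimore", "Boston"]
prompts = confirm_sequentially(candidates, "Boston")
print(len(prompts))  # one confirmation turn per candidate offered
```

In the worst case the number of confirmation turns equals the number of provisioned return values, so a grammar containing hundreds of cities could require hundreds of turns.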