Advances in automatic speech recognition technology are enabling richer and more meaningful dialogs between humans and machines in a rapidly increasing number of applications. Many applications seek to allow rich, meaningful, “open” dialogs in an effort to make interactions more efficient. An “open” dialog is one in which the recognition system does not strictly limit what the speaker may say. Open dialogs, however, can be lengthy, tedious and error-prone, due at least in part to imperfect speech recognition accuracy.
Poor recognition accuracy in open dialogs can result from a variety of factors. One common factor is that speakers typically convey information to the recognition system over a lossy speech channel, such as the public switched telephone network. Recognition accuracy also tends to depend on how well expected utterances are modeled, yet speaker utterances can be difficult to predict, especially in applications that are large and open. Further, modeling for open dialogs typically requires a massive amount of training data across a very large number of speakers. As a result, the training process can be difficult and costly.
On the other hand, some applications have sought to simplify dialogs. For example, certain voice portals have provided the ability to derive very simple grammars from an address book. Typically, these grammars are constrained to people's names and addresses for use in voice-activated dialing. Such applications are generally not very powerful and have limited applicability.