Automatic speech recognition technology has advanced rapidly in recent years and is finding widespread use in many different applications. One application in which automatic speech recognition is of particular interest is “call handling”. Two examples of call handling applications are automated directory assistance and call steering (or call routing). Businesses are increasingly using automated directory assistance and call steering functions to handle incoming telephone calls. An automated directory assistance application may receive a spoken request from a telephone caller for a “destination”, such as a telephone listing (telephone number), recognize the caller's speech to identify the requested destination, and provide the requested information to the caller using recorded or synthesized speech.
Such a system might be implemented, for example, in a call center associated with a public switched telephone network (PSTN). A call steering system may be similar, except that it can automatically route a call to a spoken destination, rather than merely responding with information. For example, a call steering system can be used to connect a conventional telephone call, or to route a caller through a hierarchical structure of voice-responsive content, such as a “voice web”. Of course, automated directory assistance and call steering functions may also be combined in a given system.
Call steering and directory assistance applications both operate generally by mapping an incoming utterance (request) to one of many possible destinations. The mapping between the incoming utterance and the desired destination is established by a combination of a speech recognition engine and a mapping engine. The speech recognition engine uses a language model to recognize a caller's speech. The language model may be a speech recognition grammar, for example, which is a data representation of the usable vocabulary and syntax for the set of destinations. As another example, the language model may be a statistical language model. A statistical language model typically includes a larger vocabulary than a grammar but does not include syntax information. Rather than requiring specific word strings to be detected for recognition, a statistical language model includes probabilities of occurrence for each possible sequence of words in the vocabulary. In general, the sequence of words with the highest probability for a particular input is taken as the recognition result.
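By way of illustration only, the difference between the two kinds of language model might be sketched as follows. The toy grammar, the training utterances, the bigram model with add-one smoothing, and all function names are assumptions introduced for this example and are not taken from any particular recognizer. A real speech recognition engine would also combine these language model scores with acoustic scores, which the sketch omits.

```python
import math
from collections import Counter

# Hypothetical toy grammar: the exact word strings the recognizer accepts.
GRAMMAR = {
    ("billing", "department"),
    ("technical", "support"),
}

def grammar_accepts(words):
    """A grammar recognizes only the word strings it explicitly lists."""
    return tuple(words) in GRAMMAR

# Hypothetical training utterances for a toy bigram statistical language model.
TRAINING = [
    "i need the billing department",
    "connect me to technical support",
    "i have a question about my bill",
    "i need technical support please",
]

def train_bigram(sentences):
    """Count unigrams and bigrams, with sentence start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, len(vocab)

def log_prob(words, model):
    """Log probability of a word sequence under the bigram model.

    Add-one smoothing (an assumption of this sketch) gives unseen word
    pairs a small nonzero probability instead of rejecting them outright.
    """
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def best_hypothesis(hypotheses, model):
    """Take the candidate word sequence with the highest probability."""
    return max(hypotheses, key=lambda h: log_prob(h.split(), model))

model = train_bigram(TRAINING)
candidates = [
    "i need the billing department",
    "eye kneed the building apartment",
]
result = best_hypothesis(candidates, model)
```

In this sketch the grammar rejects "i need the billing department" because that exact string is not listed, whereas the statistical model simply ranks it above the acoustically similar but improbable alternative.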
The mapping engine maps an input string of words output by the speech recognizer to one of many possible destinations. A mapping engine may use a grammar that specifies all possible word strings for each destination. Alternatively, the destination map may be statistical in nature. Creating an appropriate destination map for an automated directory assistance or call steering application can be tedious work, because one has to anticipate as many ways as possible in which a caller may refer to any given destination. This process tends to be labor-intensive and time-consuming, adding to the overall cost of the system.
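The two styles of destination map described above might be sketched, under stated assumptions, as follows. The destination names, the listed phrasings, the labeled example utterances, and the naive word-frequency score are all hypothetical and chosen only for illustration; a practical statistical map would likely use a more principled classifier.

```python
from collections import Counter, defaultdict

# Hypothetical destination grammar: every phrasing must be listed by hand,
# which is what makes building such a map labor-intensive.
DESTINATION_GRAMMAR = {
    "billing": ["billing", "billing department", "accounts", "pay my bill"],
    "support": ["technical support", "tech support", "help desk"],
}

def build_lookup(grammar):
    """Flatten the grammar into a word-string -> destination table."""
    lookup = {}
    for destination, phrases in grammar.items():
        for phrase in phrases:
            lookup[tuple(phrase.split())] = destination
    return lookup

def map_with_grammar(words, lookup):
    """Return the destination for an exactly listed word string, else None."""
    return lookup.get(tuple(words))

# Hypothetical labeled utterances for a naive statistical destination map.
LABELED = [
    ("i want to pay my bill", "billing"),
    ("question about a charge on my bill", "billing"),
    ("my internet is not working", "support"),
    ("i need help with my modem", "support"),
]

def train_counts(examples):
    """Count how often each word occurs in utterances for each destination."""
    counts = defaultdict(Counter)
    for text, destination in examples:
        counts[destination].update(text.split())
    return counts

def map_statistically(words, counts):
    """Pick the destination whose training words best match the input.

    The score is a simple sum of relative word frequencies; it is an
    assumption of this sketch, not a recommended model.
    """
    def score(destination):
        total = sum(counts[destination].values())
        return sum(counts[destination][w] / total for w in words)
    return max(counts, key=score)

lookup = build_lookup(DESTINATION_GRAMMAR)
counts = train_counts(LABELED)
listed = map_with_grammar("pay my bill".split(), lookup)
unlisted = map_with_grammar("i want to pay my bill".split(), lookup)
guessed = map_statistically("there is a charge on my bill".split(), counts)
```

Here the grammar-based map resolves "pay my bill" but returns nothing for the unlisted variant "i want to pay my bill", while the statistical map can still assign a destination to a phrasing it has never seen verbatim.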