1. Field of Invention
This invention relates to the selection of superwords and meaningful phrases based on a criterion relevant to both speech recognition and understanding.
2. Description of Related Art
Currently, there are applications of speech recognition that provide a methodology for automated task selection, in which the targeted task is recognized in the natural speech of a user making such a selection. A fundamental aspect of this method is the determination of a set of meaningful phrases. Such meaningful phrases are determined by a grammatical inference algorithm that operates on a predetermined corpus of speech utterances, each of which is associated with a specific task objective and is marked with that task objective.
The above features are addressed in U.S. patent application Ser. No. 08/528,577, "Automated Phrase Generation", and U.S. Pat. No. 5,675,707, "Automated Call Routing System", both filed on Sep. 15, 1995, which are incorporated herein by reference.
The determination of the meaningful phrases used in the above applications is founded on the concept of combining a measure of the commonality of words and/or structure within the language (i.e., how often groupings of words co-occur) with a measure of the significance of such a grouping to a defined task. The commonality measure can be manifested as the mutual information in n-grams derived from a database of training speech utterances, and the measure of usefulness to a task is manifested as a salience measure.
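As a minimal sketch of the commonality measure, the Python below computes pointwise mutual information, log2 P(w1, w2) / (P(w1) P(w2)), for every adjacent word pair in a small training corpus. The function name, the lowercase whitespace tokenization, and the relative-frequency probability estimates are illustrative assumptions; the applications cited above may define MI over longer n-grams or estimate the probabilities differently.

```python
import math
from collections import Counter


def bigram_mutual_information(utterances):
    """Pointwise MI, log2 P(w1, w2) / (P(w1) P(w2)), for each adjacent word pair.

    Assumption: unigram probabilities are relative frequencies over all word
    tokens, and pair probabilities are relative frequencies over all adjacent
    pairs in the corpus.
    """
    unigrams = Counter()
    bigrams = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_words = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    mi = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        # Positive MI: the pair co-occurs more often than independence predicts.
        mi[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return mi


if __name__ == "__main__":
    corpus = [
        "I made a long distance call",
        "I want to make a long distance call",
        "what is my account balance",
        "I need my account balance",
    ]
    mi = bigram_mutual_information(corpus)
    print(mi[("long", "distance")], mi[("i", "made")])
```

On this toy corpus both pairs come out positive, with the recurring pair ("long", "distance") at about 3.85 bits and ("i", "made") at about 3.26 bits.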
Mutual information ("MI"), which measures the likelihood of co-occurrence for two or more words, involves only the language itself. For example, given War and Peace in the original Russian, one could compute the mutual information for all the possible pairings of words in that text without ever understanding a word of the language in which it is written. In contrast, computing salience involves both the language and its extra-linguistic associations to a device's environment. Through the use of such a combination of MI and a salience factor, meaningful phrases are selected which have both a positive MI (indicating a relatively strong association among the words comprising the phrase) and a high salience value.
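The salience side and the combined selection rule can be sketched in the same hedged way. This section does not define the salience measure, so the Python below assumes one formulation used in the call-routing literature: the Kullback-Leibler divergence, in bits, of the task distribution given that a phrase occurs, P(task | phrase), from the prior P(task). The selection rule keeps candidate phrases with positive MI and salience at or above a threshold. The function names, the `min_salience` parameter, and the `(text, task)` corpus format are hypothetical.

```python
import math
from collections import Counter


def contains_phrase(words, phrase):
    """True if the word tuple `phrase` occurs contiguously in the list `words`."""
    n = len(phrase)
    return any(tuple(words[i:i + n]) == phrase for i in range(len(words) - n + 1))


def salience(phrase, labeled_utterances):
    """Assumed salience: KL divergence (bits) of P(task | phrase) from P(task).

    `labeled_utterances` is a list of (text, task) pairs. A phrase that never
    occurs yields 0.0, since it carries no information about the task.
    """
    prior = Counter(task for _, task in labeled_utterances)
    total = len(labeled_utterances)
    hits = Counter(task for text, task in labeled_utterances
                   if contains_phrase(text.lower().split(), phrase))
    n_hits = sum(hits.values())
    score = 0.0
    for task, count in hits.items():
        p_post = count / n_hits          # P(task | phrase occurs)
        p_prior = prior[task] / total    # P(task)
        score += p_post * math.log2(p_post / p_prior)
    return score


def select_meaningful(candidates, mi, labeled_utterances, min_salience):
    """Keep candidate phrases with positive MI and salience >= min_salience.

    `mi` maps a phrase tuple to its mutual information; missing phrases count as 0.
    """
    return [phrase for phrase in candidates
            if mi.get(phrase, 0.0) > 0.0
            and salience(phrase, labeled_utterances) >= min_salience]
```

In use, `mi` would hold MI scores computed over the same training corpus, and `min_salience` would be chosen for the task set at hand. A phrase spread evenly across tasks scores near zero salience even when its words co-occur strongly.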
Such methods are based upon the probability that separate sets of salient words occur in the particular input utterance. For example, the salient phrase "made a long distance" would be determined to be a meaningful phrase by that grammatical inference algorithm based on its individual mutual information and salience values.
In addition, although the task goal involves recognizing meaningful words and phrases, recognition is typically accomplished by a large-vocabulary recognizer constrained by stochastic language models, such as an n-gram model. One approach to constraining the recognizer in this way is to train a stochastic finite-state grammar represented by a Variable Ngram Stochastic Automaton (VNSA). A VNSA is a non-deterministic automaton that allows for parsing any possible sequence of words drawn from a given vocabulary.
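This section does not describe how a VNSA is trained. As a simplified, hypothetical stand-in, the Python below builds a backoff n-gram automaton that shares the two properties named above: it is a stochastic finite-state network whose states are word histories, and it is non-deterministic in that each state can either emit a word directly or first take an epsilon (backoff) arc to a shorter history, so that any word sequence can be parsed. The class name, the fixed backoff probability, the add-one smoothing at the empty-history state, and the single shared unknown-word arc are assumptions, not details of the VNSA. For simplicity, the next state is taken as the longest observed suffix of the word history rather than tracked separately for each path.

```python
from collections import Counter, defaultdict


class BackoffNgramAutomaton:
    """Simplified variable-order n-gram automaton (hypothetical sketch, not a VNSA).

    States are word histories of fewer than `order` words seen in training.
    Each state has word arcs weighted by relative frequency and an epsilon arc,
    taken with probability `backoff`, to its one-word-shorter suffix state.
    The empty-history state has an add-one-smoothed arc for every vocabulary
    word plus one shared unknown-word arc, so every word sequence parses.
    """

    def __init__(self, utterances, order=3, backoff=0.3):
        self.order = order
        self.backoff = backoff
        self.counts = defaultdict(Counter)  # history tuple -> next-word counts
        self.vocab = set()
        for utterance in utterances:
            words = utterance.lower().split()
            self.vocab.update(words)
            for i, word in enumerate(words):
                # Record the word under every history of length 0 .. order-1.
                for k in range(min(i, order - 1) + 1):
                    self.counts[tuple(words[i - k:i])][word] += 1

    def state(self, history):
        """Longest observed suffix (at most order-1 words) of the word history."""
        h = tuple(history)[max(0, len(history) - self.order + 1):]
        while h and h not in self.counts:
            h = h[1:]
        return h

    def arc_prob(self, state, word):
        """P(word | state), summed over the direct arc and the backoff path."""
        if not state:
            unigram = self.counts[()]
            # The extra +1 reserves probability mass for the unknown-word arc.
            return (unigram[word] + 1) / (sum(unigram.values()) + len(self.vocab) + 1)
        arcs = self.counts.get(state)
        if not arcs:
            return self.arc_prob(state[1:], word)
        direct = arcs[word] / sum(arcs.values())
        return (1 - self.backoff) * direct + self.backoff * self.arc_prob(state[1:], word)

    def sequence_prob(self, words):
        """Probability of a word sequence, scoring each word from its history state."""
        words = [w.lower() for w in words]
        prob = 1.0
        for i, word in enumerate(words):
            prob *= self.arc_prob(self.state(words[:i]), word)
        return prob
```

Because each state's direct arcs and its backoff arc together carry probability 1, the outgoing word probabilities of every state sum to 1, and a sequence containing unseen words still receives a nonzero score.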
Traditionally, such n-gram language models for speech recognition assume words as the basic lexical unit. The order of a VNSA network is the maximum number of words that can be predicted as occurring after the occurrence of a particular word in an utterance. Thus, using conditional probabilities, VNSAs have been used to approximate standard n-gram language models, yielding performance similar to that of standard bigram and trigram models. However, when the "n" in the n-gram becomes large, the database for predicting the occurrence of words in response to the appearance of a word in an utterance becomes large and unmanageable. In addition, words that are not strongly recurrent in the language may be assigned mistakenly high probabilities, thereby generating a number of misdetections in recognized speech.
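The sparseness problem described above can be made concrete with a small count-based sketch. Assuming lowercase whitespace tokenization and maximum-likelihood estimates (the function names are hypothetical), the Python below estimates P(word | context) from n-gram counts and reports, for each order n, how many distinct n-grams were observed, what share of them was observed only once, and how many entries a full table over a V-word vocabulary could hold (V^n).

```python
from collections import Counter


def ngram_counts(utterances, n):
    """Count every contiguous n-word sequence in the whitespace-tokenized utterances."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


def conditional_prob(counts, context, word):
    """Maximum-likelihood P(word | context) = c(context, word) / sum_w' c(context, w').

    `counts` must hold (len(context) + 1)-grams, e.g. ngram_counts(corpus, 3)
    for a two-word context.
    """
    context = tuple(context)
    total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    return counts[context + (word,)] / total if total else 0.0


def sparsity_report(utterances, max_n, vocab_size):
    """Per order n: distinct n-grams seen, share seen once, and V^n possible n-grams."""
    report = []
    for n in range(1, max_n + 1):
        counts = ngram_counts(utterances, n)
        singletons = sum(1 for c in counts.values() if c == 1)
        share = singletons / len(counts) if counts else 0.0
        report.append((n, len(counts), share, vocab_size ** n))
    return report


if __name__ == "__main__":
    corpus = [
        "I made a long distance call",
        "I want to make a long distance call",
        "what is my account balance",
        "I need my account balance",
    ]
    vocab = {w for u in corpus for w in u.lower().split()}
    for n, distinct, share, possible in sparsity_report(corpus, 4, len(vocab)):
        print(f"n={n}: {distinct} seen, {share:.0%} seen once, {possible} possible")
    # A context observed only once gives its single continuation probability 1.0.
    print(conditional_prob(ngram_counts(corpus, 4), ("to", "make", "a"), "long"))
```

On this toy corpus the share of n-grams seen only once rises from 7 of 15 at n=1 to 10 of 11 at n=4, while the number of possible entries grows as V^n. A four-word context such as ("to", "make", "a"), observed once, assigns probability 1.0 to its only continuation, which illustrates how maximum-likelihood estimates from sparse long contexts can be unreliably high.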
Thus, a method of creating longer units for language modeling is needed, both to promote the efficient use of n-gram language models for speech recognition and to allow these longer units to be used, along with meaningful words and phrases, for language recognition and understanding.