The present invention relates to language modeling. More particularly, the present invention relates to creating a language model for a language processing system.
Accurate speech recognition requires more than just an acoustic model to select the correct word spoken by the user. In other words, if a speech recognizer must determine which word has been spoken and all words are treated as equally likely to be spoken, the speech recognizer will typically perform unsatisfactorily. A language model provides a method or means of specifying which sequences of words in the vocabulary are possible, or in general provides information about the likelihood of various word sequences.
Speech recognition is often considered to be a form of top-down language processing. Two common forms of language processing include “top-down” and “bottom-up”. Top-down language processing begins with the largest unit of language to be recognized, such as a sentence, and processes it by classifying it into smaller units, such as phrases, which, in turn, are classified into yet smaller units, such as words. In contrast, bottom-up language processing begins with words and builds therefrom larger phrases and/or sentences. Both forms of language processing can benefit from a language model.
One common technique of classifying is to use a formal grammar. The formal grammar defines the sequences of words that the application will allow. One particular type of grammar is known as a “context-free grammar” (CFG), which allows a language to be specified based on language structure or semantics. The CFG is not only powerful enough to describe most of the structure in spoken language, but also restrictive enough to have efficient parsers. Nevertheless, while the CFG provides a deeper structure, it is still inappropriate for robust spoken language processing since the grammar is almost always incomplete. A CFG-based system performs well only when the user knows which sentences to speak, which diminishes the value and usability of the system. The advantage of a CFG's structured analysis is thus nullified by the poor coverage in most real applications. For application developers, a CFG is also often highly labor-intensive to create.
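The coverage limitation described above can be illustrated with a minimal sketch. The grammar, the city names, and the helper names `expand` and `accepts` below are hypothetical and chosen only for illustration; the sketch enumerates a small non-recursive grammar rather than using a real parser.

```python
from itertools import product

# Hypothetical toy grammar. Keys are nonterminals; any symbol that is
# not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["show", "me", "flights", "to", "CITY"],
          ["list", "flights", "to", "CITY"]],
    "CITY": [["boston"], ["seattle"]],
}

def expand(symbol):
    """Return every word sequence derivable from `symbol` as a tuple.

    Only suitable for non-recursive grammars such as this toy one.
    """
    if symbol not in GRAMMAR:
        return [(symbol,)]
    results = []
    for rule in GRAMMAR[symbol]:
        parts = [expand(s) for s in rule]
        for combo in product(*parts):
            # Flatten the per-symbol expansions into one word sequence.
            results.append(tuple(w for part in combo for w in part))
    return results

ALLOWED = set(expand("S"))

def accepts(sentence):
    """True only if the sentence is exactly one the grammar derives."""
    return tuple(sentence.lower().split()) in ALLOWED

in_grammar = accepts("show me flights to Boston")
out_of_grammar = accepts("I want to fly to Boston")
```

In this sketch the first request is accepted, while the second, an equally natural phrasing, is rejected because no rule anticipates it. Each additional phrasing would have to be written into the grammar by hand.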
A second form of language model is an N-gram model, which estimates the likelihood of a word from the N-1 words that precede it. Because the N-gram can be trained with a large amount of data, the N-word dependency can often accommodate both syntactic and semantic shallow structure seamlessly. However, a prerequisite of this approach is a sufficient amount of training data. The problem for N-gram models is that a large amount of data is needed and the model may not be specific enough for the desired application. Since a word-based N-gram model is limited to N-word dependency, it cannot include longer-distance constraints in the language, whereas a CFG can.
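As a minimal sketch of the N-gram approach, the following trains a bigram (N = 2) model by maximum-likelihood counting. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and all function names are assumptions made for illustration; no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and adjacent word pairs in the training sentences."""
    contexts, pairs = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        pairs.update(zip(words, words[1:]))    # adjacent word pairs
    return contexts, pairs

def bigram_prob(model, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    contexts, pairs = model
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]

def sentence_prob(model, sentence):
    """Product of the bigram probabilities over the whole sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(model, prev, word)
    return p

# Hypothetical training corpus.
corpus = [
    "show me flights to boston",
    "show me flights to seattle",
    "list flights to boston",
]
model = train_bigram(corpus)

unseen_but_likely = sentence_prob(model, "list flights to seattle")
unseen_and_zero = sentence_prob(model, "fly me to boston")
```

The sentence "list flights to seattle" never appears in the corpus, yet it receives a nonzero probability because each adjacent word pair was observed. "fly me to boston" receives zero because the word "fly" was never seen, which illustrates the dependence on sufficient training data. Each factor looks back only one word, so constraints spanning more than N words are not captured.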
A unified language model (comprising a combination of an N-gram and a CFG) has also been advanced. The unified language model has the potential of overcoming the weaknesses of both the word N-gram and CFG language models. However, there is no clear way to leverage a domain-independent training corpus or domain-independent language models, including the unified language models, for domain-specific applications.
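One way such a combination could be sketched, purely as an assumption and not as the specific unified model referred to above, is to replace words covered by a CFG fragment with a nonterminal token and train the N-gram over the resulting mixed token sequence. The `CITY` fragment, the `<CITY>` token, the uniform expansion probability, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical CFG fragment: one nonterminal and its terminal expansions.
CITY = {"boston", "seattle", "denver"}

def to_tokens(sentence):
    """Replace words covered by the CFG fragment with the nonterminal token."""
    words = sentence.lower().split()
    return ["<s>"] + ["<CITY>" if w in CITY else w for w in words] + ["</s>"]

def train(sentences):
    """Count bigrams over the mixed word/nonterminal token sequences."""
    contexts, pairs = Counter(), Counter()
    for s in sentences:
        toks = to_tokens(s)
        contexts.update(toks[:-1])
        pairs.update(zip(toks, toks[1:]))
    return contexts, pairs

def prob(model, sentence):
    """Bigram probability over tokens times P(word | nonterminal)."""
    contexts, pairs = model
    toks = to_tokens(sentence)
    p = 1.0
    for prev, tok in zip(toks, toks[1:]):
        p *= pairs[(prev, tok)] / contexts[prev] if contexts[prev] else 0.0
        if tok == "<CITY>":
            p *= 1.0 / len(CITY)  # assumed uniform choice within the fragment
    return p

# Hypothetical training corpus; "denver" never appears in it.
model = train(["show me flights to boston", "list flights to seattle"])

covered_by_fragment = prob(model, "show me flights to denver")
outside_fragment = prob(model, "show me flights to paris")
```

In this sketch "denver" is never seen in training, but the request still receives a nonzero probability because the CFG fragment supplies the city while the N-gram supplies the surrounding phrasing. A city outside the fragment, such as "paris", still receives zero.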
There is thus a continuing need to develop new methods for creating language models. As technology advances and speech and handwriting recognition are provided in more applications, the application developer must be provided with an efficient method by which an appropriate language model can be created for the selected application.