1. Field of the Invention
The present invention relates to a method for estimating a language model weight and a system for the same and, more particularly, to a method for adaptively estimating a language model weight based on a continuous speech recognition result and a system for the same.
2. Description of the Related Art
In general, a language model plays a very important role in continuous speech recognition: it is used to correct errors made by the acoustic model and to represent the recognition result. However, with the N-gram language model, which is currently the most widely used and is regarded as the standard in the speech recognition industry, recognition results can differ greatly depending on the domain of the training data. The best performance can be expected when the domain of the speech to be recognized matches the domain of the language model training data. Thus, in speech recognition tasks that span many domains, such as broadcast news speech recognition, recognition accuracy can be improved by a language model adaptation method in which the language model is adapted to the domain of each subject.
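Linear interpolation of a general-purpose model with a domain-specific model is one common way to realize such adaptation; the source does not say which adaptation technique it has in mind. The following is a minimal sketch under that assumption. The bigram tables, their values, the interpolation weight `lam`, and the function name `adapted_prob` are all hypothetical and chosen only for illustration.

```python
# Hypothetical bigram tables P(word | previous word) for a general-purpose model
# and a domain-specific (e.g., financial news) model. Values are illustrative only.
GENERAL_LM = {("the", "market"): 0.01, ("the", "game"): 0.02}
DOMAIN_LM = {("the", "market"): 0.08, ("the", "game"): 0.001}
UNSEEN = 1e-6  # crude floor standing in for proper smoothing of unseen bigrams


def adapted_prob(prev, word, lam):
    """Linearly interpolate the two models: lam * P_domain + (1 - lam) * P_general."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation weight must lie in [0, 1]")
    p_domain = DOMAIN_LM.get((prev, word), UNSEEN)
    p_general = GENERAL_LM.get((prev, word), UNSEEN)
    return lam * p_domain + (1.0 - lam) * p_general


# With lam = 0 only the general model is used and "the game" is more likely;
# shifting weight toward the domain model makes "the market" more likely instead.
print(adapted_prob("the", "market", 0.0), adapted_prob("the", "game", 0.0))
print(adapted_prob("the", "market", 0.7), adapted_prob("the", "game", 0.7))
```

In practice the interpolation weight would itself be estimated, for example on held-out text from the target domain; here it is simply passed in.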
A continuous speech recognition engine generates an N-best list of multiple recognition candidates for an utterance as follows. First, a search is performed over a grammar network in which the connections between words are defined as a graph, and intermediate recognition results such as word lattices are output. The word lattices are then re-evaluated using word collocation information, statistical language model information such as bigram and trigram probabilities, or the A* (A-star) algorithm, thereby generating the N-best list.
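The re-evaluation step can be pictured with a small Python sketch. It assumes a word lattice stored as a list of arcs `(from_node, to_node, word, acoustic_log_likelihood)` and scores each path as the acoustic log-likelihood plus a language model weight times the bigram log-probability. The lattice, the bigram values, and the names `LATTICE`, `BIGRAM`, and `n_best` are hypothetical. For brevity the sketch enumerates every path exhaustively instead of using the A* search named above, which is only practical for tiny lattices.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical lattice arcs: (from_node, to_node, word, acoustic log-likelihood).
# Node 0 is the utterance start; node 3 is the utterance end.
LATTICE = [
    (0, 1, "I", -1.0),
    (0, 1, "eye", -0.9),
    (1, 2, "scream", -2.0),
    (1, 2, "cream", -1.8),
    (2, 3, "loudly", -1.5),
]
START, END = 0, 3

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "I"): 0.1, ("<s>", "eye"): 0.01,
    ("I", "scream"): 0.2, ("I", "cream"): 0.01,
    ("eye", "cream"): 0.05,
    ("scream", "loudly"): 0.3, ("cream", "loudly"): 0.01,
}
UNSEEN = 1e-4  # crude floor in place of real smoothing or back-off


def bigram_logprob(prev, word):
    return math.log(BIGRAM.get((prev, word), UNSEEN))


def n_best(lattice, n, lm_weight):
    """Return the n highest-scoring (score, words) paths from START to END.

    Score = sum of acoustic log-likelihoods + lm_weight * sum of bigram log-probs.
    """
    out_arcs = defaultdict(list)
    for src, dst, word, ac in lattice:
        out_arcs[src].append((dst, word, ac))

    hypotheses = []

    def walk(node, words, ac_total, lm_total):
        if node == END:
            hypotheses.append((ac_total + lm_weight * lm_total, words))
            return
        prev = words[-1] if words else "<s>"
        for dst, word, ac in out_arcs[node]:
            walk(dst, words + [word], ac_total + ac,
                 lm_total + bigram_logprob(prev, word))

    walk(START, [], 0.0, 0.0)
    # The key function keeps heapq from comparing the word lists on ties.
    return heapq.nlargest(n, hypotheses, key=lambda h: h[0])


for score, words in n_best(LATTICE, n=2, lm_weight=1.0):
    print(f"{score:.2f}  {' '.join(words)}")
```

With `lm_weight=0.0` the acoustically best path "eye cream loudly" ranks first; with `lm_weight=1.0` the language model promotes "I scream loudly" to the top, which shows how strongly the weight shapes the N-best list.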
In generating the N-best recognition candidates, applying the full language model produces a large search space and requires a large amount of computation. Thus, a low-order language model is used in a first search to output a word lattice as the first search result, and a second search is then performed on that word lattice by applying a domain-specific language model of higher order than the one used in the first search.
During the second search, the start and end points of the words in the word lattice are fixed, and only the acoustic score or the language model score is recalculated. The second search therefore requires fewer computations than the first search. When the language model is applied in the second search, the weight that balances the language model score against the acoustic score of the word lattice is fixed at a value determined from experiments. However, when this weight is fixed, the second search cannot be adapted to the first search result. That is, even when the first search fails to find the correct answer, as indicated by a low score for the continuous speech recognition result, the second search is performed with the same weight, and the possibility of finding the correct answer therefore remains very low.
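The conventional fixed-weight second search can be sketched in Python as follows. The sketch assumes that each first-pass hypothesis is a sequence of word arcs whose start and end frames and acoustic scores are carried over unchanged, and that only the language model score is recomputed with a trigram model. The arc structure, the trigram values, the constant `LM_WEIGHT`, and the names `rescore` and `FIRST_PASS` are hypothetical. The point it illustrates is that `LM_WEIGHT` is the same for every utterance, whatever the first-pass score was.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WordArc:
    word: str
    start: int       # start frame, fixed by the first pass and carried through unchanged
    end: int         # end frame, fixed by the first pass and carried through unchanged
    acoustic: float  # acoustic log-likelihood, reused as-is in the second pass


# Hypothetical trigram table P(w3 | w1, w2); "<s>" pads the sentence start.
TRIGRAM = {
    ("<s>", "<s>", "I"): 0.1,
    ("<s>", "I", "scream"): 0.3,
    ("I", "scream", "loudly"): 0.4,
    ("<s>", "<s>", "eye"): 0.02,
    ("<s>", "eye", "cream"): 0.05,
    ("eye", "cream", "loudly"): 0.01,
}
UNSEEN = 1e-4  # crude floor instead of back-off smoothing

# Hypothetical weight tuned offline and then applied to every utterance,
# regardless of how confident the first pass was.
LM_WEIGHT = 1.0


def trigram_logprob(words):
    history = ["<s>", "<s>"]
    total = 0.0
    for w in words:
        total += math.log(TRIGRAM.get((history[-2], history[-1], w), UNSEEN))
        history.append(w)
    return total


def rescore(hypotheses, lm_weight=LM_WEIGHT):
    """Second pass: keep boundaries and acoustic scores, recompute only the LM score."""
    rescored = []
    for arcs in hypotheses:
        words = [a.word for a in arcs]
        acoustic = sum(a.acoustic for a in arcs)
        rescored.append((acoustic + lm_weight * trigram_logprob(words), words))
    return sorted(rescored, key=lambda h: h[0], reverse=True)


FIRST_PASS = [
    [WordArc("eye", 0, 12, -0.9), WordArc("cream", 12, 40, -1.8), WordArc("loudly", 40, 75, -1.5)],
    [WordArc("I", 0, 12, -1.0), WordArc("scream", 12, 40, -2.0), WordArc("loudly", 40, 75, -1.5)],
]

best_score, best_words = rescore(FIRST_PASS)[0]
print(" ".join(best_words))
```

Because `rescore` always falls back to the same `LM_WEIGHT`, nothing in this pass reacts to a low first-pass score; that is the limitation described above.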