In some speech recognition applications, the speech content usually includes key words. Whether these key words are recognized correctly strongly affects the user experience of the speech recognition system. For example, in a meeting assistant application in which a speech recognition system transcribes what the meeting speakers say, the important person names, place names, technical terms, and the like involved in the meeting are key words. The recognition accuracy of key words is the most important performance indicator of such applications.
The key words can be classified into two categories. A key word that is not included in the system lexicon of the speech recognition system can be called a new word, and a key word that is included in the system lexicon can be called a core word. Because new words are not registered in the system lexicon and the speech recognition system can only output words that exist in the system lexicon, new words cannot be recognized directly at all.
Although most new words can be constructed from existing words in the system lexicon and thus output by the recognizer indirectly, examples of such constructions are very rare in the training corpus, which causes very low language model (LM) scores for new words. The probability of successfully recognizing new words in this way is therefore very small.
For core words, which are registered in the system lexicon, if another non-core word in the system lexicon has an identical or similar pronunciation but a higher LM score, the speech recognition system tends to wrongly select that non-core word as the output result.
As a result, the recognition accuracy of key words in a conventional speech recognition system is usually very low, and the misrecognized results are words whose pronunciations are identical or similar to those of the key words.
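The failure mode described above can be illustrated with a minimal sketch. It assumes a decoder that ranks candidates by the sum of an acoustic log score and a weighted LM log score, which is a common arrangement but is not specified in this text. The function name `pick_best`, the words, and all score values are hypothetical and chosen only for illustration.

```python
def pick_best(candidates, lm_weight=1.0):
    """Return the word with the highest combined log score.

    Assumption: combined score = acoustic + lm_weight * lm.
    """
    best = max(candidates, key=lambda c: c["acoustic"] + lm_weight * c["lm"])
    return best["word"]


# Hypothetical log scores. The two words are homophones, so their
# acoustic scores are identical and only the LM score separates them.
candidates = [
    {"word": "wright", "acoustic": -10.0, "lm": -9.0},  # core word (key word)
    {"word": "right", "acoustic": -10.0, "lm": -3.0},   # common non-core word
]

best = pick_best(candidates)  # the non-core homophone wins on LM score

# Raising the key word's LM score above its competitor flips the result.
boosted = [dict(candidates[0], lm=-2.0), candidates[1]]
best_after_boost = pick_best(boosted)
```

With equal acoustic scores, the candidate with the higher LM score is selected, which is why a core word loses to a more frequent homophone until its LM score is raised.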
To improve the recognition accuracy of key words, it is critical to increase their LM scores. Generally, a class-based LM is used for this purpose. In this method, classes corresponding to certain key word types are built, e.g. a person name class, a place name class, a technical term class, and so on, and a certain number of representative words that have the attribute of a class are selected from the system lexicon and added to that class. In LM training, the LM score of each class is calculated from the statistics of all the representative words contained in that class. Before recognition, the key words are registered in the system lexicon and linked to the most suitable class. In the recognition stage, a key word shares the LM score of the class it belongs to. Because the LM scores of the representative words are very high, the LM scores of the key words are greatly increased, and the recognition accuracy is effectively improved as a result.
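The class-based approach described above can be sketched as follows. This is a minimal illustration that assumes a unigram model and a toy corpus; practical systems typically use higher-order n-grams and may also apply a within-class word probability, neither of which is shown here. The corpus, the class labels `<PERSON>` and `<PLACE>`, the key word `nakamura`, the floor score, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical data, for illustration only).
CORPUS = [
    "john will visit tokyo".split(),
    "mary will visit paris".split(),
    "john met mary in paris".split(),
]

# Representative words selected from the system lexicon for each class.
CLASS_OF = {
    "john": "<PERSON>", "mary": "<PERSON>",
    "tokyo": "<PLACE>", "paris": "<PLACE>",
}


def train_class_unigram(corpus, class_of):
    """Train a unigram LM after mapping representative words to classes.

    Each class score is therefore pooled from the statistics of all the
    representative words that belong to the class.
    """
    counts = Counter(class_of.get(w, w) for sent in corpus for w in sent)
    total = sum(counts.values())
    return {token: math.log(c / total) for token, c in counts.items()}


def lm_score(word, model, class_of, floor=-20.0):
    """Log score of a word; a class member shares its class's score.

    Assumption: words unknown to the model receive a fixed floor score.
    """
    return model.get(class_of.get(word, word), floor)


model = train_class_unigram(CORPUS, CLASS_OF)

before = lm_score("nakamura", model, CLASS_OF)  # not registered: floor score
CLASS_OF["nakamura"] = "<PERSON>"               # register and link to a class
after = lm_score("nakamura", model, CLASS_OF)   # now shares the class score
```

In this sketch, linking the key word to a class requires no retraining: the key word receives the same score as the representative words of that class, which is much higher than the score it had before registration.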