As computer applications become increasingly widespread, users increasingly expect to communicate with computers directly in natural language, because natural language is the most convenient, effective, and fastest form of communication for human beings. Speech recognition technology converts human speech signals into corresponding text through computer-based recognition and understanding processes. Language models play an important role in improving the accuracy of speech recognition.
Due to limitations in hardware performance and software algorithms, current speech recognition systems impose strict limits on the size of language models. At the same time, the size of a language model grows exponentially with the size of the vocabulary that the language model covers. For these two reasons, the vocabulary available to a speech recognition system cannot be expanded indefinitely. With current technology, the upper limit on the glossary capacity of a speech recognition system is slightly over one hundred thousand words. For words outside the glossary, the recognition accuracy of the speech recognition system declines significantly.
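As a rough illustration of why model size limits vocabulary size, the following minimal sketch assumes an n-gram language model (the text above does not say which model family is meant) and computes an upper bound on the number of distinct n-grams the model could need to store. The function name `max_ngram_count` and the chosen vocabulary sizes are hypothetical and used only for illustration; real models store far fewer entries because most word sequences never occur in training data.

```python
def max_ngram_count(vocab_size: int, order: int) -> int:
    """Upper bound on distinct n-grams of orders 1..order.

    Assumption: an n-gram model, where every sequence of up to
    `order` words from the vocabulary is a potential entry.
    """
    return sum(vocab_size ** k for k in range(1, order + 1))


if __name__ == "__main__":
    # Hypothetical vocabulary sizes; order 3 corresponds to a trigram model.
    for vocab_size in (1_000, 10_000, 100_000):
        bound = max_ngram_count(vocab_size, order=3)
        print(f"vocabulary={vocab_size:>7}  possible n-grams <= {bound:.3e}")
```

Under this assumption, each tenfold increase in vocabulary size raises the bound on possible trigram entries by roughly a factor of one thousand, which is why a fixed size budget quickly caps the usable vocabulary.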
Moreover, millions of words with low usage frequencies occur in normal speech environments. Examples include words that are relevant only for a short time (e.g., names of TV programs or movies), words that are relevant only to a particular geographic region (e.g., names of local restaurants), and words that appear only in a particular professional field (e.g., technical terms or jargon). For these and other reasons, there is a large body of low-frequency words, each of which has very low statistical significance.
Therefore, there is an urgent need to expand the vocabulary coverage of a language model without significantly increasing the size of the language model or compromising its computational accuracy.