The ability of computer systems to process and recognize speech has improved vastly as technology has progressed. These improvements have given rise to new areas of speech processing technology that are now used in a variety of fields. Language models play a very important role in speech processing systems. Two types of language models are in common use today: the rule-based language model (RLM) and the statistics-based language model (SLM).
An RLM uses general linguistic or domain knowledge (e.g., syntactic or semantic knowledge) to create grammar rules. These rules govern natural language processing in a speech processing system. One disadvantage of an RLM is that it works well only in a closed environment. Another disadvantage is that, when the system operates in an open environment, the created rules are often not complete enough to cover all circumstances. Lacking complete knowledge, the rule-based system cannot perform accurately and precisely. A further disadvantage is that when a large number of rules is used, decoding speed slows down drastically, which can be fatal for a real-time implementation.
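To make the closed-environment limitation concrete, here is a minimal sketch of a rule-based acceptor. The lexicon, the category names, and the three rules are hypothetical and invented for illustration; a real RLM would use a much richer grammar formalism. The sketch accepts only word sequences whose category pattern exactly matches a hand-written rule, so any word or phrasing the rule writer did not anticipate is rejected.

```python
# Hypothetical hand-written lexicon: maps each known word to a category.
LEXICON = {
    "show": "VERB", "book": "VERB",
    "me": "PRON",
    "flights": "NOUN", "hotels": "NOUN",
    "to": "PREP",
    "boston": "CITY", "denver": "CITY",
}

# Hypothetical grammar rules: each rule is an allowed sequence of categories.
RULES = [
    ("VERB", "PRON", "NOUN"),
    ("VERB", "PRON", "NOUN", "PREP", "CITY"),
    ("VERB", "NOUN", "PREP", "CITY"),
]


def accepts(sentence: str) -> bool:
    """Return True if every word is in the lexicon and the resulting
    category sequence matches one of the hand-written rules."""
    words = sentence.lower().split()
    if any(w not in LEXICON for w in words):
        return False  # unknown word: no rule can cover it
    categories = tuple(LEXICON[w] for w in words)
    # Naive linear scan over the rule list.
    return categories in RULES
```

Here `accepts("show me flights to Boston")` returns `True`, while `accepts("show me cheap flights to Boston")` returns `False` because "cheap" was never added to the lexicon. Because `categories in RULES` scans the rule list linearly, matching cost in this naive design grows with the number of rules.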
An SLM uses a large amount of text to determine its model parameters automatically. These model parameters govern natural language processing or speech recognition in an SLM-based system. An SLM can be trained more easily and can decode at a faster speed. However, a disadvantage of the SLM is that its quality and accuracy depend on the corpus used to train it. A corpus is a data set collected from real-world applications; for example, text from newspapers can serve as a text corpus. A statistical language model therefore requires a huge corpus with very large coverage to perform at a sufficient level. In practice, these corpus size and coverage requirements impose great limitations on a system, especially a narrow-domain dialogue system. Thus, statistical language models lack accuracy and performance in such settings.
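The dependence on corpus coverage can be illustrated with a small sketch. It assumes a bigram model whose parameters are estimated by maximum likelihood with no smoothing; the source does not specify the model order or estimation method, and the three-sentence corpus is hypothetical. Every parameter is computed automatically from counts, but any word pair absent from the corpus receives zero probability.

```python
from collections import Counter


def train_bigram(corpus: list[str]) -> dict[tuple[str, str], float]:
    """Estimate P(w2 | w1) = count(w1, w2) / count(w1) by maximum likelihood."""
    history = Counter()  # how often each token appears as a left context
    pairs = Counter()    # how often each adjacent token pair appears
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        history.update(tokens[:-1])
        pairs.update(zip(tokens[:-1], tokens[1:]))
    return {pair: count / history[pair[0]] for pair, count in pairs.items()}


def sentence_prob(model: dict[tuple[str, str], float], sentence: str) -> float:
    """Multiply bigram probabilities; unseen pairs contribute 0 (no smoothing)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for pair in zip(tokens[:-1], tokens[1:]):
        p *= model.get(pair, 0.0)
    return p


# Hypothetical toy training corpus.
corpus = [
    "show me flights to boston",
    "show me flights to denver",
    "book hotels in boston",
]
model = train_bigram(corpus)
```

With this corpus, `sentence_prob(model, "show me flights to boston")` is 1/3, but the perfectly reasonable request "show me hotels in boston" gets probability 0.0 because the pair ("me", "hotels") never occurs in the training text. Practical systems typically add smoothing to soften this, but the estimates still reflect only what the corpus covers.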