Speech recognition is the process by which a speech decoder converts speech signals into words. The speech decoder, also called a speech decoding network, usually consists of an acoustic model and a language model, which compute the speech-to-syllable probability and the syllable-to-word probability, respectively. Both models are obtained by training on a large amount of linguistic data and then modeling.
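As an illustration only, the way a decoder combines the two probabilities can be sketched as follows. The probability tables, the syllable and word labels, and the function name `decode_segment` are hypothetical and chosen for the example; an actual decoding network searches over whole sequences with trained models rather than small lookup tables.

```python
import math

def decode_segment(acoustic_scores, language_scores):
    """Pick the word with the highest combined score for one speech segment.

    acoustic_scores: dict syllable -> P(syllable | speech segment)  (assumed toy values)
    language_scores: dict syllable -> dict word -> P(word | syllable)  (assumed toy values)
    Returns (best_word, log_probability_of_best_path).
    """
    best_word, best_logp = None, -math.inf
    for syllable, p_syllable in acoustic_scores.items():
        for word, p_word in language_scores.get(syllable, {}).items():
            if p_syllable <= 0 or p_word <= 0:
                continue  # skip impossible paths; log(0) is undefined
            # Multiply the two probabilities by adding their logarithms.
            logp = math.log(p_syllable) + math.log(p_word)
            if logp > best_logp:
                best_word, best_logp = word, logp
    return best_word, best_logp

# Toy tables for a single segment (illustrative numbers, not trained values).
acoustic = {"yi": 0.7, "qi": 0.3}
language = {
    "yi": {"one": 0.9, "clothes": 0.1},
    "qi": {"seven": 0.8, "air": 0.2},
}
word, logp = decode_segment(acoustic, language)
print(word)
```

The sketch keeps only the single best syllable-to-word path, which mirrors the way the two model scores are multiplied along a decoding path.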
Speech recognition of the digits 0 to 9 is also called digital speech recognition. It can be realized in two ways: by using an isolated word recognition technology to recognize the digits in speech, or by using a universal continuous speech recognition technology to recognize the digits in speech.
Digital speech recognition based on the isolated word recognition technology requires that the digits be separated by clear intervals when the digital speech is input.
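The interval requirement can be illustrated with a minimal sketch of pause-based segmentation, assuming a simple per-frame energy threshold. The function name `split_on_pauses`, the threshold, and the sample energies are assumptions made for the example, not part of any particular isolated word recognizer.

```python
def split_on_pauses(frame_energy, threshold, min_pause_frames):
    """Split a sequence of frame energies into segments separated by pauses.

    A pause is a run of at least `min_pause_frames` consecutive frames whose
    energy is below `threshold`. Returns (start, end) index pairs, end exclusive.
    """
    segments = []
    start = None   # first frame of the segment being built, if any
    silence = 0    # consecutive low-energy frames seen inside that segment
    for i, energy in enumerate(frame_energy):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                # The pause is long enough: close the segment before the pause.
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:
        # Close a segment that runs to the end, dropping trailing silence.
        segments.append((start, len(frame_energy) - silence))
    return segments

# Three bursts of speech; only the first gap is at least two frames long.
energies = [0, 5, 6, 0, 0, 0, 7, 8, 0, 9, 0]
print(split_on_pauses(energies, threshold=1, min_pause_frames=2))
# prints [(1, 3), (6, 10)]
```

In the sample, the one-frame gap between the last two bursts is too short to count as a pause, so they are returned as a single segment and could not be recognized as two separate digits.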
Therefore, the universal continuous speech recognition technology is used more often for digital speech recognition. This technology recognizes not only digits but also other language content.