Speech recognition is the process of converting spoken words to text. Speech recognition systems translate verbal utterances into a series of computer-readable sounds, which are compared to known words. For example, a microphone may accept an analog signal, which is converted into a digital form and divided into smaller segments. The digital segments can be compared to the smallest elements of a spoken language, called phonemes (or “phones”). Based on this comparison, the speech recognition system can identify words by analyzing the sequence of identified sounds to determine, for example, the corresponding textual information.
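By way of illustration only, the division of a digitized signal into smaller segments might be sketched as follows. The sample rate, frame length, and step size are assumed values chosen for the example, and the function name split_into_frames is hypothetical; none of these are specified above.

```python
import math

def split_into_frames(samples, frame_size, hop_size):
    """Divide a sequence of digital samples into fixed-length, overlapping segments."""
    frames = []
    # Step through the signal, keeping only full-length segments.
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frames.append(samples[start:start + frame_size])
    return frames

# Assumed input: 0.1 s of a 440 Hz tone sampled at 16 kHz, standing in for
# the digitized microphone signal.
sample_rate = 16000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

# Assumed segmentation: 25 ms segments (400 samples) taken every 10 ms (160 samples).
frames = split_into_frames(samples, frame_size=400, hop_size=160)
print(len(frames), len(frames[0]))
```

Each resulting segment is what a recognizer would then compare against stored representations of individual sounds.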
A speech recognition system uses an acoustic model, a dictionary, and a language model to recognize speech. In general, an acoustic model includes digital representations of individual sounds that are combinable to produce a vast collection of words, phrases, etc. A language model assigns a probability that a sequence of words will occur together in a particular sentence or phrase. A dictionary identifies words in the input speech, typically by mapping each word to the sequence of sounds that makes it up.
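As a minimal sketch of how these components might interact, assume the acoustic model has already produced a sequence of sound labels. The dictionary entries, the ARPAbet-style sound labels, the probability values, and the function names below are all hypothetical and chosen only for illustration.

```python
# Hypothetical dictionary: each word maps to the sequence of sounds that makes it up.
DICTIONARY = {
    "to": ("T", "UW"),
    "two": ("T", "UW"),
    "too": ("T", "UW"),
    "cats": ("K", "AE", "T", "S"),
}

# Hypothetical language-model values: probability of a word given the previous word.
# These numbers are invented for the example, not measured from any corpus.
BIGRAM_PROB = {
    ("have", "to"): 0.30,
    ("have", "two"): 0.05,
    ("have", "too"): 0.02,
}

def candidate_words(sounds):
    """Return every dictionary word whose sound sequence matches the input."""
    return sorted(w for w, seq in DICTIONARY.items() if seq == tuple(sounds))

def pick_word(previous_word, sounds):
    """Choose the matching word that the language model considers most likely."""
    candidates = candidate_words(sounds)
    if not candidates:
        return None  # no dictionary entry matches these sounds
    # Unseen word pairs are treated as having probability 0.0 in this sketch.
    return max(candidates, key=lambda w: BIGRAM_PROB.get((previous_word, w), 0.0))

print(candidate_words(["T", "UW"]))
print(pick_word("have", ["T", "UW"]))
```

The dictionary alone cannot separate words that sound the same; in this sketch the language model breaks the tie using the preceding word.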
In general, building a language model includes obtaining a vocabulary and training data. The training data may include a corpus of data that reflects use of the language, e.g., documents, transcripts, e-mail, academic papers, novels, etc.
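One simple way to build such a model, shown here only as a sketch, is to count adjacent word pairs in the training data and convert the counts into probabilities. The bigram formulation, the add-one smoothing, the boundary markers, the tiny corpus, and all names below are assumptions made for the example rather than a description of any particular system.

```python
from collections import Counter

def tokenize(sentence, vocabulary):
    """Lowercase and split a sentence, replace out-of-vocabulary words with
    <unk>, and add sentence boundary markers."""
    words = [w if w in vocabulary else "<unk>" for w in sentence.lower().split()]
    return ["<s>"] + words + ["</s>"]

def train_bigram_model(sentences, vocabulary):
    """Count how often each word is followed by each other word."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = tokenize(sentence, vocabulary)
        context_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return context_counts, bigram_counts

def sequence_probability(sentence, vocabulary, context_counts, bigram_counts):
    """Probability of a word sequence under the add-one smoothed bigram model."""
    # Possible next tokens: each vocabulary word, <unk>, and </s>.
    num_outcomes = len(vocabulary) + 2
    probability = 1.0
    words = tokenize(sentence, vocabulary)
    for prev, cur in zip(words, words[1:]):
        probability *= (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + num_outcomes)
    return probability

# Assumed toy vocabulary and training data.
vocabulary = {"the", "cat", "dog", "sat", "ran"}
corpus = ["the cat sat", "the cat ran", "the dog sat"]
context_counts, bigram_counts = train_bigram_model(corpus, vocabulary)

likely = sequence_probability("the cat sat", vocabulary, context_counts, bigram_counts)
unlikely = sequence_probability("sat cat the", vocabulary, context_counts, bigram_counts)
print(likely > unlikely)
```

A word order that resembles the training data receives a higher probability than a scrambled one, which is the behavior a recognizer relies on when ranking competing word sequences.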