Speech recognition includes processes for converting spoken words to text or other data. In general, speech recognition systems translate verbal utterances into a series of computer-readable sound representations and compare those representations to known words. For example, a microphone may capture an analog signal, which is converted into digital form and then divided into smaller segments. The digital segments can be compared to the smallest elements of a spoken language, called phonemes (or “phones”). Based on this comparison, and on an analysis of the context in which those sounds were uttered, the system recognizes the speech.
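The segmentation step described above can be illustrated with a minimal sketch. The sine tone stands in for a digitized microphone signal, and the function names, the 16 kHz sample rate, the 400-sample segment length, and the 160-sample step are all assumptions chosen for illustration, not values taken from any particular system.

```python
import math

def digitize_tone(num_samples, sample_rate, freq_hz):
    """Stand-in for analog-to-digital conversion: sample a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]

def split_into_segments(samples, segment_len, step):
    """Divide the digital signal into fixed-length, overlapping segments."""
    segments = []
    # Stop once a full-length segment no longer fits in the signal.
    for start in range(0, len(samples) - segment_len + 1, step):
        segments.append(samples[start:start + segment_len])
    return segments

# Assumed values: 0.1 s of audio at 16 kHz, 25 ms segments, 10 ms step.
samples = digitize_tone(num_samples=1600, sample_rate=16000, freq_hz=440.0)
segments = split_into_segments(samples, segment_len=400, step=160)
```

Each resulting segment is short enough to be compared against stored representations of individual speech sounds.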
To this end, a typical speech recognition system may include an acoustic model, a language model, and a dictionary. Briefly, an acoustic model includes digital representations of individual sounds that can be combined to produce words, phrases, and so on. A language model assigns a probability that a given sequence of words will occur together in a particular sentence or phrase. A dictionary maps sound sequences to words that the language model can process.
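How a dictionary and a language model might cooperate can be sketched with a toy example. Everything here is hypothetical: the phoneme symbols, the dictionary entries, the bigram probabilities, and the fallback value for unseen word pairs are invented for illustration, and the sketch assumes an acoustic model has already produced phoneme sequences grouped by word. It uses a simple bigram model and an exhaustive search, which is only practical for very short inputs.

```python
import itertools
import math

# Hypothetical dictionary: phoneme sequence -> words that share that sound.
DICTIONARY = {
    ("AY",): ["I", "eye"],
    ("W", "AA", "N", "T"): ["want"],
    ("T", "UW"): ["two", "too", "to"],
    ("G", "OW"): ["go"],
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "I"): 0.2, ("<s>", "eye"): 0.001,
    ("I", "want"): 0.1, ("eye", "want"): 0.0001,
    ("want", "to"): 0.3, ("want", "two"): 0.02, ("want", "too"): 0.001,
    ("to", "go"): 0.2, ("two", "go"): 0.0001, ("too", "go"): 0.001,
}
UNSEEN = 1e-6  # assumed fallback probability for word pairs not in the table

def sequence_log_prob(words):
    """Sum log P(word | previous word) over the sequence, starting at <s>."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), UNSEEN))
        prev = word
    return total

def decode(phoneme_groups):
    """Try every combination of candidate words; keep the most probable one."""
    candidates = [DICTIONARY[group] for group in phoneme_groups]
    return max(itertools.product(*candidates), key=sequence_log_prob)

best = decode([("AY",), ("W", "AA", "N", "T"), ("T", "UW"), ("G", "OW")])
```

With these made-up probabilities, the language model resolves the homophones in favor of "I want to go", because that word sequence has the highest combined probability among the candidates the dictionary proposes.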