Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. In a typical speech recognition system, the acoustic signal is converted into a digitized speech signal and then segmented into a set of speech segments. Each set of speech segments contains useful measurements or features from which phonemes are identified. Phonemes are the smallest sound units of which words are composed. The phonemes are then represented by a phonetic language model such as a 2-phoneme or 3-phoneme hidden Markov model (HMM). The HMM captures and represents patterns of variation of the phonemes and groups them into phoneme groups. The phoneme groups are then applied to a language model such as a 2-gram or 3-gram model, which is used to recognize the most probable words for each group and then transcribe the words.

A majority of the transcription errors in a speech recognition system are due to the underlying language models and the specific speech patterns of a speaker or group of speakers. To reduce the number of errors, many advanced speech recognition systems use trainable language models that can be optimized for a particular speaker as well as for a specific sub-language usage (e.g., the field of radiology). However, even with optimization, these speech recognition systems cannot guarantee consistent, high-accuracy performance because of the limited capabilities of their underlying language models. Therefore, there is a need to compensate for the limited capabilities of the language models in a speech recognition system in order to provide consistent, high-accuracy speech recognition.
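
The way a 2-gram language model selects the most probable word, as described above, can be illustrated with a minimal sketch. It is not the implementation of any particular recognizer: the function names, the toy radiology-style training text, and the add-alpha smoothing are all assumptions made for illustration, and the acoustic scores that a real system would combine with the language model are omitted.

```python
from collections import Counter, defaultdict

def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is a hypothetical start-of-sentence marker.
        words = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_probability(counts, prev, cur, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(cur | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * vocab_size)

def most_probable_word(counts, prev, candidates, vocab_size):
    """Pick the candidate the language model rates most likely after prev."""
    return max(
        candidates,
        key=lambda w: bigram_probability(counts, prev, w, vocab_size),
    )

# Made-up training text standing in for a sub-language corpus.
corpus = [
    "the chest is clear",
    "the chest x-ray is normal",
    "the lungs are clear",
]
counts = train_bigram_counts(corpus)
vocab = {w for s in corpus for w in s.lower().split()}

# Two acoustically similar candidates for the word following "the".
best = most_probable_word(counts, "the", ["chest", "jest"], len(vocab))
print(best)  # prints "chest"
```

Because "jest" never follows "the" in the training text, the model prefers "chest". The same mechanism shows the limitation noted above: a word sequence that is absent from, or rare in, the training text receives a low probability even when it is what the speaker actually said.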