Speech models, such as environment models, language models, and acoustic models, may be used to facilitate speech recognition. For example, an environment model may indicate the noise characteristics of an operating environment. An acoustic model may be used to identify phonemes or other speech units present in an utterance. A language model may be used to convert the phonemes or other speech units identified by the acoustic model into words, phrases, and the like. Generally, people speak (and speech recognition devices operate) in a wide variety of environments and with a wide variety of speech patterns. For example, people speak with varying vocal frequencies and/or have varying vocal tract lengths. In addition, the environment in which a person speaks into a speech recognition device may introduce noise, such as an echo or wind. The acoustic models may be configured to take into account particular vocal frequencies, vocal tract lengths, environmental noise, and the like. However, because the way people speak and the environments in which they speak may vary greatly, some captured audio data may not correspond well to the speech models.
In some cases, audio data captured by such a speech recognition device may be adjusted, for example via normalization or by utilizing speech-related statistics, so that it corresponds more closely to the speech models. However, calculating such statistics may require a significant amount of audio data. In addition, each time a particular person speaks, the captured audio data may need to be adjusted in a similar manner. It is sometimes desirable to produce speech recognition results as quickly as possible; however, calculating statistics in order to adjust the captured audio may increase the time needed to perform speech recognition and can lead to unacceptable delays by a speech recognition device.
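By way of illustration only, one common form of statistics-based adjustment is per-dimension mean and variance normalization of feature frames. The following is a minimal sketch under stated assumptions: the text above does not name a specific normalization technique, and the function name `normalize_features`, the list-of-lists frame layout, and the sample values are hypothetical.

```python
import math


def normalize_features(frames):
    """Mean and variance normalize feature frames, per dimension.

    `frames` is a list of equal-length lists of floats, one list per
    audio frame (an assumed layout, chosen for illustration).
    """
    if not frames:
        return []
    n = len(frames)
    dims = len(frames[0])
    # Statistics are computed over every frame supplied, so all of the
    # audio must be available before any frame can be adjusted.
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)
    ]
    # Guard against division by zero for constant dimensions.
    stds = [math.sqrt(v) if v > 0 else 1.0 for v in variances]
    return [
        [(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames
    ]


# Hypothetical two-dimensional features for three frames.
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
normalized = normalize_features(frames)
```

In this sketch the mean and variance are computed over all of the frames before any frame is adjusted, which suggests why such an adjustment may require a significant amount of audio data and may add delay before recognition results can be produced.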