In one known approach to building automotive speech recognition systems, speakers or training subjects are grouped based on gender, dialect, or accent. The speakers' utterances are collected over time into a training data set, and an acoustic model for an automotive speech recognition system is trained using the collected training data set. This method assumes that speakers enunciate at a normalized rate and pitch. A recognition system utilizing the model takes as input a signal composed of acoustic energy emanating from a speaker as an utterance and determines the probability that the signal matches a word or phrase, or a set of words or phrases. Speaker utterances are collected in an environment with little or no background noise, resulting in a substantially noise-free data set used to develop or train the acoustic model.
In an automotive environment, users of automotive speech recognition systems speak in different ways depending on the background noise within the vehicle interior. Many users increase the volume and pitch of their utterances as vehicle interior background noise increases in intensity. Vehicle interior background noise is influenced by, for example, the type of road the vehicle is traversing, the speed at which the vehicle travels, wind noise, noise external to the vehicle, HVAC settings, and other factors. Variations in the volume and pitch of user utterances, coupled with varying external and internal background noise, may make it difficult to match a user's utterances to the acoustic model and thereby to recognize those utterances.