Conventional speech recognition systems rely on various statistical models trained on very large amounts of data. The so-called acoustic model is typically trained on extensive quantities of manually transcribed speech audio. Recognition accuracy for a given speaker depends substantially on the extent to which the characteristics of that speaker's voice are represented in the acoustic model training data. Accordingly, a single acoustic model trained on a finite amount of data will perform poorly for speakers whose voice characteristics are not adequately represented in that training data.