Though the field of automatic speech recognition (ASR) has seen many developments in recent years, the quality of fully automated transcription is often not adequate for applications in which accuracy is of the utmost importance, such as transcription of legal depositions. One problem that ASR systems need to deal with is the vast range of accents that they may encounter. Accommodating many accents can make transcription a more difficult task, since with different accents the same word may be pronounced in different ways, which can greatly increase the lexical search space the ASR system needs to tackle when trying to transcribe an utterance.
Knowing the accent being spoken in audio can help an ASR system make adjustments that improve the quality of transcription. However, misidentification of the spoken accent can lead an ASR system to use an inappropriate model, which may decrease transcription accuracy. Machine learning classifiers are not always successful at identifying the accents of speakers, so a fully automated approach to accounting for accents in an ASR system can actually be detrimental when the accent is not identified correctly. Thus, there is a need for a way to provide an ASR system with an accurate identification of the accent being spoken in an audio recording being transcribed.