Automated transcription of audio data uses at least one model that, when applied to the audio data, interprets the audio data as phonemes or words. Models can be acoustic models that match particular words, letters, or phonemes to the signals in the audio data that correspond to these structures. Models may further be linguistic models that include a dictionary of words combined with statistics on the expected frequency of occurrence of the words in the dictionary. Acoustic and/or linguistic models may vary depending upon a particular field or localized setting. Such settings may be based upon a specialized field such as technology, medicine, or law, or may be a geographic location or region.
Currently, the creation of these locally adaptive models is expensive and time-consuming, as these models rely upon manual transcriptions to ensure that the transcription is correct. Only then can the manually transcribed customer service interactions be extrapolated into adapted models.
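
As a rough illustration of a linguistic model of the kind described above, the following minimal sketch builds a dictionary of words paired with relative frequency statistics from a set of transcripts. The function name `build_linguistic_model`, the whitespace tokenization, and the sample medical transcripts are assumptions made for this example only; they are not drawn from any particular transcription system, and a practical model would typically use richer statistics than single-word frequencies.

```python
from collections import Counter


def build_linguistic_model(transcripts):
    """Return a dict mapping each word to its relative frequency.

    `transcripts` is an iterable of transcript strings. Tokenization is a
    plain lowercase whitespace split, which is an assumption of this sketch.
    """
    counts = Counter()
    for text in transcripts:
        counts.update(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        # No words observed, so there are no statistics to report.
        return {}
    return {word: count / total for word, count in counts.items()}


# Hypothetical manually transcribed snippets from a medical setting.
medical_transcripts = [
    "the patient reports chest pain",
    "the patient has no chest pain today",
]
model = build_linguistic_model(medical_transcripts)
```

Building the same structure from transcripts of a different field or region would yield different word frequencies, which is the sense in which such a model is adapted to a localized setting.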