The present disclosure relates to speech recognition and, more specifically, to adjusting the settings of automatic speech-to-text engines in real time.
Automatic speech recognition is a method of converting an audio signal, such as spoken language, received by a computer or system into text. This conversion can employ speech-to-text engines that use algorithms implemented through computer programs to automatically generate a sequence of words based on the audio signal. One use of speech-to-text engines is in allowing people to communicate with digital devices, such as smartphones and digital assistants. Speech-to-text engines can also aid those with disabilities such as hearing loss by automatically captioning speech in classrooms, films, phone calls, etc. Another use of speech-to-text engines is in recording telephone calls so that information can be gleaned from their contents. Speech-to-text engines can vary, and different engines may be appropriate in different contexts. For example, speech-to-text engines can be trained to recognize particular languages, accents, and vocabularies. Additionally, speech-to-text engines may differ in speed and level of accuracy. These variations can result in differences in the amount of processing power required to convert the speech to text.