With the proliferation of speech recognition technologies, in applications ranging from controlling appliances to Interactive Voice Response (IVR) systems, the speed and accuracy of speech recognition applications are important design parameters. One of the challenges in speech recognition is the diversity of received audio. Depending on the language, dialect, and gender of the speaker, the recording environment, and comparable characteristics, received audio may vary significantly. System developers employ one or more algorithms to detect different characteristics of the audio received from a speaker, in conjunction with training models such as acoustic models, language models, and similar ones.
Many algorithms for improving automatic speech recognition accuracy require some amount of speech from the speaker to be effective. For example, accuracy can be improved by detecting the dialect of the speaker and modifying the system accordingly. Such algorithms often require recognition results from the same speaker as input. Thus, these algorithms typically cannot be applied to the first utterance, or the first few utterances, that the system is exposed to. Therefore, automatic speech recognition accuracy is initially lower, since the enhancing algorithms cannot yet be used effectively.