In day-to-day telephone and mobile conversations, a listener may often ask a speaker to repeat certain portions of their speech because the listener was unable to understand those portions. Such a situation happens more often in the presence of background noise, where the intelligibility of speech is affected significantly. Speech recognition systems, devices, and methods can utilize such repeated information, especially in the presence of heavy or bursty background noise, to better discern speech for various applications.
Some speech recognition systems, such as Automatic Speech Recognition (ASR) systems, work well when test and training conditions are comparable. An example of an ASR system may be the speech recognition system used in an automated call center for an airline. Many speech recognition systems, including ASR systems, store training data that includes data representing the most likely used parts of speech. Training data is unaffected by ambient noise, different speaker accents, or any other negative audio effects on the speech data. However, real-world test environments differ from training conditions. Various factors such as additive noise, acoustic echo, and speaker accent may affect speech recognition performance in many real-world test environments. Since ASR can be characterized as a statistical pattern recognition problem, errors may occur if the test patterns are unlike anything used to train the models. Various approaches to increasing robustness in ASR technology have been proposed, including (i) reducing the variability of the model and (ii) modifying the statistical model parameters to suit the noisy condition. However, under very high noise conditions or over bursty error channels, such as packet communication where packets may be dropped, speech recognition systems may benefit from using repeated utterances to accurately decode speech.
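As a rough illustration of why a repeated utterance may help over a bursty channel, the following minimal Python sketch merges two repetitions of the same word frame by frame before matching against stored templates. Everything in it is an assumption made for illustration: the `TEMPLATES` values, the one-number-per-frame features, the function names, and in particular the premise that the two repetitions are already time-aligned (a practical system would need an alignment step first). It is not the method of any particular ASR system.

```python
from typing import List, Optional

# A frame is one feature value; None marks a frame lost to a dropped packet.
Frame = Optional[float]

# Hypothetical clean templates (illustrative values only).
# The two words differ only in frames 3 and 4.
TEMPLATES = {
    "boston": [1.0, 3.0, 2.0, 5.0, 4.0, 1.0],
    "austin": [1.0, 3.0, 2.0, 1.0, 0.0, 1.0],
}


def merge_repetitions(first: List[Frame], second: List[Frame]) -> List[Frame]:
    """Merge two time-aligned repetitions frame by frame.

    If a frame survives in only one repetition, keep that one;
    if it survives in both, average them; if in neither, keep None.
    """
    merged: List[Frame] = []
    for a, b in zip(first, second):
        if a is None:
            merged.append(b)
        elif b is None:
            merged.append(a)
        else:
            merged.append((a + b) / 2)
    return merged


def distance(utterance: List[Frame], template: List[float]) -> float:
    """Mean squared distance over the frames that survived."""
    pairs = [(u, t) for u, t in zip(utterance, template) if u is not None]
    if not pairs:
        return float("inf")
    return sum((u - t) ** 2 for u, t in pairs) / len(pairs)


def recognize(utterance: List[Frame]) -> str:
    """Return the template word with the smallest distance."""
    return min(TEMPLATES, key=lambda word: distance(utterance, TEMPLATES[word]))


# The speaker says "austin" twice; a burst of dropped packets hits
# different frames in each repetition.
first = [1.0, 3.0, 2.0, None, None, 1.0]   # distinguishing frames lost
second = [None, None, 2.0, 1.0, 0.0, 1.0]  # opening frames lost

merged = merge_repetitions(first, second)
```

In this toy case the first repetition alone is equally close to both templates, because the only frames that distinguish the two words were dropped, whereas `recognize(merged)` selects "austin". The point is only that frames lost in one repetition can be recovered from the other when the bursts do not overlap.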