In language learning systems, non-native speakers are asked to pronounce phrases that the system first prompts or plays, i.e., in parroting mode. Some speakers pronounce words and phrases more accurately than others, and some of a speaker's attempts may be better than others. To rate the pronunciation validity and pronunciation quality of an attempt (with some modifications regarding thresholding and calibration), it is important to compare the speaker's utterance to the expected target words or phrases. This is sometimes called utterance verification.
A phoneme is a group of slightly different sounds which are all perceived to have the same function by speakers of the language or dialect in question.
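As a simple illustration of comparing an utterance to the expected phrase at the phoneme level, the following Python sketch aligns a recognized phone sequence against the target phoneme sequence with a Levenshtein (edit) distance and turns the result into a rough accuracy score. It assumes a phone recognizer has already produced the hypothesis sequence; the ARPAbet-style symbols, the function names `phone_edit_distance` and `phone_accuracy`, and the accuracy formula are hypothetical choices for illustration, not a method prescribed by this document.

```python
from typing import Sequence


def phone_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: target phoneme missed
                curr[j - 1] + 1,         # insertion: extra phone spoken
                prev[j - 1] + (r != h),  # substitution (cost 0 if phones match)
            )
        prev = curr
    return prev[-1]


def phone_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Hypothetical score: 1 - (edit distance / target length), floored at 0."""
    if not ref:
        return 0.0 if hyp else 1.0
    return max(0.0, 1.0 - phone_edit_distance(ref, hyp) / len(ref))


# Hypothetical example: target "think" vs. a learner's "sink"
target = ["TH", "IH", "NG", "K"]
spoken = ["S", "IH", "NG", "K"]
print(phone_edit_distance(target, spoken))  # 1
print(phone_accuracy(target, spoken))       # 0.75
```

An edit distance only counts mismatches; it does not weigh how acoustically close a substituted phone is, which is one reason the likelihood-based approaches discussed below are used in practice.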
Various algorithms for utterance verification have been developed. There are two broad categories of techniques: confidence measures and utterance verification in the narrower sense. Confidence measures assign a probability to an utterance based on language modeling and acoustic evidence. Examples of such systems can be found in T. J. Hazen and I. Bazzi, A comparison and combination of methods for OOV word detection and word confidence scoring, IEEE International Conference on Acoustics, Speech and Signal Processing (“ICASSP”), 2001; and E. Tsiporkova, F. Vanpoucke, H. Van Hamme, Evaluation of various confidence-based strategies for isolated word rejection, IEEE-ICASSP, 2000.
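One common way to turn acoustic and language-model evidence into a probability is to normalize combined scores over a recognizer's N-best list. The Python sketch below does this with a softmax and reports the posterior mass of the hypotheses that match the expected phrase. It is a simplified N-best approximation, not the method of the cited papers; the tuple layout, the `lm_weight` and `scale` parameters, and the example scores are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: (word sequence, acoustic log-likelihood, LM log-probability)
Hypothesis = Tuple[str, float, float]


def nbest_posteriors(nbest: List[Hypothesis], lm_weight: float = 1.0,
                     scale: float = 1.0) -> List[float]:
    """Softmax-normalize combined acoustic + weighted LM scores over an N-best list."""
    scores = [scale * (ac + lm_weight * lm) for _, ac, lm in nbest]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def phrase_confidence(nbest: List[Hypothesis], target: str, **kw) -> float:
    """Posterior mass of the N-best entries whose text equals the target phrase."""
    posts = nbest_posteriors(nbest, **kw)
    return sum(p for (text, _, _), p in zip(nbest, posts) if text == target)


# Hypothetical N-best list for the prompt "good morning"
nbest = [
    ("good morning", -100.0, -5.0),
    ("good mourning", -101.0, -8.0),
    ("could morning", -103.0, -9.0),
]
print(round(phrase_confidence(nbest, "good morning"), 3))  # 0.981
```

Real systems usually compute such posteriors over word lattices rather than a short N-best list, and the resulting value still needs the thresholding and calibration mentioned above before it can be used as a rating.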
Utterance verification methods that do not use confidence scores instead rely on in-grammar and out-of-grammar phrases, on the basis of which a likelihood ratio test is computed. See, e.g., T. Kawahara, Chin-Hui Lee, Biing-Hwang Juang, Combining Key-Phrase Detection and Subword-based Verification for Flexible Speech Understanding, IEEE-ICASSP, 1997. Phoneme-to-word transduction for speech recognition is presented, for example, in G. Zweig and J. Nedel, Empirical Properties of Multilingual Phone-to-Word Transduction, IEEE-ICASSP, 2008; and C. White, G. Zweig, L. Burget, P. Schwarz, H. Hermansky, Confidence Estimation, OOV Detection and Language ID Using Phone-to-Word Transduction and Phone-Level Alignments, IEEE-ICASSP, 2008.
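A likelihood ratio test of this kind can be sketched as follows: the same observation frames are scored once under the expected (in-grammar) phrase model and once under an out-of-grammar "filler" model, and the frame-averaged log-likelihood ratio is compared against a threshold. In this Python sketch the per-frame scores are assumed to come from an existing recognizer's alignments; the function names, the filler-model framing, the example numbers, and the default threshold of 0.0 are hypothetical and would need calibration in practice.

```python
from typing import Sequence


def llr_score(target_frame_ll: Sequence[float],
              filler_frame_ll: Sequence[float]) -> float:
    """Frame-averaged log-likelihood ratio: [log p(X|target) - log p(X|filler)] / T."""
    if len(target_frame_ll) != len(filler_frame_ll) or not target_frame_ll:
        raise ValueError("need equal-length, non-empty per-frame score lists")
    diff = sum(target_frame_ll) - sum(filler_frame_ll)
    return diff / len(target_frame_ll)


def verify(target_frame_ll: Sequence[float], filler_frame_ll: Sequence[float],
           threshold: float = 0.0) -> bool:
    """Accept the utterance as the expected phrase if the LLR exceeds the threshold."""
    return llr_score(target_frame_ll, filler_frame_ll) > threshold


# Hypothetical per-frame log-likelihoods for a 4-frame utterance
target = [-4.0, -3.5, -5.0, -4.5]
filler = [-4.2, -4.0, -5.5, -4.8]
print(round(llr_score(target, filler), 3))  # 0.375
print(verify(target, filler))               # True
```

Averaging over frames keeps the score comparable across utterances of different lengths, so a single threshold can be tuned for a given prompt set.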