The ability to communicate in second language is an important goal for language learners. Students working on fluency need extensive speaking opportunities to develop this skill. But students have little motivation to speak out because of their lacking of confidence due to the poor pronunciation. The intent of pronunciation assessment systems is to provide learners with diagnosis of problems and improve conversation skill. The traditional ways of computer-assisted pronunciation assessment (PA) mainly come in two approaches: text-dependent PA (TDPA) and text-independent PA (TIPA). Both approaches use the speech recognition technology to evaluate the pronunciation quality and the result is not very effective.
TDPA constrains the text for reading to pre-recorded sentences. The learner's speech input is compared to the pre-recorded speech for scoring. The scoring method usually adopts template-based speech recognition like Dynamic Time Warping (DTW). Therefore, the TDPA approach has the following disadvantages. It limits learning contents to the prepared text, requires teacher's recording for all learning contents, and is biased by teacher's timbre.
To overcome the aforementioned drawbacks of the TDPA approach, the TIPA approach usually adopts speaker-independent speech recognition technology and integrates speech statistical models to evaluate the pronunciation quality for any sentence. It allows adding new learning content. Since the statistic speech recognizer requires acoustic modeling of phonetic units like phonemes or syllables, the TIPA is language dependent. Moreover, the recognition probabilities can't all appropriately justify pronunciation goodness. As shown in FIG. 1 of speech recognition score distribution, phoneme AE ([æ]), AA ([α]), and AH ([Λ]) have very close distribution, though they sound different. Therefore, the probability scoring by speech recognition model is not representative enough to evaluate pronunciation. In addition, the TIPA approach can't provide learners with useful information to learn correct pronunciation through these probability score.