During the past few years, there has been significant interest in developing new computer-based techniques for language learning. One area of significant growth has been the use of multimedia (audio, image, and video) for language learning. These approaches have mainly focused on language comprehension. In these approaches, proficiency in pronunciation is achieved through practice and self-evaluation.
Typical pronunciation scoring algorithms are based upon a phonetic segmentation of the user's speech, which identifies the start and end times of each phoneme as determined by an automatic speech recognition system.
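As an illustration of how a phonetic segmentation can feed a score, the following minimal sketch compares the relative phoneme durations of a user utterance against those of a reference utterance. It is not the method of any particular system: the `Segment` type, the function names, and the scoring formula are all hypothetical, and the segment times are assumed to come from some speech recognizer that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One phoneme with its start and end time in seconds (hypothetical type)."""
    phoneme: str
    start: float
    end: float


def relative_durations(segments):
    """Return each segment's duration as a fraction of the total duration."""
    total = sum(s.end - s.start for s in segments)
    if total <= 0:
        raise ValueError("segmentation has no positive duration")
    return [(s.end - s.start) / total for s in segments]


def duration_score(user, reference):
    """Illustrative score in [0, 1]; 1.0 means identical relative durations.

    Assumption: both segmentations contain the same phoneme sequence.
    """
    if [s.phoneme for s in user] != [s.phoneme for s in reference]:
        raise ValueError("phoneme sequences differ")
    u = relative_durations(user)
    r = relative_durations(reference)
    # Both lists sum to 1, so the summed absolute difference is at most 2.
    distance = sum(abs(a - b) for a, b in zip(u, r))
    return 1.0 - distance / 2.0


# Made-up segment times, for illustration only.
reference = [Segment("k", 0.0, 0.1), Segment("ae", 0.1, 0.3), Segment("t", 0.3, 0.4)]
slower = [Segment("k", 0.0, 0.2), Segment("ae", 0.2, 0.6), Segment("t", 0.6, 0.8)]
uneven = [Segment("k", 0.0, 0.3), Segment("ae", 0.3, 0.4), Segment("t", 0.4, 0.5)]

print(duration_score(slower, reference))
print(duration_score(uneven, reference))
```

Because the durations are normalized by the utterance length, a user who speaks uniformly slower than the reference is not penalized in this sketch; only a change in the proportions between phonemes lowers the score.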
Unfortunately, present computer-based techniques do not provide sufficiently accurate scoring of several parameters that are useful or necessary in determining student progress. Additionally, techniques that might provide more accurate results tend to be expensive in terms of processing power and cost. Other existing scoring techniques require the construction of large databases of non-native speech so that non-native students are scored in a manner that compensates for accents.
Additionally, present computer-based techniques do not provide feedback in a manner that allows a user to improve his or her speech. In particular, it is often difficult for a student to identify the specific speech problems, e.g., improper relative word duration and intonation, that must be addressed to improve his or her speech. Moreover, once specific speech problems are identified, it would be beneficial to provide feedback in a manner that allows the user to visualize a comparison between his or her pronunciation and a reference pronunciation.