Learning to pronounce a non-native language correctly is challenging. The classic solution is a teacher who is a native speaker of the language and who provides detailed feedback on pronunciation errors and how to correct them. While detecting mispronunciation is relatively easy for a native speaker, experience shows that it can be quite difficult for a non-native speaker. The reason is simple: if a particular sound does not exist in a speaker's native language, the speaker will have difficulty distinguishing between correct and incorrect pronunciation of that sound. Even more difficult is knowing what to change in the way one pronounces (or rather mispronounces) a sound to make it right, and it is here that a teacher's feedback is critical.
With the creation of automatic speech recognition (“ASR”) systems, having feedback provided by a machine rather than by a human teacher became somewhat attainable. The challenge in creating such a machine is that the main focus of ASR systems was, and is, on recognizing what is said rather than detecting what is incorrectly pronounced. Thus, instead of emphasizing pronunciation errors, previously-known ASR systems attempt to decipher what was said, even if an utterance was pronounced incorrectly. One method to overcome the inability of previously-known ASR systems to handle mispronunciation involves trying to isolate poorly pronounced sounds in the utterance using a Viterbi algorithm for wave segmentation based on recognition scores. This approach was used by Maxine Eskenazi in her work in the early 2000s and is described in her U.S. Pat. No. 7,752,045. That patent describes comparing acoustic features of a mispronounced sound with acoustic features from a pre-recorded database of native speakers' utterances. A similar approach is proposed in U.S. Patent Application Publication No. 2009/0305203 A1, which describes detailed acoustic analysis of segments for individual phonemes to determine whether they were mispronounced.
One drawback of the methodology employed in the foregoing patent and application is that it relies on segmenting the utterance according to the best-scoring recognition result, which is not reliable, especially for a non-native speaker. Due to the nature of the Viterbi algorithm, it cannot recover from segmentation errors in cases of serious mispronunciation and instead produces an incorrect phoneme alignment. These errors in turn produce unreliable feedback to the speaker, even if the acoustic features are perfectly extracted and classified, which is itself a very challenging task.
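The segmentation drawback can be illustrated with a minimal sketch of Viterbi forced alignment. This is not the method of the cited patent or application. The function name `forced_align`, the left-to-right "stay or advance" topology, and all score values are assumptions chosen only for illustration. Each frame receives a log-score under each expected phoneme, and the aligner must assign every expected phoneme at least one frame, whether or not the speaker actually produced it.

```python
import math

def forced_align(scores):
    """Align frames to an expected phoneme sequence (hypothetical helper).

    scores[t][k] is the log-score of frame t under the k-th expected phoneme.
    Returns, for each frame, the index of the phoneme it is assigned to.
    The path starts at phoneme 0, ends at the last phoneme, and at each
    frame either stays on the current phoneme or advances by one.
    """
    T, K = len(scores), len(scores[0])
    if T < K:
        raise ValueError("need at least one frame per expected phoneme")
    NEG = -math.inf
    best = [[NEG] * K for _ in range(T)]   # best path score ending at (t, k)
    back = [[0] * K for _ in range(T)]     # predecessor phoneme index
    best[0][0] = scores[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else NEG
            if move > stay:
                best[t][k] = move + scores[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + scores[t][k]
                back[t][k] = k
    # Trace back from the last phoneme at the last frame.
    path = [K - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Illustrative scores only: three expected phonemes, six frames.
# Correct pronunciation: frames 2-3 match the middle phoneme.
clean = [[-1, -5, -5], [-1, -5, -5],
         [-5, -1, -5], [-5, -1, -5],
         [-5, -5, -1], [-5, -5, -1]]

# Mispronunciation: frames 2-3 do not match the middle phoneme and
# weakly resemble the last phoneme instead.
mispronounced = [[-1, -5, -5], [-1, -5, -5],
                 [-5, -5, -2], [-5, -5, -2],
                 [-5, -5, -1], [-5, -5, -1]]

print(forced_align(clean))          # [0, 0, 1, 1, 2, 2]
print(forced_align(mispronounced))  # [0, 0, 1, 2, 2, 2]
```

In the second case the aligner still reports a segment for the middle phoneme and moves the start of the last phoneme one frame early, so any acoustic analysis performed on those segments operates on the wrong frames.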
In view of the shortcomings of the prior art, it would be desirable to develop a new approach to detecting pronunciation errors that is less prone to segmentation errors resulting from mispronounced utterances.
It further would be desirable to provide a system and methods for improving pronunciation of a non-native language that takes advantage of publicly accessible, robust third-party ASR systems, especially the Google Voice system developed by Google, Inc.
It still further would be desirable to provide a system and methods for learning to speak a non-native language that does not require studying and practicing the pronunciation of each word in the dictionary, but rather enables the student to acquire the knowledge and skills to properly pronounce phonemes and sequences of phones, such as triphones, in real time.
It also would be desirable to provide a system and methods for improving pronunciation of a non-native language that monitors the response of publicly accessible third-party ASR systems to mispronunciations of a representative set of words (e.g., a set that covers all phonemes and triphones) and provides automatic feedback to help users correct mispronunciation errors.
It further would be desirable to provide a system and methods for improving pronunciation of a non-native language that enables a user to invoke and use the system in real-time situations using previously-known mobile devices, such as cell phones.