1. Field of Invention
The present invention relates to a method of constructing a model for recognizing English pronunciations, and more particularly to a method of constructing a model for recognizing English pronunciation variations.
2. Related Art
The first language of a country serves as a common language among all of its ethnic groups. It is selected from the languages of the ethnic groups or regions of that country so as to facilitate communication among them. The same applies among countries, where a common language facilitates communication.
Currently, English is the prevailing universal language. To enable the public to learn its pronunciations, corresponding phonetic alphabets are used, such as the KK phonetic alphabet (created by John Samuel Kenyon and Thomas A. Knott in the United States), the DJ phonetic alphabet (created by Daniel Jones in the U.K.), and the International Phonetic Alphabet (IPA), which is used all over the world. Meanwhile, everyday products are increasingly computerized, and a voice recognition model is often adopted to activate a product. Voice recognition technology is therefore attracting more and more attention.
To implement voice recognition, pronunciations of spoken English expressions (sentences, phrases, words, and letters) are recorded using the IPA, collected, and finally compiled into a corpus. A pronunciation lexicon, such as the CMU pronunciation lexicon compiled by Carnegie Mellon University (CMU) and containing about 120,000 expressions, records English expressions and their corresponding IPAs, in which each phonetic alphabet corresponds to a sound characteristic value.
When an English voice recognition system utilizes the CMU pronunciation lexicon, the system converts the pronunciation of an English expression into a corresponding sound characteristic value, and compares this value with the sound characteristic values recorded in the CMU pronunciation lexicon, so as to obtain the corresponding English expression.
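The comparison described above can be illustrated by the following minimal sketch. It is not taken from any actual recognition system: the phonetic labels, the two-dimensional "sound characteristic values", the two-entry lexicon, and the names PHONE_FEATURES, LEXICON, and recognize are all hypothetical, and a simple Euclidean distance is assumed as the comparison measure.

```python
import math

# Hypothetical sound characteristic values, one small vector per phonetic alphabet.
# Real systems use much richer acoustic features; these numbers are invented.
PHONE_FEATURES = {
    "k": (0.90, 0.10),
    "ae": (0.20, 0.80),
    "t": (0.80, 0.30),
    "d": (0.70, 0.40),
    "ao": (0.30, 0.60),
    "g": (0.85, 0.20),
}

# Toy pronunciation lexicon: expression -> sequence of phonetic alphabets (assumed).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}


def lexicon_values(expression):
    """Return the recorded sound characteristic values for a lexicon entry."""
    return [PHONE_FEATURES[p] for p in LEXICON[expression]]


def distance(observed, recorded):
    """Sum of per-phone Euclidean distances; infinite if the lengths differ."""
    if len(observed) != len(recorded):
        return math.inf
    return sum(math.dist(o, r) for o, r in zip(observed, recorded))


def recognize(observed):
    """Return the lexicon expression whose recorded values are closest."""
    return min(LEXICON, key=lambda e: distance(observed, lexicon_values(e)))


# Values converted from a (hypothetical) utterance that is close to "cat".
utterance = [(0.88, 0.12), (0.22, 0.79), (0.80, 0.30)]
result = recognize(utterance)
```

In this sketch the lexicon entry with the smallest total distance is returned; a practical system would additionally reject inputs whose best distance is still too large.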
However, the prior art has unavoidable defects.
First, when the native language of a speaker is not English, i.e., the speaker is not from a country where British or American English is spoken, his or her English pronunciation is mostly influenced by the intonation or pronunciation habits of the native language. For example, FIGS. 1A to 1C show incorrect English pronunciations by Taiwanese speakers under the influence of Mandarin, i.e., pronunciation variations that cannot be found in the IPAs. However, current voice recognition systems usually adopt a pronunciation lexicon formed of standard American/British English samples. Therefore, if the parsed sound characteristic value cannot be found in the pronunciation lexicon, the correct English expression cannot be recognized.
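This defect can be illustrated by the following minimal sketch, which assumes a lexicon keyed by standard pronunciations only. The entries, the phonetic labels, and the substituted sound are hypothetical examples chosen for illustration and are not the variations shown in FIGS. 1A to 1C.

```python
# Toy lexicon built only from standard pronunciations (assumed entries).
STANDARD_LEXICON = {
    ("th", "ih", "ng", "k"): "think",
    ("th", "r", "iy"): "three",
}


def lookup(phones):
    """Return the expression for an exact pronunciation match, or None."""
    return STANDARD_LEXICON.get(tuple(phones))


# A standard pronunciation is found in the lexicon.
standard = lookup(["th", "ih", "ng", "k"])

# A hypothetical variation (first sound replaced) is absent from the lexicon,
# so no expression can be obtained for it.
variant = lookup(["s", "ih", "ng", "k"])
```

Here the standard pronunciation resolves to an expression, whereas the variation yields no result because the lexicon holds no entry for it.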
Second, the conventional voice recognition technology predefines all possible pronunciations (including true pronunciations and assumed pronunciations), but only the pronunciation variations appearing in the corpus are defined in the pronunciation lexicon. For example, for the English letter A, its phonetic alphabet and the sound characteristic values of its possible pronunciation variations are collected. Pronunciations not included in the corpus, and pronunciations from non-English-speaking regions, such as the fifty Japanese phonetic alphabets or the thirty-seven Chinese phonetic alphabets, are not defined, so the range of pronunciations that can be parsed is too narrow.