Field of the Invention
The invention relates to a speech recognition technique and, more particularly, to a speech recognition method for recognizing speech in different languages, dialects, or pronunciation habits, and to an electronic apparatus using the method.
Description of Related Art
Speech recognition is no doubt a popular research and business topic. Generally, speech recognition extracts feature parameters from an input speech and compares those feature parameters with samples in a database, in order to find and retrieve the sample that is least dissimilar to the input speech.
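The comparison step described above can be sketched as a nearest-sample search. This is a minimal illustration only: the feature vectors, the labels, the Euclidean distance used as the dissimilarity measure, and the function names `dissimilarity` and `recognize` are all assumptions made for the example and are not taken from any particular recognizer.

```python
import math


def dissimilarity(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, samples):
    """Return the label of the stored sample that is least dissimilar to the input."""
    return min(samples, key=lambda label: dissimilarity(features, samples[label]))


# Hypothetical database: label -> feature parameters of a stored sample.
samples = {
    "yes": [1.0, 0.2, 0.1],
    "no": [0.1, 0.9, 0.4],
}

# Feature parameters extracted from an input speech (made-up values).
input_features = [0.9, 0.3, 0.1]
print(recognize(input_features, samples))
```

In this sketch the input lies much closer to the stored "yes" sample than to the "no" sample, so "yes" is returned. Practical systems use statistical models rather than a direct distance to stored samples, but the principle of choosing the least dissimilar candidate is the same.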
One common method is to collect a speech corpus (e.g., recorded human speech), manually annotate the speech corpus (i.e., label each utterance with its corresponding text), and then use the corpus to train an acoustic model and an acoustic lexicon. The acoustic model and the acoustic lexicon are trained using a plurality of speech corpora corresponding to a plurality of vocabularies, together with a plurality of pronunciations of those vocabularies as marked in a dictionary.
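As a rough illustration of an acoustic lexicon built only from dictionary pronunciations, the sketch below inverts a small dictionary into a mapping from phonetic spelling to candidate vocabularies. The three entries, the toneless pinyin spellings, and the name `build_lexicon` are assumptions chosen for the example; an actual lexicon would be trained from the annotated corpus and be far larger.

```python
from collections import defaultdict

# Hypothetical dictionary: vocabulary -> pronunciations marked in the dictionary.
dictionary = {
    "心": ["xin"],
    "星": ["xing"],
    "行": ["xing", "hang"],  # a polyphone with two marked pronunciations
}


def build_lexicon(dictionary):
    """Invert the dictionary into: phonetic spelling -> list of candidate vocabularies."""
    lexicon = defaultdict(list)
    for word, pronunciations in dictionary.items():
        for spelling in pronunciations:
            lexicon[spelling].append(word)
    return dict(lexicon)


lexicon = build_lexicon(dictionary)
print(lexicon["xing"])     # candidates sharing the spelling "xing"
print(lexicon.get("hin"))  # a spelling not marked in the dictionary has no entry
```

Because such a lexicon holds only the pronunciations marked in the dictionary, a lookup with any other spelling finds no candidate, and a lookup with a shared spelling returns several candidates without indicating which pronunciation was intended.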
However, the current method faces the following problems. Problem 1: when the pronunciations of the vocabularies used for training the acoustic model are the pronunciations marked in the dictionary, a nonstandard pronunciation input by a user (e.g., an unclear retroflex, or unclear front and back nasals) is likely to mismatch the pronunciations marked in the dictionary, which may increase the fuzziness of the acoustic model. For example, in order to cope with the nonstandard pronunciation, the acoustic model may output “ing” with a higher probability for the phonetic spelling “in”, which increases the overall error rate. Problem 2: because pronunciation habits differ from region to region, the nonstandard pronunciations themselves vary, which further increases the fuzziness of the acoustic model and reduces recognition accuracy. Problem 3: dialects (e.g., Shanghainese, Cantonese, Minnan) cannot be recognized. Problem 4: mispronounced words (e.g., a character that should be pronounced “hé” but that many people mispronounce as “hè”) cannot be recognized. Problem 5: because phonetic transcriptions are converted into vocabularies by the acoustic lexicon, much of the speech information (e.g., accent locations, or the original pronunciation of a polyphone) may be lost. This loss affects the accuracy of intention recognition and increases the error rate of semantic recognition.