Field of the Invention
The invention relates to a speech recognition technique, and more particularly to a method for building a language model, a speech recognition method for recognizing speech in different languages, dialects, or pronunciation habits, and an electronic apparatus thereof.
Description of Related Art
Speech recognition is a popular topic in both research and business. Generally, speech recognition extracts feature parameters from an input speech signal and compares them with samples in a database to find and extract the sample that is least dissimilar to the input speech.
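The comparison step described above can be sketched as a nearest-sample search. This is a minimal illustration only, not the method of any particular recognizer: the feature vectors, the `samples` database, and the choice of Euclidean distance as the dissimilarity measure are all assumptions made for the example.

```python
import math

def dissimilarity(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, samples):
    """Return the label of the stored sample least dissimilar to the input features."""
    return min(samples, key=lambda label: dissimilarity(features, samples[label]))

# Hypothetical database: label -> feature vector of a reference utterance.
samples = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
}

# Hypothetical feature vector extracted from an input utterance.
result = recognize([0.8, 0.2, 0.3], samples)
```

In practice the feature parameters would be derived from the audio signal and the dissimilarity measure would be chosen to suit them; the sketch only shows the "find the least dissimilar sample" step.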
One common method is to collect a speech corpus (e.g., recorded human speech), manually annotate the speech corpus (i.e., label each utterance with its corresponding text), and then use the corpus to train an acoustic model and an acoustic lexicon. The acoustic model and the acoustic lexicon are trained using a plurality of speech corpora corresponding to a plurality of vocabularies, together with the phonetic transcriptions of those vocabularies as marked in a dictionary. Accordingly, data of the speech corpora corresponding to the phonetic transcriptions may be obtained from the acoustic model and the acoustic lexicon.
However, the current method faces the following problems. Problem 1: when the phonetic transcriptions of vocabularies used for training the acoustic model are the phonetic transcriptions marked in the dictionary, a user's nonstandard pronunciation (e.g., unclear retroflex sounds, unclear front and back nasals, etc.) is likely to be mismatched with those dictionary transcriptions, so the fuzziness of the acoustic model may increase. For example, in order to cope with the nonstandard pronunciation, the acoustic model may assign a higher probability to “ing” for the phonetic spelling “in”, which increases the overall error rate. Problem 2: because pronunciation habits differ between regions, the nonstandard pronunciation may vary, which further increases the fuzziness of the acoustic model and reduces recognition accuracy. Problem 3: dialects (e.g., standard Mandarin, Shanghainese, Cantonese, Minnan, etc.) cannot be recognized. Problem 4: mispronounced words (e.g., “” in “-” should be pronounced as “hé”, yet many people mispronounce it as “hé”) cannot be recognized. Problem 5: because phonetic transcriptions are converted into vocabularies by the acoustic lexicon, a great deal of speech information (e.g., accent locations) may be lost, which reduces the accuracy of intention recognition and increases the error rate in semantic recognition.
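Problem 1 can be illustrated with a toy lexicon lookup. The sketch below assumes a hypothetical lexicon that contains only dictionary phonetic transcriptions; the two entries and the `lookup` helper are invented for illustration and do not come from any real recognizer.

```python
# Hypothetical lexicon holding only dictionary (standard) phonetic transcriptions.
lexicon = {
    "yin": "因",
    "ying": "英",
}

def lookup(phonetic_spelling):
    """Map a phonetic spelling to a vocabulary entry, or None if it is not listed."""
    return lexicon.get(phonetic_spelling)

# A speaker who does not distinguish front and back nasals intends "ying"
# but produces a sound that is transcribed as "yin".
intended = "ying"
produced = "yin"

expected_word = lookup(intended)   # the word the speaker meant
recognized_word = lookup(produced) # the word the dictionary-only lexicon returns
mismatch = recognized_word != expected_word
```

The lookup succeeds but yields a different vocabulary entry than the speaker intended, which is the kind of mismatch that forces the acoustic model to become fuzzier when it tries to compensate.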