With globalization, it has become common to communicate in mixed languages (or in multiple different languages). According to statistics, people who speak multiple languages outnumber those who speak a single language. The acoustic and linguistic complexity of mixed languages presents a challenge for speech recognition. Therefore, the study of acoustic models for mixed languages has become an important research direction.
Speech recognition technology for mixed languages refers to training an acoustic model of the mixed languages using a Chinese-English mixed dictionary, to obtain a speech recognition model. At present, the Chinese-English mixed dictionary is obtained by taking a Chinese dictionary whose phoneme set is marked by the initial consonants and the simple or compound vowels of Chinese syllables, and adding some English words to that dictionary, marked in the same manner with the initial consonants and the simple or compound vowels of Chinese syllables. However, this marking of English words is manual, time-consuming and incomplete. Further, the acoustic model of the mixed languages may be, for example, a deep neural network (DNN) acoustic model, a deep convolutional neural network (CNN) acoustic model, a long short-term memory (LSTM) acoustic model, etc., and its accuracy is not high enough.
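The existing way of building the mixed dictionary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the entries, the phoneme labels (including the tone digits), and the helper names `build_mixed_lexicon`, `phoneme_set` and `uncovered` do not come from any particular toolkit or from an actual dictionary. It only shows the general idea of marking English words with the Chinese initial and vowel inventory, and why such a dictionary stays incomplete.

```python
# Hypothetical Chinese entries: each word maps to a sequence of
# initial consonants and simple/compound vowels (labels are illustrative).
chinese_lexicon = {
    "你好": ["n", "i3", "h", "ao3"],
    "中国": ["zh", "ong1", "g", "uo2"],
}

# Hypothetical English entries, hand-marked with the same Chinese
# initial/vowel inventory (a rough approximation of the English sounds).
english_additions = {
    "hello": ["h", "a1", "l", "ou4"],
    "ok": ["ou1", "k", "ei4"],
}


def build_mixed_lexicon(chinese, english):
    """Merge manually marked English words into the Chinese dictionary."""
    mixed = dict(chinese)
    for word, phones in english.items():
        mixed[word.lower()] = list(phones)
    return mixed


def phoneme_set(lexicon):
    """Collect the phoneme inventory actually used by the dictionary."""
    return sorted({p for phones in lexicon.values() for p in phones})


def uncovered(words, lexicon):
    """Return the words that have no entry in the dictionary."""
    return [w for w in words if w.lower() not in lexicon]


mixed_lexicon = build_mixed_lexicon(chinese_lexicon, english_additions)
missing = uncovered(["hello", "world"], mixed_lexicon)
```

In this sketch, any English word that nobody has marked by hand (here, "world") has no entry in the dictionary, which is the incompleteness described above.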