China's vast territory gives rise to a wide variety of Chinese dialects with regional accents. Dialects from different regions differ in their pronunciation and speech rate characteristics. Consequently, when speech recognition is performed on dialects from different regions using the same acoustic model, recognition accuracy is low. To solve this problem of low accuracy when an acoustic model built for Mandarin Chinese is applied to different regional accents, a good approach is to train a customized acoustic model for each regional dialect.
Training an acoustic model requires a large amount of training data. Currently, with the ubiquity of instant messaging tools such as WeChat and MiTalk, a considerable amount of primary speech data is available on the Internet. These speech data can serve as training data for acoustic models for dialects in different regions. However, the prior art provides no automated method for distinguishing which of these speech data are in Mandarin Chinese and which are regional speech data. As a result, before acoustic models for regional accents can be trained on the primary speech data, the data must first be manually labeled with regional tags, which consumes considerable human and material resources.