Speech technology has changed the way we live and work in recent years. Speech recognition takes speech as its object of study and, through speech signal processing and pattern recognition, allows a machine to automatically recognize and understand spoken human language. Speech recognition is a convenient means of human-computer interaction and is now widely used in the mobile internet and other fields. It draws on a range of disciplines, such as signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and the like. Speech recognition technology allows a machine to translate speech signals into the corresponding text or commands through recognition and understanding.
In speech recognition technology, the accuracy of the acoustic model determines the correctness and effectiveness of the speech recognition. Training the acoustic model requires a large amount of high-quality annotated speech data. In general, the more data is available, the higher the accuracy of the trained acoustic model. However, annotating speech data manually is very time-consuming, so it is not feasible to obtain a large amount of training data through manual annotation, and purchasing a large amount of annotated speech data from a third party is expensive and difficult.