Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. As a branch of computer science, artificial intelligence attempts to understand the essence of intelligence and to produce intelligent machines capable of acting like a human. Research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. One of the most important aspects of artificial intelligence is speech recognition technology.
The grapheme-to-phoneme (g2p) model is very important in English speech synthesis, where it is used to convert the graphemes of a received word into phonemes. In the related art, the g2p model is trained with deep neural network technology, which can achieve better results than approaches based on statistical language models.
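To make the input and output of the conversion concrete, the following minimal sketch maps a word's graphemes to a phoneme sequence. It is a plain table lookup, not the neural g2p model discussed here; the lexicon entries, the ARPAbet-style phoneme symbols (stress marks omitted) and the name `g2p_lookup` are illustrative assumptions.

```python
# Hypothetical toy lexicon; a real g2p model predicts phonemes for unseen words.
TOY_LEXICON = {
    "cat": ["K", "AE", "T"],
    "phone": ["F", "OW", "N"],
}


def g2p_lookup(word):
    """Return (graphemes, phonemes) for a word found in the toy lexicon."""
    key = word.lower()
    graphemes = list(key)          # one grapheme per letter
    phonemes = TOY_LEXICON[key]    # raises KeyError for unknown words
    return graphemes, phonemes


if __name__ == "__main__":
    graphemes, phonemes = g2p_lookup("phone")
    print(graphemes, "->", phonemes)
```

In this example the word "phone" has five graphemes but only three phonemes, so the two sequences generally differ in length.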
The objective of the g2p model is to convert a word into a phoneme sequence, and the number of real words is relatively fixed, at about one hundred thousand. Consequently, the amount of data available for training the g2p model is also relatively fixed. Training the g2p model with a deep neural network is therefore quite different from training an acoustic model.
When an acoustic model is trained with a neural network, the number of layers of the network and the number of units in each layer can be increased continually, because more training data can always be added. For the g2p model, however, if the number of layers and the number of units in each layer are increased while the amount of training data stays relatively fixed, over-fitting easily occurs. When over-fitting occurs, the trained g2p model performs well on the training data but noticeably worse on the test data.
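The imbalance described above can be illustrated with a rough calculation: the sketch below counts the trainable parameters of a fully connected network and divides by a fixed vocabulary of about one hundred thousand words. The fully connected architecture, the layer sizes and the function names are assumptions chosen only for illustration, and the parameters-per-word ratio is a crude indicator of over-fitting risk rather than a precise criterion.

```python
TRAINING_WORDS = 100_000  # "about one hundred thousand" words, as stated in the text


def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first and output last.
    """
    return sum(
        (n_in + 1) * n_out  # +1 accounts for the bias of each output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def params_per_word(layer_sizes, n_words=TRAINING_WORDS):
    """Trainable parameters per training word (hypothetical heuristic)."""
    return mlp_param_count(layer_sizes) / n_words


# Hypothetical layer sizes: 64 inputs, 40 outputs.
small = [64, 128, 40]
large = [64, 1024, 1024, 1024, 40]

if __name__ == "__main__":
    for name, sizes in (("small", small), ("large", large)):
        print(name, mlp_param_count(sizes), round(params_per_word(sizes), 2))
```

With these assumed sizes, the small network has about 0.13 parameters per training word and the large one about 22, while the amount of training data is the same in both cases.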
If a smaller network is used instead, a g2p model with relatively acceptable performance can be obtained. For this kind of g2p model, however, the number of layers and the number of units in each layer are relatively small, so its learning capability and generalization capability are not as good as those of a g2p model obtained with a deep neural network.