The present disclosure relates generally to machine learning techniques, and more particularly to a technique for training a neural network including an input layer, one or more hidden layers, and an output layer.
It is known that pre-training a neural network before fine-tuning can improve automatic speech recognition (ASR) performance, especially when the amount of training data is relatively small. Several pre-training techniques are known, including discriminative pre-training, which is based on error back-propagation, and generative pre-training, which does not use discriminative information.
In conventional pre-training processes, a new layer initialized with random parameters is inserted on top of the existing hidden layers, just below the output layer. The neural network is then pre-trained using the training data.
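By way of illustration only, the conventional layer insertion described above may be sketched as follows. The sketch is a minimal one under stated assumptions and is not taken from any particular implementation: the helper names (`init_layer`, `insert_top_hidden_layer`, `forward`), the layer sizes, the sigmoid and softmax activations, and the choice to keep the existing output layer unchanged are all hypothetical. The subsequent pre-training pass over the training data is omitted.

```python
import numpy as np


def init_layer(n_in, n_out, rng):
    """Return a (weights, bias) pair with small random weights and zero bias."""
    W = rng.normal(0.0, 0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b


def insert_top_hidden_layer(layers, rng):
    """Insert a randomly initialized hidden layer just below the output layer.

    `layers` is a list of (W, b) pairs; the last pair is the output layer.
    Assumption: the new layer keeps the width of the current top hidden
    layer, so the existing output layer stays shape-compatible.
    """
    width = layers[-1][0].shape[0]  # input size of the output layer
    new_hidden = init_layer(width, width, rng)
    return layers[:-1] + [new_hidden, layers[-1]]


def forward(layers, x):
    """Sigmoid hidden layers followed by a softmax output layer (assumed)."""
    h = x
    for W, b in layers[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    W, b = layers[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
# Hypothetical sizes: 40 input features, 64 hidden units, 10 output classes.
net = [init_layer(40, 64, rng), init_layer(64, 10, rng)]
net = insert_top_hidden_layer(net, rng)  # network now has two hidden layers
probs = forward(net, rng.normal(size=(5, 40)))
```

In this sketch the inserted layer starts from random parameters while the lower hidden layer keeps its current parameters, which mirrors the conventional process in which the network is pre-trained again after each insertion.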