The present invention, generally, relates to machine learning and, more particularly, to learning of a neural network.
Convolutional Neural Networks (CNNs), which are Artificial Neural Networks (ANNs) with many layers including at least a convolutional layer, have been widely used for various recognition processing systems such as Automatic Speech Recognition (ASR) systems, image recognition systems, etc. It has been shown that the CNNs can achieve superior accuracy as an acoustic model for the ASR. Since local windows spanning time and frequency axes are shared in the CNNs, the CNNs can capture translation invariance with far fewer parameters than normal Deep Neural Networks (DNNs) without any convolutional layer.
Typically, a neural network such as a convolutional layer followed by a DNN are first subjected to pre-training and then fine-tuning with appropriate criterion such as cross entropy criterion. Generally, weights in the neural network and, more particularly, weights in the convolutional layers, are initialized with random values before pre-training.