1. Field
The disclosure herein relates to deep neural networks, and in particular, to techniques for training a deep neural network.
2. Description of the Related Art
Deep neural networks, and especially deep convolution neural networks, have gained increasing interest not only from the research community but also from industry, largely due to their success in applications including image classification, object detection, video classification and speech recognition. In these applications and others, a significant performance boost over traditional computational methods is observed. In some instances, performance of a deep neural network can exceed that of a human. In a deep convolution neural network (referred to as a CNN), the core is the deeply stacked convolution layers, which can be interlaced with normalization layers and subsampling layers (or pooling layers), where each convolution layer includes a set of filters (a filter bank). The filters can be two-dimensional (2D) filters, such as for image classification; three-dimensional (3D) filters, such as for video classification; or linear (one-dimensional, 1D) filters, such as for speech recognition.
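By way of illustration only, the application of a filter bank in a single convolution layer can be sketched as follows. The sketch assumes a single-channel 2D input, a stride of one and no padding; the function names conv2d_valid and apply_filter_bank are hypothetical and are not taken from the disclosure.

```python
def conv2d_valid(image, kernel):
    """Slide one 2D filter over a 2D input (stride 1, no padding).

    As in most deep learning frameworks, the kernel is not flipped,
    so strictly speaking this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def apply_filter_bank(image, filter_bank):
    """Return one feature map per filter in the bank (hypothetical helper)."""
    return [conv2d_valid(image, kernel) for kernel in filter_bank]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filter_bank = [
    [[1, 0], [0, 1]],  # sums each element with its lower-right neighbor
    [[0, 0], [0, 0]],  # all-zero filter: its feature map is always zero
]
feature_maps = apply_filter_bank(image, filter_bank)
```

In this sketch each filter yields its own feature map, so a filter whose weights are all zero contributes an all-zero map regardless of the input.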
Commonly, to train a given neural network, backpropagation is applied. In backpropagation, parameters of the network are learned from training data. In embodiments where the neural network performs image recognition, training data may include a plurality of images. Often during training, the learning process may get stuck in a local minimum and is prone to over-fitting due to the huge number of parameters and the nonlinearity of the neural network. As a result, dead filters and/or duplicated filters may arise. This leads to inefficient processing when implementing the neural network.
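For illustration, one way such filters might be identified in a trained filter bank is sketched below. The criteria used here are assumptions and are not defined by the disclosure: a filter is treated as dead when the L2 norm of its weights is near zero, and two filters are treated as duplicated when the cosine similarity of their weights meets a threshold. The function names and the threshold values are hypothetical.

```python
import math


def flatten(filt):
    """Flatten a 2D filter (list of rows) into a single weight vector."""
    return [w for row in filt for w in row]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def find_dead_filters(filter_bank, eps=1e-6):
    """Indices of filters whose weights are all close to zero (assumed criterion)."""
    return [i for i, f in enumerate(filter_bank) if l2_norm(flatten(f)) < eps]


def find_duplicate_pairs(filter_bank, threshold=0.99, eps=1e-6):
    """Index pairs whose weight vectors point in nearly the same direction.

    Uses cosine similarity, so filters that differ only by a positive
    scale factor are reported as duplicates. Near-zero filters are skipped
    because their direction is undefined.
    """
    flat = [flatten(f) for f in filter_bank]
    norms = [l2_norm(v) for v in flat]
    pairs = []
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            if norms[i] < eps or norms[j] < eps:
                continue
            dot = sum(a * b for a, b in zip(flat[i], flat[j]))
            if dot / (norms[i] * norms[j]) >= threshold:
                pairs.append((i, j))
    return pairs


bank = [
    [[1, 0], [0, 1]],
    [[0, 0], [0, 0]],  # dead under the assumed criterion
    [[2, 0], [0, 2]],  # scaled copy of filter 0
    [[0, 1], [1, 0]],
]
dead = find_dead_filters(bank)
duplicates = find_duplicate_pairs(bank)
```

Other criteria, such as comparing the activations that filters produce on training data, could equally be used; the weight-based test is chosen here only because it is self-contained.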
Several techniques have been tried to provide for more effective training. Some of the techniques include using more training data, adding more variation to the training data (e.g., by adding some random noise), and using more complex networks (e.g., more layers and more filters). Those efforts suffer from several limitations. For example, more training data means much more effort in collecting the training data. Adding more variation creates instability of the neural network due to the randomness. Using a more complex neural network requires much greater computational capability and increases the potential for over-fitting. One phenomenon of over-fitting is the incorporation of ineffective feature detectors, for example, dead or duplicated feature detectors, which do not contribute to the performance of the neural network but waste a large number of parameters, and therefore slow down processing and impose a general requirement for additional resources.
Thus, what are needed are improved techniques for more effectively training a deep neural network. The techniques should reduce the occurrence of dead filters or duplicated filters and lead to more efficient processing.