1. Field
The present disclosure relates to the field of character recognition, and in particular to a training method and a training apparatus for a neural network for image recognition.
2. Description of the Related Art
Presently, in the field of handwritten character recognition, methods based on a convolutional neural network (CNN) perform better than conventional recognition methods. For a conventional CNN, the structure of which is shown in FIG. 1, the recognition process is as follows. Taking a handwritten numeric character 6 as an example, an image (sample) is input, and after multiple repetitions of convolution, maximum spatial sampling and full connection operations, the CNN outputs a confidence coefficient for each numeric character, the character with the highest confidence coefficient being the recognition result. In a conventional CNN model, each operation is denoted by one layer. For example, the convolution operation corresponds to a convolution layer, the maximum spatial sampling operation corresponds to a pooling layer, and the full connection operation corresponds to a full connection layer. The convolution layer and the pooling layer each output several two-dimensional matrices, each of which is referred to as a feature map. In FIG. 1, each block represents one feature map.
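The recognition process described above can be illustrated by the following minimal sketch, which is not the structure of FIG. 1 but a simplified stand-in. It assumes a single round of convolution and maximum spatial sampling followed by one full connection layer (a practical CNN repeats these operations several times), random untrained weights, and a softmax function to turn the output scores into confidence coefficients. The function names, the image size and the softmax step are assumptions made for illustration only.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Convolution layer: valid (no padding) 2-D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Pooling layer: maximum over non-overlapping size x size windows."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def fully_connected(vector, weights, biases):
    """Full connection layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Assumed normalization of scores into confidence coefficients."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def recognize(image, kernels, fc_weights, fc_biases):
    # Each kernel yields one feature map after convolution and pooling.
    feature_maps = [max_pool(conv2d_valid(image, k)) for k in kernels]
    flat = [v for fmap in feature_maps for row in fmap for v in row]
    confidences = softmax(fully_connected(flat, fc_weights, fc_biases))
    # The class with the highest confidence is the recognition result.
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best, confidences

rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # stand-in sample
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]
# 8x8 image -> 3x3 convolution -> 6x6 -> 2x2 pooling -> 3x3; two maps give 18 values.
fc_weights = [[rng.uniform(-1, 1) for _ in range(18)] for _ in range(10)]
fc_biases = [0.0] * 10
label, confidences = recognize(image, kernels, fc_weights, fc_biases)
```

Because the weights here are random, the returned label is meaningless; the sketch only shows how the data flows from the input image through feature maps to one confidence coefficient per numeric character.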
In recent years, much published experimental evidence has indicated that the more layers a CNN has and the more neurons each layer has, the better the CNN performs. However, the bigger the CNN model is, the more difficult it is to train. The major problems are as follows.
a) The bigger the model is and the more parameters it has, the more samples are required for training.
b) The bigger the model is, the more it tends to overfit.
For the above two problems, given a training set, the conventional solutions are as follows:
a) samples in the training set are randomly perturbed to generate more training samples; and
b) the model is randomly perturbed during training to enhance the generalization performance of the model, a method referred to as a regularization method.
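The two conventional solutions above can be sketched as follows. This is only an illustration under stated assumptions: the sample perturbation is taken to be a small random shift plus Gaussian noise, and the model perturbation is taken to be dropout, one well-known regularization method; the disclosure itself does not name specific perturbations. The function names perturb_sample and dropout and all parameter values are hypothetical.

```python
import random

def perturb_sample(image, rng, max_shift=1, noise_std=0.05):
    """Solution a): return a randomly shifted, noise-corrupted copy of image.

    Pixels shifted in from outside the image are filled with 0.
    """
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            si, sj = i - dy, j - dx
            inside = 0 <= si < h and 0 <= sj < w
            value = image[si][sj] if inside else 0.0
            row.append(value + rng.gauss(0.0, noise_std))
        out.append(row)
    return out

def dropout(activations, rng, drop_prob=0.5):
    """Solution b), assumed here to be inverted dropout.

    Each activation is zeroed with probability drop_prob (must be < 1);
    surviving activations are rescaled by 1 / (1 - drop_prob).
    """
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
image = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]
# One original sample yields several distinct training samples.
augmented = [perturb_sample(image, rng) for _ in range(4)]
# During training, a random subset of neuron outputs is suppressed.
hidden = dropout([0.2, 0.4, 0.6, 0.8], rng)
```

In this sketch the two perturbations are independent mechanisms applied at different points of training, which reflects the conventional situation in which the two problems are addressed separately.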
It is desirable to provide a more effective method and apparatus for solving the above two problems collectively in one framework.