Technical Field
The present invention relates to a learning apparatus, a learning program, and a learning method for performing a learning process of a feed-forward multilayer neural network with supervised learning.
Related Art
A multilayer neural network obtained by a learning process with supervised learning is usable as an excellent calculation model that has a high discriminative capability. However, no theory has yet been established that explains which structure of a multilayer neural network provides high generalization capability (discriminative capability regarding data not used for the learning process) for a given set of training data. Thus, heuristic methods have been used to obtain a multilayer neural network structure that has high generalization capability.
For example, in X. Liang, “Removal of Hidden Neurons by Crosswise Propagation”, Neural Information Processing—Letters and Reviews, Vol. 6, No. 3, 2005, a method for constructing an optimal network structure is proposed. In this method, units of each hidden layer in the multilayer neural network are removed one at a time. In this method, in a state in which the multilayer neural network at an initial setting (referred to, hereinafter, as simply an “initial multilayer neural network” or “initial network”) has been sufficiently trained, units are eliminated in the following manner.
That is, a correlation between the outputs of differing units in the same layer is calculated for the training data. A single unit having the highest correlation is then removed. After the unit is removed, learning of weights other than that of the removed unit is restarted. Relearning and unit removal are repeatedly performed until a cost function (also called, e.g., an objective function or an error function) defined in the network begins to increase. The structure of the initial network is provided manually.
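The unit-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited method's actual implementation: the function names `most_correlated_unit` and `remove_unit`, the use of the absolute Pearson correlation, and the rule of dropping the higher-indexed unit of the most correlated pair are all hypothetical choices made for the example. Relearning of the remaining weights and the stopping test on the cost function are omitted.

```python
import numpy as np


def most_correlated_unit(activations):
    """Pick a unit to remove from one hidden layer.

    activations: array of shape (n_samples, n_units) holding the outputs
    of the layer's units over the training data.
    Returns the index of one unit from the most strongly correlated pair.
    """
    # Pearson correlation between every pair of units (columns).
    corr = np.corrcoef(activations, rowvar=False)
    # Units with constant output give NaN; treat them as uncorrelated (assumption).
    abs_corr = np.abs(np.nan_to_num(corr))
    # Ignore each unit's correlation with itself.
    np.fill_diagonal(abs_corr, 0.0)
    i, j = np.unravel_index(np.argmax(abs_corr), abs_corr.shape)
    # Assumption: drop the higher-indexed unit of the pair.
    return int(max(i, j))


def remove_unit(w_in, b, w_out, idx):
    """Delete unit idx from a layer.

    w_in:  (n_inputs, n_units) weights into the layer
    b:     (n_units,) biases of the layer
    w_out: (n_units, n_outputs) weights out of the layer
    """
    return (
        np.delete(w_in, idx, axis=1),
        np.delete(b, idx),
        np.delete(w_out, idx, axis=0),
    )
```

In a full procedure, these two calls would be alternated with relearning of the remaining weights, stopping once the cost function begins to increase.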
In addition, JP-B-3757722 describes a method for optimizing the number of units in an intermediate layer (hidden layer) of a multilayer neural network in supervised learning.
The above-described conventional methods for deciding the multilayer neural network structure share a common approach. That is, the initial network is trained first. The number of units is then increased or decreased based on an index that indicates that an improvement in generalization capability can be expected. The number of units in the hidden layers is thereby automatically decided. In other words, in the conventional methods, the number of units is optimized while the number of layers is fixed. Therefore, the number of layers per se is not optimized.
The multilayer neural network is generally considered to have favorable discriminative capability. However, discrimination becomes more time-consuming as the number of layers increases. The number of layers is therefore a parameter that significantly affects both discriminative capability and calculation amount. Nevertheless, as described above, a method for optimally deciding the number of layers has not been proposed.
In addition, regarding convolutional neural networks (CNN) as well, the number of filter layers and the number of fully connected layers that follow the filter layers are currently decided manually by a designer. A method for deciding the optimal numbers of filter layers and fully connected layers has not been proposed.