In recent years, multilayer neural networks, also called deep neural networks (DNNs), have come to be used for data classification. In particular, a convolutional neural network (CNN), which is one kind of DNN, is often used to classify one-dimensional time-series data and two-dimensional image data. FIG. 1 illustrates an example of a CNN. In FIG. 1, the circles represent nodes, and the line segments connecting the nodes represent edges. As illustrated in FIG. 1, a CNN has convolutional layers and pooling layers between an input layer and an output layer. The convolutional layers and pooling layers extract features from the data, and classification is performed based on the extracted features.
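The feature extraction described above can be sketched for one-dimensional time-series data as follows. This is a minimal illustration, not the network of FIG. 1: the function names, the kernel values, the ReLU activation, and the pooling size are all assumptions chosen for the example.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D convolution in the CNN sense (cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectifier; an assumed activation, not stated in the text."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


def extract_features(signal, kernel, pool_size):
    """Convolutional layer followed by a pooling layer."""
    return max_pool1d(relu(conv1d(signal, kernel)), pool_size)


signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.0, 2.0, 4.0]
rising_kernel = [-1.0, 0.0, 1.0]  # hypothetical kernel that responds to rising values
features = extract_features(signal, rising_kernel, pool_size=2)
print(features)
```

In a trained CNN the kernel values are learned rather than fixed by hand, and the pooled features are passed on to later layers that perform the classification.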
There are not only CNNs with a single input but also CNNs with plural inputs. For example, there are cases in which meteorological conditions are classified according to data obtained from cameras located at plural observation points, cases in which behavior is estimated according to data obtained from wearable sensors attached to both hands and both feet, and the like. FIG. 2 illustrates an example of a CNN that processes plural inputs. In the CNN illustrated in FIG. 2, the input layer contains nodes that correspond to each input, so that plural inputs can be processed.
However, this CNN does not extract a feature from each of the plural inputs; it extracts a feature from the combination of the plural inputs. Generally, each image and each time series has its own independent meaning, and it is often preferable to extract a feature from each image and each time series separately. Moreover, there are cases in which data from plural inputs cannot simply be joined, and cases in which data cannot be joined in and after the second layer because the network structures to be applied differ. An example of the former is a case where a CNN cannot be applied because the joined data is not rectangular, for instance because the plural images differ in size. Examples of the latter are a case where both image data and time-series data are processed, and a case where both image data and language data are processed.
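The problem of non-rectangular joined data can be shown with a small sketch. The helper names and the toy images below are hypothetical, and stacking images top to bottom is only one assumed way of joining them.

```python
def is_rectangular(rows):
    """True if every row has the same number of columns."""
    return len({len(row) for row in rows}) <= 1


def join_vertically(images):
    """Stack images (lists of rows) top to bottom without any resizing."""
    return [row for image in images for row in image]


image_a = [[1, 2, 3], [4, 5, 6]]    # 2 rows x 3 columns
image_b = [[7, 8, 9], [1, 2, 3]]    # 2 rows x 3 columns
image_c = [[1, 2], [3, 4], [5, 6]]  # 3 rows x 2 columns

same_size = join_vertically([image_a, image_b])
mixed_size = join_vertically([image_a, image_c])

print(is_rectangular(same_size))   # images of equal width join into a rectangle
print(is_rectangular(mixed_size))  # images of different width do not
```

When the joined data is not rectangular, a convolution kernel cannot be slid over it in the usual way, which is the situation the paragraph above describes.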
On the other hand, in a parallel CNN such as the one illustrated in FIG. 3, a feature can be extracted from each input. In FIG. 3, a channel is provided for each input, and the network structure of each channel is suited to that input. A feature is extracted from the input of each channel, and the features are combined in the last stage.
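The parallel structure described above can be sketched as a forward pass in which each channel extracts its own features and a final layer combines them. This is an illustration under stated assumptions, not the network of FIG. 3: the function names, the kernels, the global max pooling, and the fully connected last stage with its weights are all hypothetical.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D convolution in the CNN sense (cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def channel_features(signal, kernels):
    """One channel: one feature per kernel (convolution, ReLU, global max pooling)."""
    return [max(0.0, max(conv1d(signal, kernel))) for kernel in kernels]


def parallel_forward(inputs, channel_kernels, out_weights, out_bias):
    """Extract features per channel, then combine them in the last stage."""
    combined = []
    for signal, kernels in zip(inputs, channel_kernels):
        combined.extend(channel_features(signal, kernels))
    # Last stage: a fully connected layer over the concatenated features.
    outputs = [
        sum(w * f for w, f in zip(row, combined)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return combined, outputs


# Two inputs of different lengths, each with its own channel structure.
inputs = [
    [0.0, 1.0, 2.0, 1.0, 0.0],  # input of channel 0
    [3.0, 0.0, 1.0],            # input of channel 1
]
channel_kernels = [
    [[1.0, 1.0]],                # channel 0: one kernel
    [[1.0, -1.0], [-1.0, 1.0]],  # channel 1: two kernels
]
out_weights = [
    [1.0, 0.0, 0.0],  # output 0 depends only on the channel 0 feature
    [0.0, 0.5, 0.5],  # output 1 depends only on the channel 1 features
]
out_bias = [0.0, 0.5]

combined, outputs = parallel_forward(inputs, channel_kernels, out_weights, out_bias)
print(combined)
print(outputs)
```

Because each channel is processed separately until the last stage, the two inputs do not need to share a length or a kernel configuration.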
However, the strength of the effect that an input has on an output may differ depending on the type of image or time series, and some types of image or time series may have no effect at all on a certain output. When learning is performed using typical backpropagation, the effect of errors is distributed uniformly among the channels, and learning cannot take the strength of the effect of each input into consideration. Moreover, even when an input affects some outputs and does not affect others, learning cannot take that into consideration. For these reasons, proper learning is not performed, and as a result there are cases where the precision of classification does not improve.
Patent Document 1: U.S. Patent Publication No. 2014/0180989
Non-Patent Document 1: Natalia Neverova, Christian Wolf, Graham Taylor, and Florian Nebout, “ModDrop: adaptive multi-modal gesture recognition”, [online], Jun. 6, 2015, Cornell University Library, [retrieved on Jan. 12, 2016], Internet
Non-Patent Document 2: Yi Zheng, Qi Liu, Enhong Chen, Yong Ge, and J. Leon Zhao, “Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks”, WAIM2014, Lecture Notes in Computer Science 8485, pp. 298-310, 2014
In other words, there has been no technique for improving the precision of classification by a parallel neural network.