Deep neural networks (DNNs) have attracted attention as a machine learning method. DNNs have been applied to many tasks, such as image recognition and speech recognition, and the references below report that they outperform conventional approaches, for example improving the error rate by about 20 to 30%.
Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, Vol. 2, No. 1, pp. 1-127, 2009.
G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Processing Magazine, Vol. 29, No. 6, pp. 82-97, 2012.
A. Mohamed, G. Dahl, and G. Hinton, “Acoustic Modeling using Deep Belief Networks,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 1, pp. 14-22, 2012.
A DNN can be regarded as a neural network with more layers than conventional neural networks. Specifically, referring to FIG. 1, a DNN 30 includes an input layer 40, an output layer 44, and a plurality of hidden layers 42 provided between input layer 40 and output layer 44. Input layer 40 has a plurality of input nodes (neurons). Output layer 44 has as many neurons as there are objects to be identified. Hidden layers 42 consist of several layers (for example, 7, 9, or 11 layers), and each hidden layer has a plurality of neurons.
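The layered structure of DNN 30 can be illustrated with a minimal forward-pass sketch in pure Python. The layer sizes, sigmoid hidden units, softmax output, and random Gaussian initialization are assumptions chosen for illustration, and the names `init_dnn` and `forward` are hypothetical; this is not the learning method of the described system.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def init_dnn(layer_sizes: List[int], seed: int = 0) -> List[Layer]:
    """Create (weights, biases) for each pair of adjacent layers.

    layer_sizes = [n_input, n_hidden_1, ..., n_hidden_k, n_output]
    (an assumed fully connected layout, for illustration only).
    """
    rng = random.Random(seed)
    params: List[Layer] = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # keep pre-activations small
        w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((w, b))
    return params


def forward(params: List[Layer], x: List[float]) -> List[float]:
    """Propagate an input vector from the input layer to the output layer."""
    h = x
    for i, (w, b) in enumerate(params):
        # Weighted sum of the previous layer's outputs for every neuron.
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(w, b)]
        if i < len(params) - 1:
            # Hidden layer: sigmoid written via tanh to avoid overflow.
            h = [0.5 * (1.0 + math.tanh(0.5 * v)) for v in z]
        else:
            # Output layer: softmax, one probability per object to be identified.
            m = max(z)
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            h = [v / s for v in e]
    return h


# Hypothetical example: 16 inputs, 7 hidden layers of 32 neurons, 5 classes.
sizes = [16] + [32] * 7 + [5]
params = init_dnn(sizes)
probs = forward(params, [0.5] * 16)
print(len(params), len(probs))
```

Each entry in `params` connects one layer to the next, so a network with seven hidden layers has eight weight matrices, and the output vector has one element per neuron of the output layer.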
In DNN 30, both the number of layers and the number of neurons in each layer are large, so the amount of computation required for learning is enormous. Such computation was once practically infeasible. Today, more powerful computers, together with advances in distributed/parallel processing techniques and computational theory, make DNN learning possible. Learning from a huge amount of training data, however, still takes a long time. For example, in one experiment, DNN learning with 10 million 200×200-pixel images as training data took three days on 1,000 machines with 16 cores each (Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng, “Building High-level Features Using Large Scale Unsupervised Learning,” Proc. ICML, 2012).
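The scale of the cited experiment, and why the computation grows so quickly with layer count and width, can be made concrete with a short back-of-the-envelope calculation. The experiment figures (10 million images, 200×200 pixels, 1,000 machines, 16 cores, three days) come from the text above; the fully connected network (one input per pixel, seven hidden layers of 2,048 neurons, 1,000 outputs) is purely hypothetical and is not the architecture used in the cited work. The helper names are likewise hypothetical.

```python
from typing import List


def forward_macs(layer_sizes: List[int]) -> int:
    """Multiply-accumulate operations for one forward pass of one sample
    through a fully connected network (one per connection)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def count_params(layer_sizes: List[int]) -> int:
    """Trainable weights plus one bias per non-input neuron."""
    return forward_macs(layer_sizes) + sum(layer_sizes[1:])


# Scale of the experiment cited in the text.
images = 10_000_000
pixels_per_image = 200 * 200
total_pixels = images * pixels_per_image  # raw input values in the training set
cores = 1_000 * 16                        # 1,000 machines with 16 cores each
core_hours = cores * 3 * 24               # three days of wall-clock time

# Hypothetical fully connected DNN, for illustration only.
hypothetical_sizes = [pixels_per_image] + [2048] * 7 + [1000]
macs_per_sample = forward_macs(hypothetical_sizes)
macs_per_epoch = macs_per_sample * images  # forward passes only, one epoch

print(f"training pixels: {total_pixels:,}")
print(f"core-hours:      {core_hours:,}")
print(f"parameters:      {count_params(hypothetical_sizes):,}")
print(f"forward MACs per epoch: {macs_per_epoch:.2e}")
```

Even this simplified count ignores backpropagation and repeated epochs, both of which multiply the total further; it only shows that the cost scales with the product of adjacent layer widths and with the number of training samples.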