In recent years, deep learning using multilayer neural networks has drawn attention in the field of machine learning. Deep learning is expected to contribute greatly to the development of neural network-related technology, and is considered to have triggered the third artificial intelligence boom. Deep learning allows a neural network to obtain various feature representations semi-automatically, so the designer does not need to devise a method for extracting feature amounts. In the future, applying deep learning to a mobile device, such as a smartphone, an autonomous robot, or a drone, may enable the device to act intelligently and autonomously.
The multilayer neural network used in deep learning is typically implemented by having a computer server that includes a CPU (Central Processing Unit), a memory, and a GPU (Graphics Processing Unit) execute a program (code). The GPU included in the computer server is also called an accelerator, and speeds up execution of the program by exploiting its parallelism. In addition, distributed learning using a plurality of GPUs is performed to further speed up the deep-learning process.
In the multilayer neural network, an input vector of a layer is multiplied by a learning weight matrix, and an input vector of the next layer is generated based on the product (which is a vector). Such an interlayer matrix product calculation is frequently performed, and its calculation cost is high.
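The interlayer calculation described above can be illustrated with a minimal sketch. It assumes plain Python lists for the vectors and weight matrices, and a ReLU activation as the step that turns the product into the next layer's input; the activation, the function names (`matvec`, `relu`, `forward`), and the weight values are all hypothetical choices made for illustration, not details taken from the description above.

```python
def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    """Assumed activation: clamp negative entries of the product to zero."""
    return [max(0.0, a) for a in v]


def forward(layers, x):
    """Propagate x through the layers, one matrix product per layer."""
    for weights in layers:
        # The product of this layer's weights and input becomes,
        # after the assumed activation, the input of the next layer.
        x = relu(matvec(weights, x))
    return x


# Hypothetical weights: a 2x3 matrix followed by a 1x2 matrix.
layers = [
    [[1.0, -1.0, 0.5],
     [0.0, 2.0, -1.0]],
    [[1.0, 1.0]],
]
output = forward(layers, [1.0, 2.0, 4.0])
```

Each call to `matvec` performs one multiplication and one addition per weight, so the work per layer grows with the product of the layer's input and output sizes.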
For example, deep neural networks for image recognition, which have been successful in recent years, include multiple stacked convolutional layers and a fully connected layer near the output layer, and matrix product calculations account for most of the calculation cost in the convolutional layers and the fully connected layer. In a DNN (Deep Neural Network)-HMM (Hidden Markov Model) hybrid model, which is widely used for speech recognition, the DNN part includes multiple fully connected layers, which rely on matrix product calculations.
On the whole, matrix product calculations account for most of the calculation cost in deep learning. Therefore, deep learning implemented on a computer server as described above is accelerated by using a matrix product calculation library tuned for the GPU.
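As a rough illustration of why matrix products dominate, the following sketch counts operations for a stack of fully connected layers. The layer sizes are hypothetical, and the cost model is a deliberate simplification: one multiply-add per weight for the matrix product, and one operation per output element for everything else (such as an activation).

```python
def layer_costs(sizes):
    """Count operations for fully connected layers.

    sizes lists the layer widths, e.g. [n_in, n_hidden, n_out].
    Returns (multiply_adds, elementwise_ops) under the simplified
    cost model stated above.
    """
    pairs = list(zip(sizes, sizes[1:]))
    # A matrix product for one layer needs n_out * n_in multiply-adds.
    multiply_adds = sum(n_out * n_in for n_in, n_out in pairs)
    # Assumed: one elementwise operation per output of each layer.
    elementwise_ops = sum(n_out for _, n_out in pairs)
    return multiply_adds, elementwise_ops


# Hypothetical network: 1024 inputs, two hidden layers of 1024, 10 outputs.
sizes = [1024, 1024, 1024, 10]
macs, elementwise = layer_costs(sizes)
share = macs / (macs + elementwise)  # fraction of work in matrix products
```

Under these assumptions the matrix products account for more than 99% of the counted operations, which is why tuning the matrix product routine has the largest effect on overall speed.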
The GPU serving as an accelerator contributes greatly to speeding up deep learning, but installing it on a mobile device is unrealistic. For example, a GPU installed on a computer server is larger than 25 cm × 10 cm × 3 cm, which is much larger than a common smartphone, and weighs at least 1 kg. In addition, operating one GPU requires as much as 200 W of power.