Multilayer neural networks (MNNs) are widely applied in fields such as pattern recognition, image processing, function approximation, and optimization computation. In recent years, owing to their higher recognition accuracy and better parallelizability, multilayer artificial neural networks have received increasing attention from the academic and industrial communities. More specifically, various operations on submatrices may be performed frequently during deep learning processes in MNNs.
A known method of performing operations on submatrices in a multilayer artificial neural network is to use a general-purpose processor. However, one drawback of this method is that a single general-purpose processor offers low performance and cannot meet the performance requirements of typical multilayer neural network operations on a submatrix with a large number of elements.
Another known method of performing operations on submatrices in a multilayer artificial neural network is to use a graphics processing unit (GPU). Such a method uses a general-purpose register file and a general-purpose stream processing unit to execute general-purpose single-instruction-multiple-data (SIMD) instructions to support the algorithms in MNNs. However, since a GPU contains only a rather small on-chip cache, the data of the submatrix elements may have to be moved repeatedly from off-chip memory. Off-chip bandwidth thus becomes a major performance bottleneck, and the repeated data movement causes high power consumption.
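By way of illustration only, the following minimal sketch shows why the elements of a submatrix are generally scattered in memory rather than stored as one contiguous block. It assumes, as an assumption not stated above, that the parent matrix is stored in row-major order. The function names `submatrix_offsets` and `contiguous_runs` are hypothetical and do not describe any particular processor or GPU.

```python
def submatrix_offsets(n_cols, row0, col0, sub_rows, sub_cols):
    """Return the flat row-major offsets of a sub_rows x sub_cols block
    whose top-left element sits at (row0, col0) in a matrix with n_cols columns."""
    return [
        (row0 + r) * n_cols + (col0 + c)
        for r in range(sub_rows)
        for c in range(sub_cols)
    ]


def contiguous_runs(offsets):
    """Count the maximal runs of consecutive offsets in the given list."""
    runs = 0
    prev = None
    for off in offsets:
        # A new run starts whenever the offset does not directly follow the previous one.
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


# A 3 x 4 submatrix taken from a matrix that is 8 columns wide.
offsets = submatrix_offsets(n_cols=8, row0=1, col0=2, sub_rows=3, sub_cols=4)
print(offsets)                   # [10, 11, 12, 13, 18, 19, 20, 21, 26, 27, 28, 29]
print(contiguous_runs(offsets))  # 3
```

In this sketch each row of the submatrix forms a separate contiguous run, so a submatrix with many rows corresponds to many separate address ranges in the parent matrix.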