Multilayer neural networks (MNNs) are widely applied in fields such as pattern recognition, image processing, function approximation, and optimization. In recent years, owing to their higher recognition accuracy and better parallelizability, multilayer artificial neural networks have received increasing attention from academia and industry.
A known method of supporting the forward propagation of a multilayer artificial neural network is to use a general-purpose processor. This method executes general-purpose instructions with a general-purpose register file and general-purpose functional units to support the algorithm. One drawback of this method is that the operational performance of a single general-purpose processor is too low to meet the performance requirements of typical multilayer neural network operations. When multiple general-purpose processors execute in parallel, communication among them in turn becomes a performance bottleneck. In addition, a general-purpose processor has to decode the forward computation of a multilayer artificial neural network into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs considerable power consumption.
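For readers less familiar with the computation under discussion, the following minimal Python sketch shows what forward propagation through a stack of fully connected layers amounts to. The fully connected layer structure, the sigmoid activation, and the helper names (`layer_forward`, `forward`, `count_multiply_adds`) are illustrative assumptions and are not taken from the source. Each output neuron is a weighted sum of its inputs plus a bias, passed through an activation function. On a general-purpose processor, each multiply-add in the inner loop typically expands into several scalar load and arithmetic instructions, so the instruction stream grows with the number of weights.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation; the choice of sigmoid is an illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(weights, bias, inputs):
    """One fully connected layer: out[i] = sigmoid(sum_j W[i][j] * x[j] + b[i])."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for w, x in zip(row, inputs):
            acc += w * x  # one multiply-add per weight
        outputs.append(sigmoid(acc))
    return outputs


def forward(layers, inputs):
    """Propagate inputs through a list of (weights, bias) layers in order."""
    activations = inputs
    for weights, bias in layers:
        activations = layer_forward(weights, bias, activations)
    return activations


def count_multiply_adds(layers):
    """Total multiply-adds for one forward pass (rows x columns per layer)."""
    return sum(len(w) * len(w[0]) for w, _ in layers)


if __name__ == "__main__":
    # Hypothetical 2-2-1 network used only to exercise the functions above.
    layers = [
        ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
        ([[1.0, -1.0]], [0.0]),
    ]
    print(forward(layers, [1.0, 1.0]), count_multiply_adds(layers))
```

The nested loops make the cost structure explicit: the work per pass is one multiply-add per weight, which is the operation a dedicated accelerator would aim to execute without per-operation instruction decoding.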
Another known method of supporting the forward propagation of a multilayer artificial neural network is to use a graphics processing unit (GPU). This method executes general-purpose single-instruction-multiple-data (SIMD) instructions with a general-purpose register file and general-purpose stream processing units to support the algorithm. Because a GPU is designed for graphics and image operations and for scientific computation, and provides no dedicated support for multilayer artificial neural network operations, it still requires a large amount of front-end decoding to execute such operations, which adds considerable overhead. Moreover, because a GPU has only a small on-chip cache, the model data (weights) of a multilayer artificial neural network must be repeatedly transferred from off-chip memory, so off-chip bandwidth becomes the main performance bottleneck and incurs high power consumption.
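The bandwidth argument can be illustrated with a simplified traffic model for a single fully connected layer processed one input sample at a time. In such a layer each weight takes part in exactly one multiply-add per sample, so if the weights cannot be held on chip they must be fetched from off-chip memory again for every sample. The sketch below assumes 32-bit weights, ignores traffic for inputs, outputs, and partial caching, and uses a hypothetical 4096 × 4096 layer; the function names are illustrative only.

```python
def weight_traffic_bytes(n_in: int, n_out: int, samples: int,
                         bytes_per_weight: int = 4,
                         fits_on_chip: bool = False) -> int:
    """Off-chip bytes read for the weights of one fully connected layer.

    Simplified model (an assumption): inputs, outputs and partial caching are ignored.
    """
    weight_bytes = n_in * n_out * bytes_per_weight
    if fits_on_chip:
        # Weights are loaded once and reused for every sample.
        return weight_bytes
    # Weights are re-fetched from off-chip memory for each sample.
    return weight_bytes * samples


def multiply_adds(n_in: int, n_out: int, samples: int) -> int:
    """Each weight contributes one multiply-add per sample."""
    return n_in * n_out * samples


if __name__ == "__main__":
    n_in = n_out = 4096  # hypothetical layer size
    samples = 100
    streamed = weight_traffic_bytes(n_in, n_out, samples)
    cached = weight_traffic_bytes(n_in, n_out, samples, fits_on_chip=True)
    work = multiply_adds(n_in, n_out, samples)
    print(f"weights re-fetched per sample: {streamed:,} bytes")
    print(f"weights kept on chip:          {cached:,} bytes")
    print(f"multiply-adds per off-chip byte (re-fetched): {work / streamed}")
```

Under these assumptions, re-fetching the weights for each of the 100 samples moves 6,710,886,400 bytes, versus 67,108,864 bytes if the weights stay on chip, and only 0.25 multiply-adds are performed per byte fetched, which is why such a workload tends to be limited by off-chip bandwidth rather than by arithmetic throughput.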