Multi-layer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation, and optimization computation. In recent years, as research on the back-propagation training algorithm and on pre-training algorithms has deepened, multi-layer artificial neural networks have attracted growing attention from both academia and industry because of their high recognition accuracy and good parallelizability.
As the amount of computation and memory access in artificial neural networks has surged, the prior art has generally used a general-purpose processor to handle multi-layer artificial neural network operations, training algorithms, and their compression coding, supporting these algorithms by executing general-purpose instructions with a general-purpose register file and general-purpose functional units. One disadvantage of this approach is that the computing performance of a single general-purpose processor is too low to meet the performance requirements of multi-layer artificial neural network operations. When multiple general-purpose processors work concurrently, the communication among them becomes the performance bottleneck instead. In addition, a general-purpose processor must decode a multi-layer artificial neural network operation into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs relatively high power consumption.

Another known method for supporting multi-layer artificial neural network operations, training algorithms, and their compression coding is to use a graphics processing unit (GPU), which supports these algorithms by executing general-purpose SIMD instructions with a general-purpose register file and general-purpose stream processing units. Because a GPU is designed specifically for graphics and image computation and for scientific computing, it provides no dedicated support for multi-layer artificial neural network operations, so a large amount of front-end decoding is still required to perform them, which incurs substantial additional overhead. Moreover, a GPU has only a relatively small on-chip cache, so the model data (weights) of a multi-layer artificial neural network must be transferred repeatedly from off-chip memory. Off-chip bandwidth thus becomes the main performance bottleneck and also causes very high power consumption.
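The off-chip bandwidth argument can be illustrated with a rough back-of-envelope sketch. The layer sizes, the 4-byte weight width, the cache capacity, and the function names below are hypothetical values chosen only for illustration; they do not come from the text or describe any particular GPU. The traffic model is also deliberately crude: it assumes that weights which do not fit on chip are reloaded in full on every forward pass.

```python
def weight_bytes(layer_sizes, bytes_per_weight=4):
    """Total weight storage for a fully connected network (biases ignored)."""
    return sum(n_in * n_out * bytes_per_weight
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def off_chip_traffic(layer_sizes, cache_bytes, passes, bytes_per_weight=4):
    """Estimated bytes of weight data fetched from off-chip memory.

    Simplifying assumption: if all weights fit in the on-chip cache they
    are loaded once; otherwise every forward pass reloads all of them.
    """
    total = weight_bytes(layer_sizes, bytes_per_weight)
    return total if total <= cache_bytes else total * passes


# Hypothetical network and cache size, for illustration only.
layers = [1024, 4096, 4096, 1000]
cache = 2 * 1024 * 1024  # assumed 2 MiB on-chip cache

print("weight storage (bytes):", weight_bytes(layers))
print("off-chip traffic over 100 passes (bytes):",
      off_chip_traffic(layers, cache, passes=100))
```

Under these assumed numbers the weights are far larger than the cache, so the estimated traffic grows linearly with the number of passes, which is the situation the paragraph above describes.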