Convolutional artificial neural networks have been widely applied in pattern recognition and image processing for their high efficiency. One known type of device for convolutional artificial neural networks is a general-purpose processor, which includes a general-purpose register file and a general-purpose functional unit that execute general-purpose instructions to support algorithms for convolutional artificial neural networks. However, one defect of this approach is the low operational performance of a single general-purpose processor, which cannot meet the performance requirements of typical multilayer neural network operations. When multiple general-purpose processors execute concurrently, the communication among them also becomes a performance bottleneck.
Another known type of device may involve a graphics processing unit (GPU), which includes a general-purpose register file and a general-purpose stream processing unit that execute general-purpose single-instruction-multiple-data (SIMD) instructions to support the algorithms. However, because a GPU contains only a relatively small on-chip cache, the model data (weight values) of a multilayer artificial neural network may have to be moved repeatedly from off-chip memory. Off-chip bandwidth thus becomes a main performance bottleneck and causes high power consumption.
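
The off-chip traffic argument can be illustrated with a rough back-of-envelope estimate. The following is a minimal sketch only: the function name, the layer sizes, the cache capacities, and the simplifying rule that all weights are fetched again for every input whenever the model does not fit on chip are assumptions chosen for illustration, not figures for any particular GPU.

```python
def offchip_weight_traffic(layer_weight_bytes, cache_bytes, num_inputs):
    """Estimate bytes of weight data fetched from off-chip memory.

    Simplifying assumption (hypothetical model): if all weights fit in the
    on-chip cache, each weight is fetched once; otherwise every layer's
    weights are fetched again for every input that is processed.
    """
    total = sum(layer_weight_bytes)
    if total <= cache_bytes:
        return total
    return total * num_inputs


MIB = 2**20

# Hypothetical three-layer model: 4 + 16 + 64 = 84 MiB of weights.
layers = [4 * MIB, 16 * MIB, 64 * MIB]

# Hypothetical cache capacities: one smaller than the model, one larger.
traffic_small = offchip_weight_traffic(layers, 2 * MIB, num_inputs=100)
traffic_large = offchip_weight_traffic(layers, 128 * MIB, num_inputs=100)

print(traffic_small // MIB, "MiB fetched with the small cache")
print(traffic_large // MIB, "MiB fetched with the large cache")
```

Under these assumptions, weight traffic grows in proportion to the number of inputs once the model exceeds the on-chip capacity, which is the situation the paragraph above describes.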