With the exponential growth of neural-network-based deep learning applications across various business units, commodity Central Processing Unit/Graphics Processing Unit (CPU/GPU) based platforms are no longer a suitable computing substrate to support the ever-growing computation demands in terms of performance, power efficiency, and economic scalability. Developing neural network processors to accelerate neural-network-based deep learning applications has gained significant traction across many business segments, including established chip makers, start-up companies, and large Internet companies. A Single Instruction Multiple Data (SIMD) architecture can be applied to such chips to accelerate deep learning computations.
Neural network algorithms generally require large matrix multiply-accumulate operations. Accordingly, accelerator hardware generally requires large-scale parallel multiply-accumulate structures to speed up these computations. However, the area and power costs of such structures must be controlled in order to optimize the computational speed of the hardware while reducing chip size and economizing on power consumption.
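By way of illustration only, the following minimal sketch shows why matrix multiplication reduces to a large number of multiply-accumulate (MAC) steps, and how that number scales when several MAC units operate in lockstep, as in a SIMD arrangement. The function names `matmul_mac` and `simd_cycles` are hypothetical and are not drawn from any particular accelerator. The cycle estimate is an idealized assumption that ignores memory bandwidth, data movement, and control overhead.

```python
def matmul_mac(a, b):
    """Multiply a (m x k) by b (k x n) using explicit multiply-accumulate steps.

    Returns the product matrix and the number of MAC steps performed.
    Matrices are plain lists of rows; this is an illustrative sketch only.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    out = [[0] * n for _ in range(m)]
    mac_count = 0
    for i in range(m):
        for j in range(n):
            acc = 0  # accumulator for one output element
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate step
                mac_count += 1
            out[i][j] = acc
    return out, mac_count


def simd_cycles(mac_count, lanes):
    """Idealized cycle count when `lanes` MAC units run in lockstep.

    Assumption: every lane is fully utilized except in the final cycle.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return -(-mac_count // lanes)  # ceiling division
```

For an m x k matrix multiplied by a k x n matrix, the sketch performs m * k * n MAC steps, so the work grows quickly with layer size. Under the idealized model, adding lanes reduces the cycle count roughly in proportion, which is why the area and power of each MAC unit become the limiting costs.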