At present, central processing units (CPUs) and graphics processing units (GPUs) are widely used in the field of artificial intelligence computing. Although the latter can provide stronger computing capability than the former, both kinds of hardware are general-purpose processors based on a fine-granularity instruction stream, and their architectures have the following commonalities. First, one instruction in the fine-granularity instruction stream accomplishes only a most basic computing operation, such as an addition, a multiplication or a memory access. Second, each fine-granularity arithmetic logic unit in the processor generally performs only a single multiplication and addition operation. Third, the memory access modes and data paths must be general-purpose and must support fine-granularity memory access.
In prior-art processors, the fine-granularity instruction stream, the arithmetic logic units, and the general memory access modes and data paths do not achieve high computing efficiency for specific artificial intelligence applications. On the one hand, the fine-granularity instruction stream and the arithmetic logic units need to load and store data frequently, so their efficiency is relatively low. On the other hand, for a large number of artificial intelligence applications, a general architecture contains a large amount of redundant circuit logic, resulting in a complicated system design, greater circuit resource consumption and a higher total cost.
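The load and store overhead described above can be illustrated with a minimal sketch. It is a simplified counting model rather than a description of any particular processor: the names `Counters` and `fine_grained_dot` are hypothetical, and the model assumes that every operand is fetched by its own load instruction and that each arithmetic instruction performs exactly one multiplication and addition.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Tallies of the basic instructions issued (hypothetical model)."""
    loads: int = 0
    stores: int = 0
    macs: int = 0


def fine_grained_dot(a, b, counters):
    """Dot product expressed as one basic operation per instruction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for i in range(len(a)):
        x = a[i]
        counters.loads += 1   # one load instruction for a[i]
        y = b[i]
        counters.loads += 1   # one load instruction for b[i]
        acc += x * y
        counters.macs += 1    # one multiplication and addition
    counters.stores += 1      # one store instruction for the result
    return acc


counters = Counters()
result = fine_grained_dot([1, 2, 3, 4], [5, 6, 7, 8], counters)
print(result, counters)
```

Under these assumptions, a dot product over n elements issues 2n load instructions, n multiplication and addition instructions and one store instruction, so memory access instructions outnumber arithmetic instructions by roughly two to one.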