Artificial intelligence has developed rapidly in recent years and has greatly affected people's lives. Countries around the world attach great importance to artificial intelligence and have invested in research and development on a large scale. Artificial neural networks are the core of artificial intelligence applications, and deep learning neural network algorithms are the most common artificial neural network models. Their workload is both compute-intensive (multiply-add operations on the order of billions per inference) and data-intensive (parameters ranging from megabytes to hundreds of megabytes). Computing platforms based on conventional general-purpose CPUs cannot meet these performance requirements well. In recent years, heterogeneous platforms for accelerating neural network computing, represented by NVIDIA GPUs, have become popular. The compilation toolchain and development kit packaged in the CUDA SDK simplify application development in a heterogeneous CPU+GPU environment. As cost-effective acceleration solutions such as FPGAs and various deep learning ASICs (for example, the Google TPU) continue to emerge, it is imperative to address the following issues faced by CPU+FPGA/ASIC heterogeneous computing platforms:
1. Programmability based on the popular C/C++ high-level language;
2. Lowering the barrier to neural network application development and improving programming efficiency;
3. Optimizing the neural network structure and compiling it into efficient computing instructions;
4. Improving data reuse and reducing data movement between CPU and FPGA/ASIC.
Therefore, there is a need for a programming model for a neural network-oriented heterogeneous computing platform that effectively addresses the difficulties faced in the development, compilation, deployment, and running stages of neural network applications in a heterogeneous environment combining a CPU with a neural network-specific processor.
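As a rough illustration of why such workloads are described as compute-intensive and data-intensive, the following sketch estimates the multiply-add count and weight storage of a single convolution layer. The layer dimensions and the 32-bit weight width are assumptions chosen only for illustration; they are not taken from any particular network or from this disclosure.

```python
# Rough cost estimate for one convolution layer.
# All layer sizes below are hypothetical, chosen only for illustration.

def conv_macs(out_h, out_w, in_ch, out_ch, kernel):
    """Multiply-add operations for one forward pass of the layer."""
    return out_h * out_w * out_ch * in_ch * kernel * kernel

def conv_param_bytes(in_ch, out_ch, kernel, bytes_per_weight=4):
    """Weight storage in bytes, ignoring biases (4 bytes assumes 32-bit floats)."""
    return in_ch * out_ch * kernel * kernel * bytes_per_weight

# A 3x3 convolution with 256 input and 256 output channels on a 56x56 output map.
macs = conv_macs(out_h=56, out_w=56, in_ch=256, out_ch=256, kernel=3)
size = conv_param_bytes(in_ch=256, out_ch=256, kernel=3)

print(f"{macs / 1e9:.2f} G multiply-adds")   # prints "1.85 G multiply-adds"
print(f"{size / 2**20:.2f} MiB of weights")  # prints "2.25 MiB of weights"
```

Under these assumptions, one layer alone requires on the order of a billion multiply-add operations and a few megabytes of weights. A network with tens of such layers therefore reaches the magnitudes cited above.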