Artificial neural networks (ANNs), and convolutional neural networks (CNNs) in particular, have achieved great success in various fields. In computer vision (CV), for example, CNNs are widely used and are among the most promising approaches.
Image classification is a basic problem in CV. In recent years, CNNs have led to great advances in image classification accuracy. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, Krizhevsky et al. demonstrated the power of CNNs by achieving a top-5 accuracy of 84.7% in the classification task, significantly higher than that of traditional image classification methods. In the following years, the accuracy improved to 88.8%, 93.3%, and 96.4% in ILSVRC 2013, 2014, and 2015, respectively.
While achieving state-of-the-art performance, CNN-based methods demand far more computation and memory resources than traditional methods. As a result, most CNN-based methods have to rely on large servers. However, there is a non-negligible market for embedded systems that require high-accuracy, real-time object recognition, such as self-driving cars and robots. For such embedded systems, limited battery capacity and limited hardware resources are serious problems.
To address this problem, many researchers have proposed CNN acceleration techniques that target either computation or memory access. For example: C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks”; T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning”; Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, “DaDianNao: A machine-learning supercomputer”; D. Liu, T. Chen, S. Liu, J. Zhou, S. Zhou, O. Temam, X. Feng, X. Zhou, and Y. Chen, “PuDianNao: A polyvalent machine learning accelerator”; Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, “ShiDianNao: Shifting vision processing closer to the sensor”; S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks”; C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, “NeuFlow: A runtime reconfigurable dataflow processor for vision”; C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun, “CNP: An FPGA-based processor for convolutional networks”.
However, most previous techniques considered only small CNN models, such as the 5-layer LeNet, for simple tasks such as MNIST handwritten digit recognition.
State-of-the-art CNN models for large-scale image classification have extremely high complexity, and thus can only be stored in external memory. Consequently, memory bandwidth becomes a serious bottleneck when accelerating CNNs, especially on embedded systems. In addition, previous research focused on accelerating the Convolutional (CONV) layers, while the Fully-Connected (FC) layers were not well studied.
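As a rough illustration of why weight storage and memory bandwidth matter, the following Python sketch counts the weights and multiply-accumulate (MAC) operations of one CONV layer and one FC layer. The layer shapes, the 32-bit weight width, and the function names are assumptions chosen for illustration only; they are not taken from any particular model or accelerator discussed above, and bias terms are ignored.

```python
def conv_layer_cost(kernel, in_ch, out_ch, out_h, out_w):
    """Return (weights, MACs) for a square-kernel CONV layer, biases ignored."""
    weights = kernel * kernel * in_ch * out_ch
    # Every weight is applied once at each output position.
    macs = weights * out_h * out_w
    return weights, macs


def fc_layer_cost(in_features, out_features):
    """Return (weights, MACs) for an FC layer, biases ignored."""
    weights = in_features * out_features
    # Every weight is used exactly once per input vector.
    return weights, weights


BYTES_PER_WEIGHT = 4  # assumption: 32-bit weights

# Hypothetical layer shapes, chosen only to make the contrast visible.
conv_w, conv_macs = conv_layer_cost(kernel=3, in_ch=64, out_ch=128, out_h=56, out_w=56)
fc_w, fc_macs = fc_layer_cost(in_features=4096, out_features=4096)

for name, w, m in (("CONV", conv_w, conv_macs), ("FC", fc_w, fc_macs)):
    mib = w * BYTES_PER_WEIGHT / 2**20
    print(f"{name}: {w} weights ({mib:.2f} MiB), {m} MACs, {m // w} MACs per weight")
```

Under these assumed shapes, the FC layer holds far more weights than the CONV layer but uses each weight only once per input, so its speed tends to be limited by how fast weights can be fetched from external memory. The CONV layer reuses each weight at every output position, so its cost is dominated by computation.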
Consequently, a more in-depth investigation of the embedded FPGA platform is desirable to address these problems.