Deep learning has led to state-of-the-art improvements in the accuracy of many artificial intelligence tasks, such as large-category image classification and recognition, speech recognition, and natural language processing. The underlying architectures can involve complex, many-layered neural networks (e.g., deep neural networks (DNNs)) that can require intense computation for training and/or evaluation.
One approach to performing this computation uses a field programmable gate array (FPGA), but this approach requires developers to work with a hardware-centric register transfer level (RTL) flow. Although some FPGA manufacturers have provided high-level synthesis tools that allow developers to program FPGAs using software-centric programming languages, such as C/C++, Matlab®, and OpenCL®, considerable programming effort remains, and the performance achieved with the provided synthesis tools is typically considered inferior to that of a hardware-centric RTL implementation.
Thus, a need exists for improvements in converting a DNN model to an FPGA RTL implementation.