Machine learning algorithms, such as deep neural networks, are increasingly used in artificial intelligence applications such as computer vision, speech recognition, and robotics. Implementing machine learning algorithms is typically computationally intensive. Indeed, running machine learning algorithms on a general-purpose central processing unit (CPU) can be extremely expensive, and in some cases quite impractical. Accordingly, techniques that enable efficient processing of machine learning algorithms to improve energy efficiency and throughput are highly desirable.
Hardware acceleration components, such as field-programmable gate arrays (FPGAs), have been used to supplement the processing performance of general-purpose CPUs for implementing machine learning algorithms.