Driven by the availability of massive data and the computational capability to process it, deep learning has recently emerged as a critical tool for solving complex problems across a wide range of domains, including image recognition, speech processing, natural language processing, language translation, video analytics, and autonomous vehicles. Convolutional neural networks (CNNs) have become the most popular algorithmic approach to deep learning in many of these domains. High performance and extreme energy efficiency are critical for deploying CNNs in a wide range of situations, especially on mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants.
Employing CNNs can be decomposed into two tasks: (1) training, in which the parameters of a neural network are learned by observing massive numbers of training examples, and (2) inference (classifying), in which a trained neural network is deployed in the field and classifies the observed data. Today, training is often done on graphics processing units (GPUs) or farms of GPUs, while the hardware used for inference depends on the application and can include central processing units (CPUs), GPUs, field-programmable gate arrays (FPGAs), or application-specific integrated circuits (ASICs).
During the training process, a deep learning expert will typically architect the network, establishing the number of layers, the operation performed by each layer, and the connectivity between layers. Many layers have parameters, typically filter weights, which determine the exact computation performed by the layer. The objective of the training process is to learn the filter weights, usually via a stochastic gradient descent-based search through the space of weights. The training process typically employs a forward-propagation calculation for each training example, a measurement of the error between the computed and desired output, and then back-propagation through the network to update the weights. Inference is similar, but includes only the forward-propagation calculation. Nonetheless, the computation requirements for inference can be prohibitively large, particularly with the emergence of deeper networks (hundreds of layers) and larger input sets, such as high-definition video. Furthermore, the energy efficiency of this computation is important, especially for mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants, where both the computation requirements and the energy consumption of a neural network present challenges. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
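By way of illustration only, the training steps described above (forward propagation, error measurement, back-propagation, and weight update) can be sketched for a single one-dimensional convolutional layer. The sketch below is a minimal example under stated assumptions and is not drawn from any particular implementation: the names `forward`, `sgd_step`, `train`, `make_examples`, and `TRUE_FILTER` are hypothetical, the error measure is assumed to be a squared error, and the training data is synthetic. Inference corresponds to calling `forward` alone with the learned weights.

```python
import random

# Hypothetical "ground truth" filter used only to generate synthetic training data.
TRUE_FILTER = [0.5, -1.0, 0.25]


def forward(x, w):
    """Forward propagation: valid 1-D convolution (cross-correlation) of x with filter w."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def sgd_step(x, target, w, lr):
    """One training step: forward pass, error measurement, back-propagation, update."""
    y = forward(x, w)
    # Error between computed and desired output (assumed loss: 0.5 * sum(err ** 2)).
    err = [yi - ti for yi, ti in zip(y, target)]
    # Back-propagation: d(loss)/d(w[k]) = sum_i err[i] * x[i + k].
    grad = [sum(err[i] * x[i + k] for i in range(len(err))) for k in range(len(w))]
    # Gradient-descent weight update.
    return [wk - lr * gk for wk, gk in zip(w, grad)]


def train(examples, filter_size, lr=0.05, epochs=200, seed=0):
    """Learn filter weights by stochastic gradient descent over the examples."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(filter_size)]
    for _ in range(epochs):
        order = list(examples)
        rng.shuffle(order)  # visit examples in random order ("stochastic")
        for x, target in order:
            w = sgd_step(x, target, w, lr)
    return w


def make_examples(n, length, seed=1):
    """Synthetic (input, desired output) pairs produced by TRUE_FILTER."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        examples.append((x, forward(x, TRUE_FILTER)))
    return examples


if __name__ == "__main__":
    data = make_examples(50, 8)
    learned = train(data, filter_size=3)
    print("learned filter:", [round(v, 3) for v in learned])
    # Inference: forward propagation only, using the learned weights.
    print("inference output:", forward([1.0, 0.0, -1.0, 0.5], learned))
```

Even in this toy setting, each output element of the forward pass costs one multiply-accumulate per filter weight, which suggests how the computation grows with deeper networks and larger inputs.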