Neural networks have been designed for a plurality of applications. For example, neural networks have been designed to extract features from data such as images, sound, video, text, or time series, and to recognize patterns in the data. Neural networks are modeled as collections of neurons that are connected in an acyclic graph. In other words, the outputs of some neurons can become inputs to other neurons. Neural network models are often organized into distinct layers of neurons. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input) layer to the last (output) layer, possibly after traversing several hidden layers therebetween.
In deep neural networks (i.e., neural networks with a plurality of hidden layers), each layer of neurons trains on a distinct set of features based on the previous layer's output. A neuron combines input from the data (for example, a vector) with a set of weights (for example, a matrix) that either amplify or dampen that input, thereby assigning significance to inputs for the task the algorithm is trying to learn. These input-weight products are summed, and the sum is passed through an activation function (e.g., Sigmoid, Tanh, ReLU, Leaky ReLU, Maxout, TLDR, etc.) to determine whether and to what extent that signal progresses further through the network to affect the ultimate outcome (e.g., an act of classification). Pairing adjustable weights with input features is how significance is assigned to these features with regard to how the network classifies and clusters input. Because each layer builds on the output of the previous layer, the features form a hierarchy of increasing complexity and abstraction. This feature hierarchy makes deep neural networks capable of handling very large, high-dimensional data sets with billions of parameters that pass through nonlinear functions to perform automatic feature extraction without human intervention. Deep neural networks may end in an output layer, such as a logistic or softmax classifier, that assigns a likelihood to a particular outcome or label. Given raw data in the form of an image, a deep neural network may predict or decide, for example, that the input data is likely to represent a person, a cat, a horse, etc., each with a certain probability.
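By way of a non-limiting illustration, the neuron computation described above (input-weight products summed and passed through an activation function, followed by a softmax output layer) may be sketched as follows. The function names, the toy weights, the layer sizes, and the three labels are hypothetical and are chosen only for illustration; bias terms and training are omitted for brevity, and the sketch does not represent any particular network discussed herein.

```python
import math

def relu(x):
    # ReLU activation: passes positive signals, blocks negative ones.
    return max(0.0, x)

def identity(x):
    # No activation; used here so the output layer emits raw scores.
    return x

def neuron(inputs, weights, activation):
    # Sum the input-weight products, then apply the activation function.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

def layer(inputs, weight_rows, activation):
    # A layer is a list of neurons; each row of weights belongs to one neuron.
    return [neuron(inputs, row, activation) for row in weight_rows]

def softmax(scores):
    # Convert raw scores into likelihoods that sum to 1.
    # Subtracting the maximum score keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 3 output labels.
x = [0.5, -1.0, 2.0]
hidden_weights = [[0.2, -0.4, 0.1],
                  [-0.3, 0.8, 0.5]]
output_weights = [[1.0, -1.0],
                  [0.5, 0.5],
                  [-1.0, 1.0]]
labels = ["person", "cat", "horse"]

hidden = layer(x, hidden_weights, relu)           # hidden layer output
scores = layer(hidden, output_weights, identity)  # raw output scores
probs = softmax(scores)                           # likelihood per label

for label, p in zip(labels, probs):
    print(f"{label}: {p:.1%}")
```

In this sketch, the output of the hidden layer becomes the input to the output layer, mirroring the layer-to-layer signal flow described above, and the softmax step assigns each label a likelihood such that the likelihoods sum to one.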
For example, Convolutional Neural Networks (CNNs) are one type of deep neural network and have demonstrated their power in many image recognition tasks. Although one may increase the network size, including depth and width, to achieve higher image recognition accuracy, this comes at the expense of much higher latency for forward inference. For example, benchmarks for popular CNN models on the ImageNet dataset show that the latency at test time increased from 7.0 milliseconds (ms) (AlexNet) to 109.32 ms (ResNet) in order to reduce the top-1 error from 42.90% to 22.16%. Therefore, how to achieve higher recognition and classification accuracy without sacrificing the efficiency of the deep neural network becomes an important issue to address.