The present teachings relate to artificial neural networks, and more particularly to artificial neural networks used for object recognition and verification. In recent years, improvements to such artificial neural networks have largely been due to network expansions and an increase in training data. However, complex artificial neural network architectures often contain tens or hundreds of millions of parameters. While such artificial neural networks produce good accuracy, the large number of parameters can make deployment infeasible, especially on embedded systems that often have limited computing power. As a result of the increasing size of artificial neural networks, there is an increased interest in compressing artificial neural networks in order to maintain the improvements, while at the same time making them feasible to implement in systems having limited computing power.
Since artificial neural networks are typically very large, as mentioned above, they are often “over-parameterized.” This makes it possible to remove parameters, such as weights and biases, or entire neurons, without significantly impacting the performance of the artificial neural network. This procedure is typically referred to as “pruning” the artificial neural network. When a neuron is removed, it is possible to back-trace the calculations for that neuron. It can then be seen that all weights leading to that neuron can be removed safely. It is also possible to trace the output of the neuron forward and remove the weights leading from that neuron. However, identifying which neurons to remove in the pruning process, and implementing the pruning process in such a way that an actual performance gain is achieved, is not trivial.
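By way of illustration only, the removal of a neuron together with its incoming and outgoing weights can be sketched as follows. The sketch assumes a small two-layer network with a ReLU activation, stored as plain Python lists; the function names (`dense`, `forward`, `remove_hidden_neuron`) and all numeric values are hypothetical and are not taken from the present teachings.

```python
def relu(values):
    """Element-wise ReLU activation (assumed here for illustration)."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, x):
    """Fully connected layer: one dot product per weight-matrix row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(w1, b1, w2, b2, x):
    """Two-layer network: dense -> ReLU -> dense."""
    return dense(w2, b2, relu(dense(w1, b1, x)))


def remove_hidden_neuron(w1, b1, w2, idx):
    """Remove hidden neuron `idx`.

    Incoming weights are row `idx` of w1 (plus its bias);
    outgoing weights are column `idx` of w2.
    """
    w1_pruned = [row for i, row in enumerate(w1) if i != idx]
    b1_pruned = [b for i, b in enumerate(b1) if i != idx]
    w2_pruned = [[w for j, w in enumerate(row) if j != idx] for row in w2]
    return w1_pruned, b1_pruned, w2_pruned


# Hypothetical weights: hidden neuron 1 has a negative pre-activation
# for this input, so its ReLU output is zero.
w1 = [[1.0, 2.0], [-1.0, -1.0], [0.5, 0.0]]
b1 = [0.0, -1.0, 0.1]
w2 = [[1.0, 3.0, -2.0]]
b2 = [0.5]
x = [1.0, 2.0]

full_output = forward(w1, b1, w2, b2, x)
w1_p, b1_p, w2_p = remove_hidden_neuron(w1, b1, w2, idx=1)
pruned_output = forward(w1_p, b1_p, w2_p, b2, x)
```

For this input the pruned network produces the same output as the full network, because the removed neuron contributed zero. A neuron that is zero for only some inputs would change the output for the remaining inputs, which is why the selection of neurons matters.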
Pruning can be applied to layers containing trainable parameters, traditionally fully connected layers and convolutional layers. This helps to simplify and speed up the calculations. For example, removing a neuron from a fully connected layer is equivalent to skipping a dot product between a matrix row and a vector. As a result, the matrix becomes smaller. Removing a neuron from a convolutional layer means skipping the dot product between one matrix row and one matrix column, which is the same as skipping one convolution. The removal of neurons will be discussed in further detail in the Detailed Specification below. Determining which neurons can be removed without heavily affecting the accuracy of the artificial neural network can be done by analyzing the neurons during the training/test phase, and identifying from the resulting data which neurons are “dead,” that is, which neurons seldom or never produce non-zero output. The number of times a neuron must produce non-zero output in order not to be considered dead can be decided by comparing the performance obtained with different thresholds. After the neurons have been removed, the artificial neural network can be re-trained in order to improve the performance. The removal and re-training can be repeated iteratively.
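The identification of dead neurons described above can be sketched, under stated assumptions, as a count of non-zero outputs per neuron over a set of recorded samples. The names `count_nonzero_outputs`, `find_dead_neurons`, `prune_rows`, and the parameter `min_nonzero` are hypothetical, as are the recorded activation values; the sketch is illustrative rather than a definitive implementation.

```python
def count_nonzero_outputs(activations):
    """Count, per neuron, how many samples produced a non-zero output.

    `activations` holds one list of neuron outputs per sample.
    """
    counts = [0] * len(activations[0])
    for sample in activations:
        for j, value in enumerate(sample):
            if value != 0.0:
                counts[j] += 1
    return counts


def find_dead_neurons(activations, min_nonzero):
    """A neuron is treated as dead if it fired fewer than `min_nonzero` times."""
    counts = count_nonzero_outputs(activations)
    return [j for j, c in enumerate(counts) if c < min_nonzero]


def prune_rows(weights, bias, dead):
    """Drop the weight-matrix rows (and biases) of the dead neurons.

    The matching columns of the following layer would be dropped as well.
    """
    dead_set = set(dead)
    kept = [j for j in range(len(weights)) if j not in dead_set]
    return [weights[j] for j in kept], [bias[j] for j in kept]


# Hypothetical recorded outputs: 4 samples, 3 neurons.
recorded = [
    [0.7, 0.0, 0.0],
    [1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0],
]
counts = count_nonzero_outputs(recorded)       # per-neuron counts: 3, 0, 1

# Different thresholds yield different candidate sets to compare.
dead_strict = find_dead_neurons(recorded, min_nonzero=1)   # never fired
dead_loose = find_dead_neurons(recorded, min_nonzero=2)    # seldom fired

weights = [[0.2, -0.1], [0.4, 0.9], [-0.3, 0.5]]
bias = [0.0, 0.1, -0.2]
small_w, small_b = prune_rows(weights, bias, dead_strict)
```

In practice, each candidate threshold would be evaluated by measuring the accuracy of the pruned and re-trained network; that evaluation is outside the scope of this sketch.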
Another approach to pruning focuses on removing weights, either by using a threshold or by using regularization with norms to force some weights to zero during the training step itself. Regularization is a mathematical/statistical method, well known to those having ordinary skill in the art, that is used to enforce conditions, for example sparsity (forcing some values to zero) or smoothness. For further details on regularization for pruning, see the paper “Memory bounded deep convolutional networks,” arXiv CoRR, 2014, Section 3: Regularization Updates, available online at https://arxiv.org/abs/1412.1442.
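The two weight-based approaches can be sketched as follows. The first function zeroes weights whose magnitude falls below a threshold. The second applies one gradient step followed by soft-thresholding, which is one common way to realize an L1-norm penalty; it is used here as an assumption for illustration and is not necessarily the update described in the cited paper. All function names, parameters, and values are hypothetical.

```python
def prune_by_magnitude(weights, threshold):
    """Set every weight with |w| < threshold to zero."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]


def soft_threshold(w, amount):
    """Shrink w toward zero by `amount`, clamping at exactly zero."""
    if w > amount:
        return w - amount
    if w < -amount:
        return w + amount
    return 0.0


def l1_regularized_step(weights, grads, lr, l1_strength):
    """One gradient step followed by soft-thresholding (L1 proximal step)."""
    return [[soft_threshold(w - lr * g, lr * l1_strength)
             for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Hypothetical weight matrix.
weights = [[0.8, -0.02, 0.0], [0.05, -0.6, 0.01]]
thresholded = prune_by_magnitude(weights, threshold=0.1)

# With zero gradients, only the L1 shrinkage acts on the weights.
stepped = l1_regularized_step([[0.5, 0.01]], [[0.0, 0.0]],
                              lr=0.1, l1_strength=1.0)
```

Zeroed weights reduce computation only when the implementation skips them, for example through a sparse storage format; a dense matrix multiplication performs the same work regardless of how many entries are zero.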
By sufficiently pruning an artificial neural network in these manners, and by using an implementation that avoids the calculations for the removed parameters, the computational cost of executing the artificial neural network is lower than that of the full network.
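As a rough illustration of the reduction, the number of multiply-accumulate operations in a stack of fully connected layers can be counted before and after neuron removal. The layer sizes below are hypothetical and chosen only for the arithmetic; biases and activation functions are ignored in this sketch.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulate count for consecutive fully connected layers.

    A layer mapping n_in inputs to n_out neurons needs n_in * n_out MACs.
    """
    return sum(n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical layer widths: input, two hidden layers, output.
full_sizes = [256, 512, 512, 10]
pruned_sizes = [256, 384, 320, 10]   # hidden neurons removed by pruning

full_macs = dense_macs(full_sizes)
pruned_macs = dense_macs(pruned_sizes)
remaining_fraction = pruned_macs / full_macs
```

Because each hidden neuron contributes both incoming and outgoing weights, removing neurons from adjacent layers reduces the operation count more than proportionally to the number of neurons removed.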