Deep neural networks, and in particular convolutional neural networks, are being used with increasing frequency for a number of tasks such as image classification, image clustering, and object recognition. In a forward propagation of a conventional convolutional neural network, a kernel is passed over one or more tensors to produce one or more feature maps. At a particular location of the kernel within a tensor, each of a number of input values in the tensor operated on by the kernel is multiplied by a corresponding weight value in the kernel, and the products are summed via addition and subtraction to produce a single value of a feature map. Accordingly, a conventional convolutional neural network requires multiplication, addition, and subtraction. Implementing a conventional convolutional neural network requires a large amount of computing power, and the technology has thus been unavailable for mobile and low-power devices such as those for the Internet of Things.
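The multiply-and-sum operation described above can be illustrated with a minimal sketch. The function name `conv2d_valid`, the use of a single two-dimensional tensor, and the stride of one with no padding are assumptions made for illustration only and are not drawn from any particular implementation.

```python
def conv2d_valid(tensor, kernel):
    """Pass a kernel over a 2-D tensor (stride 1, no padding) and return the feature map.

    Illustrative only: both arguments are lists of lists of numbers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(tensor) - kh + 1
    out_w = len(tensor[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # At this kernel location, multiply each input value by its
            # corresponding weight value and accumulate the products.
            for i in range(kh):
                for j in range(kw):
                    acc += tensor[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


example_tensor = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
example_kernel = [[1, 0], [0, -1]]
example_feature_map = conv2d_valid(example_tensor, example_kernel)
```

Each value of the feature map costs one multiplication per kernel weight, which is the expense the approaches discussed below aim to avoid.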
Recent work in the field has focused on reducing the computing power necessary to implement convolutional neural networks. A first approach, referred to as a “binary neural network,” uses binary weight values in the kernel. By converting the weight values in the kernel to binary values, a forward propagation of the binary neural network can be computed using only addition and subtraction. Forgoing the need for multiplication during forward propagation may result in a 2× savings in computing power. Further, storing binary weight values instead of real weight values may produce a 32× savings in memory. Finally, using binary weight values results in minimal, if any, impact on the accuracy of the binary neural network.
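As a rough sketch of why binary weight values remove the need for multiplication, assume each weight is either +1 or -1 and is stored as a single flag. The function name `binary_weight_dot` and the boolean encoding (True for +1, False for -1) are hypothetical choices for illustration.

```python
def binary_weight_dot(inputs, weight_is_positive):
    """Accumulate real input values under binary (+1 / -1) weights.

    Assumed encoding: weight_is_positive[k] is True for a weight of +1
    and False for a weight of -1.
    """
    acc = 0.0
    for x, positive in zip(inputs, weight_is_positive):
        if positive:
            acc += x  # weight +1: add the input value
        else:
            acc -= x  # weight -1: subtract the input value
    return acc


example_sum = binary_weight_dot([0.5, -1.0, 2.0], [True, False, True])
```

Under this assumed encoding each weight occupies one bit, which would be consistent with the stated 32× memory savings if the real weight values were stored as 32-bit numbers.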
An additional approach, referred to as an “XNOR neural network,” uses binary input values in the tensors and binary weight values in the kernel. By converting the input values in the tensors and the weight values in the kernel to binary values, a forward propagation of the XNOR neural network can be computed using only an exclusive NOR (XNOR) operation and a bit count operation, where a bit count operation is simply a count of the number of high bits in a given stream of binary values. Using an XNOR operation and a bit count operation instead of multiplication, addition, and subtraction may result in a 58× savings in computing power. Further, storing binary input values instead of real input values and binary weight values instead of real weight values may produce a 32× savings in memory. While using binary input values and binary weight values does reduce the accuracy of the XNOR neural network, the results are often still acceptable for use.
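The XNOR and bit count operations can be sketched as follows, assuming that a value of +1 is encoded as a high bit and a value of -1 as a low bit. The helper names `pack_bits` and `xnor_popcount_dot`, and the final mapping from the bit count back to a signed sum, are illustrative assumptions rather than part of any particular implementation.

```python
def pack_bits(signs):
    """Pack a list of +1 / -1 values into an int; bit k is high when signs[k] == +1."""
    word = 0
    for k, s in enumerate(signs):
        if s == 1:
            word |= 1 << k
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1 / -1 vectors stored as packed bits."""
    mask = (1 << n) - 1
    # XNOR is high wherever the two bits agree, i.e. wherever the product is +1.
    agree = ~(a_bits ^ b_bits) & mask
    # Bit count: the number of high bits in the XNOR result.
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


a = pack_bits([1, -1, 1, 1])
b = pack_bits([1, 1, -1, 1])
example_dot = xnor_popcount_dot(a, b, 4)
```

In this sketch a single XNOR over a machine word followed by one bit count stands in for as many multiplications and additions as there are bits in the word.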
XNOR neural networks in particular have opened the possibility of implementation on mobile and other low-power devices. However, conventional computing systems are not well suited for the efficient implementation of these XNOR neural networks. Accordingly, there is a need for computing systems, and in particular memory architectures, capable of efficiently supporting the operation of XNOR neural networks for improvements in speed and efficiency.