Many of today's hardware accelerators for neural networks perform matrix multiplication mainly in a dense format. They do not take into account that a large percentage of the elements in one (or both) of the matrices are zeros. This leads to inefficient use of hardware resources (multiplications by zero) and wasted power.
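To make the cost of ignoring zeros concrete, the following minimal sketch counts how many of the scalar multiplications in a plain dense matrix multiply have at least one zero operand. The function name `count_multiplies` and the list-of-lists matrix representation are hypothetical choices for illustration only. They do not model any particular accelerator.

```python
def count_multiplies(a, b):
    """Count total and zero-operand multiplies in a dense product a @ b.

    a is an m x k list of lists and b is a k x n list of lists (hypothetical
    representation chosen for illustration). A dense engine issues every
    multiply, even when one operand is zero.
    """
    m, k, n = len(a), len(b), len(b[0])
    total = 0
    wasted = 0
    for i in range(m):
        for j in range(n):
            for p in range(k):
                total += 1
                # A multiply with a zero operand contributes nothing to the sum.
                if a[i][p] == 0 or b[p][j] == 0:
                    wasted += 1
    return total, wasted


# Half of the elements of `a` are zero, so half of the multiplies are wasted.
a = [[1, 0, 2, 0],
     [0, 3, 0, 4]]
b = [[1, 1] for _ in range(4)]
print(count_multiplies(a, b))  # (16, 8)
```

With 50% of the elements of `a` equal to zero and `b` fully dense, exactly half of the multiplications do no useful work.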
Today's hardware accelerators for neural networks, for both training and inference, compete to achieve the best raw performance and the best power-to-performance ratio. Exploiting the sparsity in these networks, whether native (zeros that occur naturally in the data) or injected (zeros introduced deliberately, for example by pruning), is one way to gain a lead in this competition.
Machine learning architectures, such as deep neural networks, have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics and drug design.
Matrix multiplication is a key performance and power limiter for many algorithms, including machine learning. Some conventional matrix multiplication approaches are overly specialized. For example, they lack the flexibility to support a variety of data formats (signed and unsigned 8-bit/16-bit integers, 16-bit floating point) with wide accumulators, or the flexibility to support both dense and sparse matrices.
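The need for wide accumulators can be illustrated with a small simulation: the product of two signed 8-bit values can reach 16384, so a 16-bit accumulator overflows after only a few multiply-accumulate steps. The sketch below models a two's-complement accumulator of configurable width. The helper names `wrap` and `dot_int8` and the 16-bit and 32-bit widths are hypothetical, chosen for illustration; the text above does not specify accumulator widths.

```python
def wrap(value, bits):
    """Wrap an integer into a two's-complement register of `bits` width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def dot_int8(a, b, acc_bits):
    """Dot product of signed 8-bit vectors using an `acc_bits`-wide accumulator.

    Each product fits in 16 bits, but the running sum is truncated to the
    accumulator width after every step, as a fixed-width register would be.
    """
    acc = 0
    for x, y in zip(a, b):
        if not (-128 <= x <= 127 and -128 <= y <= 127):
            raise ValueError("operands must be signed 8-bit values")
        acc = wrap(acc + x * y, acc_bits)
    return acc


a = [127] * 4
b = [127] * 4
print(dot_int8(a, b, 32))  # 64516, the exact result
print(dot_int8(a, b, 16))  # -1020, the 16-bit accumulator has overflowed
```

Four products of 16129 already exceed the 16-bit signed range, which is why int8 arithmetic is commonly paired with 32-bit accumulators.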
The problem addressed herein is to increase the performance and power efficiency of neural network processing chips by processing matrix multiplications more efficiently when the input data set is sparse (a sparse matrix has a density of less than 1.0, meaning that fewer than 100% of its elements are non-zero). In particular, this problem is addressed while maintaining performance for dense (non-sparse) matrix multiplication.
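As a minimal sketch of the density definition and of what "exploiting sparsity" can mean, the code below computes a matrix's density and then performs a matrix multiply that issues multiplications only for the non-zero entries of the first operand, stored as (column index, value) pairs in a CSR-like row format. This compression scheme and the names `density`, `compress_row`, `sparse_matmul` and `dense_matmul` are hypothetical and generic; they are not the specific method described in this document.

```python
def density(matrix):
    """Fraction of elements that are non-zero (1.0 means fully dense)."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0)
    return nonzero / total


def compress_row(row):
    """Keep only (column index, value) pairs for the non-zero entries."""
    return [(j, v) for j, v in enumerate(row) if v != 0]


def dense_matmul(a, b):
    """Reference dense product a @ b; returns (result, multiplies issued)."""
    n, k = len(b[0]), len(b)
    result = [[0] * n for _ in range(len(a))]
    multiplies = 0
    for i in range(len(a)):
        for j in range(n):
            for p in range(k):
                result[i][j] += a[i][p] * b[p][j]
                multiplies += 1
    return result, multiplies


def sparse_matmul(a, b):
    """Product a @ b that skips zero entries of `a`; returns (result, multiplies)."""
    n = len(b[0])
    result = [[0] * n for _ in range(len(a))]
    multiplies = 0
    for i, row in enumerate(a):
        # Only non-zero entries of row i generate work.
        for p, a_val in compress_row(row):
            for j in range(n):
                result[i][j] += a_val * b[p][j]
                multiplies += 1
    return result, multiplies


a = [[1, 0, 2, 0],
     [0, 3, 0, 4]]
b = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(density(a))           # 0.5
print(dense_matmul(a, b))   # ([[11, 14], [37, 44]], 16)
print(sparse_matmul(a, b))  # ([[11, 14], [37, 44]], 8)
```

Both paths produce the same result, and the sparse path issues a number of multiplications proportional to the density of `a`. At a density of 1.0 the sparse path performs the same number of multiplications as the dense one, plus the overhead of handling the index metadata, which is one reason a design would keep an efficient dense path alongside any sparse one.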