Matrix multiplication is used in many fields of engineering, numerical analysis, science, and machine learning. Its computational complexity is O(n³), making hardware acceleration highly desirable.
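As a minimal illustration (not from the source), the naive triple-loop algorithm below makes the O(n³) cost concrete: for square n×n inputs it performs n·n·n multiply-accumulate steps.

```python
def matmul(a, b):
    """Naive dense matrix multiplication: c = a @ b.

    For n x n matrices the three nested loops perform n * n * n
    multiply-accumulate operations, i.e. O(n^3) work.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):            # n rows of the result
        for j in range(m):        # x m columns of the result
            for p in range(k):    # x k multiply-accumulates per element
                c[i][j] += a[i][p] * b[p][j]
    return c
```

It is exactly this multiply-accumulate inner loop that dense-matrix accelerators map onto DSP hardware.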
Many matrix multiplication problems involve very large matrices that are sparse, and sparse matrices present various challenges for hardware acceleration. A sparse matrix is a matrix in which most of the data elements have the value 0; such matrices commonly result, for example, from pruning in neural networks. Whereas acceleration of matrix multiplication involving dense matrices is limited by the computational speed of the digital signal processors (DSPs) employed, such as for multiply-and-accumulate functions, acceleration of matrix-vector multiplication involving sparse matrices can instead be limited by other factors: loading of the sparse matrix, lookup of vector column values in wide matrices, and scheduling of both row and column multiply-and-accumulate operations.
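To make the vector-lookup bottleneck concrete, the following sketch performs sparse matrix-vector multiplication with the matrix in the widely used Compressed Sparse Row (CSR) layout (one of several possible formats, chosen here for illustration; the source does not prescribe a format). The indexed read of the input vector inside the inner loop is the irregular "column lookup" that the text identifies as a limiting factor.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x, with A in Compressed Sparse Row (CSR) form.

    values  : non-zero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Gather step: fetch the vector element for this column.
            # These data-dependent, irregular accesses are what make
            # sparse matrix-vector products hard to accelerate.
            y[i] += values[k] * x[col_idx[k]]
    return y
```

Note also that the work per row varies with the number of non-zeros in that row, which is why scheduling the row and column multiply-accumulate operations is itself a challenge.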
A number of different approaches have been employed for sparse matrix-vector multiplication. A dense matrix multiplier can be used, but it may be suitable only as a temporary solution or in cases in which the matrix is nearly dense. A size-limited custom multiplier may be useful if both the weight matrices and a small data set can fit into a cache memory in the accelerator; however, such a multiplier is limited in size and may impose a minimum density requirement (e.g., >5% non-zero values) or a distribution requirement (e.g., a similar number of non-zero values in each row). Hybrid solutions can involve caching, static scheduling and interleaving of operations, and/or vector replication.
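The density and distribution requirements described above can be sketched as a simple feasibility check. The function name, the 5% density floor, and the row-skew bound below are illustrative assumptions, not parameters taken from any particular accelerator.

```python
def meets_accelerator_limits(matrix, min_density=0.05, max_row_skew=4.0):
    """Hypothetical check for a size-limited custom multiplier.

    min_density  : minimum fraction of non-zero values (e.g., >5%)
    max_row_skew : how far the busiest row may exceed the average
                   non-zero count (a proxy for "similar number of
                   non-zero values in each row")
    """
    rows, cols = len(matrix), len(matrix[0])
    row_nnz = [sum(1 for v in row if v != 0) for row in matrix]
    nnz = sum(row_nnz)

    # Minimum density requirement
    if nnz / (rows * cols) < min_density:
        return False

    # Distribution requirement: non-zeros spread evenly across rows
    avg = nnz / rows
    return avg > 0 and max(row_nnz) <= max_row_skew * avg
```

A matrix failing either test would fall back to a dense multiplier or a hybrid scheme as described above.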