CPC G06N 3/04 (2013.01) [G06N 3/084 (2013.01); H03M 7/70 (2013.01)] | 20 Claims |
1. A method for compressing a neural network model, executable by a processor, comprising:
for a layer in a neural network model comprising a plurality of layers, reshaping weight coefficients corresponding to a block in a multi-dimensional tensor associated with a neural network, wherein the block is a part of a super-block in the multi-dimensional tensor;
for the layer, unifying a set of weight coefficients associated with the one or more reordered indices corresponding to the block using:
$$\mathcal{L}_U(\{W_j\}) = \sum_{j=1}^{N} L_U(W_j)$$
wherein $W_j$ is the set of weight coefficients of a j-th layer; $L_U(W_j)$ is a unification loss of the j-th layer; and $N$ is a total number of layers in the neural network model; and
compressing a model of the neural network based on the unified set of weight coefficients.
|
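The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the claimed implementation, and several parts are assumptions: the unification rule (every weight in a block takes the block's mean absolute value and keeps its own sign), the per-layer loss (sum of absolute differences between original and unified weights), the reshaping of each layer to a 2-D matrix, and the hypothetical names `unify_block`, `unify_layer`, and `total_unification_loss`. The total loss is taken as the sum of the per-layer losses over the N layers, which is one reading of the wherein clause. Super-blocks and index reordering are omitted.

```python
import numpy as np


def unify_block(block):
    """Assumed rule: share one magnitude (mean |w|) across the block, keep signs.

    Zero-valued weights stay zero because np.sign(0) is 0.
    """
    shared_magnitude = np.abs(block).mean()
    return np.sign(block) * shared_magnitude


def unify_layer(weights, block_shape):
    """Reshape one layer's tensor to 2-D, unify it block by block.

    Returns the unified tensor (in the original shape) and an assumed
    per-layer unification loss: the summed absolute change in the weights.
    """
    weights = np.asarray(weights, dtype=float)
    w2d = weights.reshape(weights.shape[0], -1)  # reshaping step (assumed 2-D layout)
    block_rows, block_cols = block_shape
    rows, cols = w2d.shape
    if rows % block_rows or cols % block_cols:
        raise ValueError("block_shape must evenly divide the reshaped tensor")

    unified = np.empty(w2d.shape, dtype=float)
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = w2d[r:r + block_rows, c:c + block_cols]
            unified[r:r + block_rows, c:c + block_cols] = unify_block(block)

    loss = float(np.abs(w2d - unified).sum())
    return unified.reshape(weights.shape), loss


def total_unification_loss(layers, block_shape):
    """Sum the assumed per-layer losses over all N layers."""
    return sum(unify_layer(w, block_shape)[1] for w in layers)
```

For example, `unify_layer(np.array([[1.0, -3.0], [2.0, -2.0]]), (2, 2))` treats the whole matrix as one block. Because unified blocks hold a single magnitude plus a sign pattern, a later compression stage could store them more compactly. That stage is not sketched here.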