US 12,169,770 B2
Neural network model compression with structured weight unification
Wei Jiang, San Jose, CA (US); Wei Wang, Palo Alto, CA (US); and Shan Liu, San Jose, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Nov. 3, 2020, as Appl. No. 17/088,061.
Claims priority of provisional application 62/964,996, filed on Jan. 23, 2020.
Prior Publication US 2021/0232891 A1, Jul. 29, 2021
Int. Cl. G06N 3/04 (2023.01); G06N 3/084 (2023.01); H03M 7/30 (2006.01)
CPC G06N 3/04 (2013.01) [G06N 3/084 (2013.01); H03M 7/70 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for compressing a neural network model, executable by a processor, comprising:
for a layer in a neural network model comprising a plurality of layers, reshaping weight coefficients corresponding to a block in a multi-dimensional tensor associated with the neural network, wherein the block is a part of a super-block in the multi-dimensional tensor;
for the layer, unifying a set of weight coefficients associated with one or more reordered indices corresponding to the block using:

$$L_U = \sum_{j=1}^{N} L_U(W_j)$$
wherein Wj is the set of weight coefficients of a j-th layer; LU(Wj) is a unification loss of the j-th layer; and N is a total number of layers in the neural network model; and
compressing a model of the neural network based on the unified set of weight coefficients.
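The block-wise unification step recited above can be illustrated with a small sketch. The claim does not fix a particular block shape, unification rule, or loss; the 2×2 blocks, the mean-absolute-value rule (all weights in a block share one magnitude, signs preserved), and the L2 deviation used as the per-layer unification loss below are illustrative assumptions, not the patented method.

```python
def unify_block(block):
    # Illustrative unification rule (an assumption, not from the claim):
    # every coefficient in the block takes the block's mean absolute
    # value, keeping its original sign.
    n = len(block) * len(block[0])
    mean_abs = sum(abs(w) for row in block for w in row) / n
    return [[(mean_abs if w >= 0 else -mean_abs) for w in row] for row in block]

def unification_loss(W, U):
    # Illustrative per-layer unification loss L_U(W_j): squared L2
    # deviation between the original and unified weights.
    return sum((w - u) ** 2 for rw, ru in zip(W, U) for w, u in zip(rw, ru))

def unify_layer(W, br=2, bc=2):
    # Partition the layer's weight matrix into br-by-bc blocks
    # (sub-units of a super-block) and unify each block in place.
    rows, cols = len(W), len(W[0])
    U = [row[:] for row in W]
    for r in range(0, rows, br):
        for c in range(0, cols, bc):
            block = [U[i][c:c + bc] for i in range(r, r + br)]
            unified = unify_block(block)
            for i in range(br):
                U[r + i][c:c + bc] = unified[i]
    return U

# One 2x2 block: mean absolute value is (1 + 3 + 2 + 4) / 4 = 2.5.
W = [[1.0, -3.0],
     [2.0, 4.0]]
U = unify_layer(W)             # → [[2.5, -2.5], [2.5, 2.5]]
loss = unification_loss(W, U)  # → 5.0
```

Summing `unification_loss` over all `N` layers gives the overall loss the claim's formula describes; the unified (shared-magnitude) coefficients are what make the layer cheaper to encode in the final compression step.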