Deep learning models are applied ever more widely, for example in speech recognition, image recognition, semantic analysis and autonomous driving. Such a model processes data through linear layers and non-linear layers, performing mapping and arithmetic operations between the nodes of different layers; during this processing the model is trained, corrected and updated, which ultimately improves its classification or prediction accuracy. In practice, however, a deep learning model occupies a large storage space and requires a large amount of computation.
A deep learning model involves two classes of operations: one is matrix multiplication, and the other is element-wise operations such as activation functions. These two classes of operations constitute the basic units of deep learning. The matrix multiplication portion is the key module in terms of storage and computation. To reduce the storage space and the amount of computation of the deep learning model, it is therefore desirable to provide a matrix compression method adapted to the deep learning model. Meanwhile, to preserve the precision of the deep learning model, the element-wise operations, unlike the matrix multiplication, continue to be processed in floating point.
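The division of labour described above can be illustrated with a minimal sketch. It is not the compression method of this disclosure, which has not yet been described at this point; it assumes, purely for illustration, a symmetric 8-bit quantization with a single shared scale per matrix. The names `quantize_matrix`, `matvec_quantized`, `sigmoid` and `layer_forward` are hypothetical. The weight matrix is stored as small integers, while the element-wise activation stays in floating point.

```python
import math


def quantize_matrix(weights, num_bits=8):
    """Compress a float matrix to signed integers plus one shared scale.

    Assumed scheme for illustration only: symmetric linear quantization.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q_weights = [[round(w / scale) for w in row] for row in weights]
    return q_weights, scale


def matvec_quantized(q_weights, scale, x):
    """Matrix-vector product computed from the compressed integer weights."""
    return [scale * sum(qw * xi for qw, xi in zip(row, x)) for row in q_weights]


def sigmoid(values):
    """Element-wise activation, kept in floating point."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def layer_forward(q_weights, scale, x):
    """One layer: compressed matrix multiplication, then float activation."""
    return sigmoid(matvec_quantized(q_weights, scale, x))


# Hypothetical example values.
weights = [[0.5, -1.0], [0.25, 0.75]]
x = [1.0, 2.0]

q_weights, scale = quantize_matrix(weights)
y = layer_forward(q_weights, scale, x)
```

In this sketch each weight occupies one signed 8-bit integer instead of a 32-bit float, at the cost of a small rounding error in the matrix product; the activation sees an ordinary floating-point input and is unaffected by the storage format.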