In many application scenarios, pictures need to be classified so that they can be managed by class. When there are only a few pictures, they may be classified manually. However, with the development of network technology, tens of thousands of pictures usually need to be classified in a network scenario, and manual processing becomes impractical. Hence, how to recognize pictures intelligently for classification becomes particularly important in the network scenario.
In the prior art, a convolutional neural network model may be used to recognize the class of a picture. A current convolutional neural network model comprises a convolution operation and a pooling operation, wherein the pooling operation comprises average pooling, maximum pooling, bilinear pooling and the like. Average pooling means averaging a group of input feature vectors and outputting the average. Maximum pooling means taking the maximum value from a group of feature vectors and outputting it. Bilinear pooling means taking the vector outer product of the input feature vectors with themselves to obtain a bilinear representation of the original features, and outputting that representation. The features obtained by bilinear pooling have stronger representational power and achieve a better effect than those obtained by average pooling or maximum pooling.
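By way of illustration only, the three pooling operations described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular prior-art model: the function names `average_pool`, `max_pool` and `bilinear_pool` are hypothetical, the maximum is assumed to be taken per feature dimension, and the outer products in bilinear pooling are assumed to be averaged over the group and flattened into one vector.

```python
from typing import List

Vector = List[float]


def average_pool(features: List[Vector]) -> Vector:
    """Average a group of feature vectors, dimension by dimension."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def max_pool(features: List[Vector]) -> Vector:
    """Take the maximum of a group of feature vectors, dimension by dimension
    (assumption: the maximum is element-wise)."""
    return [max(column) for column in zip(*features)]


def bilinear_pool(features: List[Vector]) -> Vector:
    """Form the outer product of each feature vector with itself, then
    average the resulting d x d matrices over the group and flatten
    (assumption: aggregation by averaging)."""
    n = len(features)
    d = len(features[0])
    acc = [[0.0] * d for _ in range(d)]
    for x in features:
        for i in range(d):
            for j in range(d):
                acc[i][j] += x[i] * x[j]
    return [acc[i][j] / n for i in range(d) for j in range(d)]


# Hypothetical group of two 2-dimensional feature vectors.
features = [[1.0, 2.0], [3.0, 0.0]]
print(average_pool(features))   # [2.0, 1.0]
print(max_pool(features))       # [3.0, 2.0]
print(bilinear_pool(features))  # [5.0, 1.0, 1.0, 2.0]
```

In this sketch, average pooling and maximum pooling each output a vector with the same dimension d as the input features, whereas bilinear pooling outputs d × d values that capture pairwise interactions between feature dimensions.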
However, none of the three types of pooling operations in the current convolutional neural network model can refine the granularity of picture recognition. Therefore, recognizing pictures with the convolutional neural network model of the prior art results in a coarse recognition granularity and unsatisfactory recognition accuracy.