The convolutional neural network is widely used in various applications, especially in image and video applications. The convolution layer is an essential computation part of the convolutional neural network. In the convolution layer, taking an image as an example, a plurality of filters act on the image respectively to calculate the convolution. In the related art, the convolution calculation is implemented in one of two modes: (1) the filters act on the image directly to calculate the convolution, in which the graphics processor thread group is arranged in a two-dimensional (X and Y) mode, the X dimension is partitioned by the total number of images and the Y dimension is partitioned by the total number of filters, and each graphics processor thread calculates the convolutions of the plurality of filters on a plurality of images, but only calculates the convolution kernel corresponding to one data point; (2) all image data is unfolded data point by data point according to the size of the filter, such that the convolution calculation is converted into a dense matrix multiplication.
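As a rough illustration of the two modes described above, the following single-threaded Python sketch computes the same result both ways. It is not the graphics processor implementation of the related art: it assumes a single-channel image, no padding, and no kernel flip (as is conventional in convolutional neural networks), and the names direct_conv, unfold and unfolded_conv are hypothetical.

```python
# Illustrative sketch only: pure Python, single channel, no padding.
# All function names are hypothetical and chosen for this example.

def direct_conv(image, filt, stride=1):
    """Mode (1): apply the filter directly, one output data point at a time."""
    k = len(filt)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0
            for fy in range(k):
                for fx in range(k):
                    acc += image[oy * stride + fy][ox * stride + fx] * filt[fy][fx]
            row.append(acc)
        out.append(row)
    return out


def unfold(image, k, stride=1):
    """Mode (2), step 1: copy every k*k input window into one row of a matrix."""
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    rows = []
    for oy in range(out_h):
        for ox in range(out_w):
            rows.append([image[oy * stride + fy][ox * stride + fx]
                         for fy in range(k) for fx in range(k)])
    return rows


def unfolded_conv(image, filters, stride=1):
    """Mode (2), step 2: dense product of unfolded rows and flattened filters.

    Returns a matrix with one row per output data point and one column per filter.
    """
    k = len(filters[0])
    rows = unfold(image, k, stride)
    flat_filters = [[v for r in f for v in r] for f in filters]
    return [[sum(a * b for a, b in zip(row, ff)) for ff in flat_filters]
            for row in rows]


# Small example: a 4x4 image and two 2x2 filters.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
filters = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
result = unfolded_conv(image, filters)
```

In this sketch, column j of result holds the same values as direct_conv(image, filters[j]) read row by row, which is the sense in which the second mode replaces the convolution with a dense matrix multiplication.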
However, the related art has the following defects. In the first mode, the input data points corresponding to adjacent output data points overlap with each other. For example, for a convolution kernel with a step length (stride) of 1 and a 5*5 filter, eighty percent of the input data points corresponding to adjacent output data points overlap, such that a large amount of data is read into the local memory repeatedly, resulting in poor performance. In the second mode, the image is unfolded before it is stored, and thus the needed memory space is in direct proportion to the size of the convolution kernel. For example, a 5*5 filter needs 25 times additional memory, and a 9*9 filter needs 81 times additional memory. In a practical application, the filter may have a larger size, and the global memory cost of the graphics processor is increased greatly.
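The figures quoted above can be checked with a short calculation. The Python sketch below is only an illustration under stated assumptions: square k*k filters, horizontally adjacent output data points, and no padding; the function names are hypothetical. Each unfolded output data point stores k*k values, so the unfolded storage approaches k*k times the image size as the image grows relative to the filter.

```python
# Illustrative arithmetic only; function names are hypothetical.
from fractions import Fraction


def window_overlap_fraction(k, stride=1):
    """Share of a k*k input window reused by the horizontally adjacent window."""
    shared_cols = max(k - stride, 0)
    return Fraction(k * shared_cols, k * k)


def values_per_output_point(k):
    """Number of input values stored per output data point after unfolding."""
    return k * k


def unfolded_elements(height, width, k, stride=1):
    """Total element count of the unfolded matrix for one image, no padding."""
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    return out_h * out_w * k * k


overlap_5x5 = window_overlap_fraction(5, stride=1)   # 20 of 25 values are shared
factor_5x5 = values_per_output_point(5)
factor_9x9 = values_per_output_point(9)
```

With a step length of 1, the overlap for a 5*5 filter is 20/25, matching the eighty percent stated above, and the per-output-point factors for 5*5 and 9*9 filters are 25 and 81 respectively.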