In image processing, a pooling operation is usually performed to reduce the volume of data to be processed or stored. For example, after features are obtained by convolution, the next step is to use these features for classification. However, classifying on all of the extracted features involves a large amount of computation. Therefore, in order to describe a larger image, aggregation statistics, that is, pooling, may be performed on features at different locations.
The pooling scheme commonly used in neural networks is a software-based scheme. The pooling computation involves three parts: input data, pooling operations, and output results. Specifically, the central processing unit (CPU) or the convolution hardware accelerator saves the two-dimensional image data structure to be pooled in the main memory; the two-dimensional array of the data structure is denoted as img[height][width], wherein height is the height of the two-dimensional array and width is the width of the two-dimensional array; the image is divided into basic units of 2 pixels×2 pixels, each basic unit is traversed, the pooling calculation is performed for each basic unit, and the pooling result is output.
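The software-based traversal described above may be illustrated by the following minimal sketch. It is not taken from any particular implementation: the function name pool_2x2 and the parameter reduce_fn are hypothetical, the choice of maximum pooling as the default is an assumption (the pooling calculation itself is not specified above), and a trailing odd row or column is assumed to be ignored.

```python
def pool_2x2(img, reduce_fn=max):
    """Pool img[height][width] over non-overlapping 2x2 basic units.

    reduce_fn is the pooling calculation applied to the four pixels of
    each basic unit (assumption: max pooling unless stated otherwise).
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = []
    # Step by 2 in both directions so that basic units do not overlap.
    # Assumption: a trailing odd row or column is simply skipped.
    for r in range(0, height - 1, 2):
        out_row = []
        for c in range(0, width - 1, 2):
            # The four pixels of the current 2x2 basic unit.
            unit = (img[r][c], img[r][c + 1],
                    img[r + 1][c], img[r + 1][c + 1])
            out_row.append(reduce_fn(unit))
        out.append(out_row)
    return out


def mean4(unit):
    """Average pooling over one basic unit (illustrative alternative)."""
    return sum(unit) / len(unit)


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled_max = pool_2x2(img)          # [[6, 8], [14, 16]]
pooled_mean = pool_2x2(img, mean4)  # [[3.5, 5.5], [11.5, 13.5]]
```

In this sketch, each output element requires four reads of the input array and one write of the result, all performed by the processor that executes the loop, and the output holds one quarter as many elements as the input.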
However, in the prior art, the master CPU must spend a large amount of CPU time preparing the pooling data, reading the pooling data, and writing the pooling result, so that less CPU time is available for other tasks, the completion of those tasks is delayed, and the overall performance of the system may decline. In addition, in current mainstream processor architectures, if a cache miss or a cache flush occurs, the CPU has to wait for a long time while performing the pooling operation, thereby reducing the pooling efficiency.