Deep Convolutional Neural Networks, or Deep CNNs, are at the core of the remarkable development in the field of Deep Learning. Although the CNN was employed to solve character recognition problems in the 1990s, it is only recently that the CNN has become widespread in Machine Learning. For example, in 2012, a CNN significantly outperformed its competitors in an annual software contest, the ImageNet Large Scale Visual Recognition Challenge, and won the contest. Since then, the CNN has become a very useful tool in the field of Machine Learning.
Meanwhile, ROI pooling is a pooling method that uses representative feature values, each corresponding to one of a plurality of sub-regions, wherein each of the sub-regions is included in at least one ROI, i.e., Region-Of-Interest, which is at least one important part of an input image, i.e., at least one part where at least one target object is located. ROI pooling is widely used in Machine Learning because it can reduce the computational load.
According to the conventional ROI pooling method, on condition that an ROI corresponding to an object included in the input image has been determined by an RPN (Region Proposal Network), pooling operations are applied to the respective sub-regions in the ROI. Through the pooling operations, representative feature values corresponding to the sub-regions in the ROI are calculated. Herein, each representative feature value is generated by using either the average of all feature values included in the corresponding sub-region or the largest value among all feature values included in that sub-region.
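The conventional pooling described above may be illustrated by the following minimal sketch. It is an illustration only, not an implementation taken from any particular framework. The function name `roi_pool`, the `(top, left, bottom, right)` ROI format with exclusive bottom and right bounds, and the integer division used to split the ROI into sub-regions are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def roi_pool(
    feature_map: Sequence[Sequence[float]],
    roi: Tuple[int, int, int, int],
    out_h: int,
    out_w: int,
    mode: str = "max",
) -> List[List[float]]:
    """Pool one rectangular ROI into an out_h x out_w grid.

    roi is (top, left, bottom, right); bottom and right are exclusive
    (an assumed convention for this sketch).
    """
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    if roi_h < out_h or roi_w < out_w:
        raise ValueError("ROI is smaller than the requested output grid")
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")

    pooled = []
    for i in range(out_h):
        # Row bounds of the i-th band of rectangular sub-regions.
        r0 = top + (i * roi_h) // out_h
        r1 = top + ((i + 1) * roi_h) // out_h
        row = []
        for j in range(out_w):
            # Column bounds of the j-th sub-region in this band.
            c0 = left + (j * roi_w) // out_w
            c1 = left + ((j + 1) * roi_w) // out_w
            # Every feature value in the rectangle is read, whether or
            # not it belongs to the object.
            values = [feature_map[r][c]
                      for r in range(r0, r1) for c in range(c0, c1)]
            if mode == "max":
                row.append(max(values))
            else:
                row.append(sum(values) / len(values))
        pooled.append(row)
    return pooled
```

For a 4x4 feature map holding the values 1 to 16 in row-major order, with the whole map taken as the ROI and a 2x2 output grid, this sketch yields `[[6, 8], [14, 16]]` in `"max"` mode and `[[3.5, 5.5], [11.5, 13.5]]` in `"avg"` mode.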
FIG. 4 shows a process of generating the representative feature values for the ROI pooling according to the conventional ROI pooling method.
By referring to FIG. 4, it may be seen that all feature values included in the sub-regions are used for generating the representative feature values.
However, the conventional ROI pooling method explained above has a critical disadvantage of inefficient memory access, because the sub-regions in the ROI are rectangles, whereas the shapes of arbitrary objects included in arbitrary input images vary and are generally not rectangular. Thus, there may be some unnecessary pixels which are included in the sub-regions but not included in the objects of the input image. These inefficient, needless accesses to the unnecessary pixels may distort the pooling result or reduce the learning speed.
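The effect of such unnecessary pixels may be illustrated by the following toy sketch. The triangular object mask, the feature values of 1.0 for object pixels and 0.0 for background pixels, and the function name `unnecessary_fraction` are hypothetical and are chosen only for illustration.

```python
from typing import Sequence, Tuple


def unnecessary_fraction(
    object_mask: Sequence[Sequence[bool]],
    region: Tuple[int, int, int, int],
) -> float:
    """Fraction of pixels in a rectangular sub-region lying outside the object.

    region is (top, left, bottom, right); bottom and right are exclusive
    (an assumed convention for this sketch).
    """
    top, left, bottom, right = region
    total = (bottom - top) * (right - left)
    outside = sum(
        1
        for r in range(top, bottom)
        for c in range(left, right)
        if not object_mask[r][c]
    )
    return outside / total


# Hypothetical triangular object inside a 4x4 rectangular sub-region:
# row r contains r + 1 object pixels, so 10 of the 16 pixels are object.
mask = [[c <= r for c in range(4)] for r in range(4)]

# Hypothetical feature values: 1.0 on the object, 0.0 on the background.
features = [[1.0 if mask[r][c] else 0.0 for c in range(4)] for r in range(4)]

fraction = unnecessary_fraction(mask, (0, 0, 4, 4))  # 6 / 16 = 0.375

# Average pooling over the whole rectangle mixes in background pixels.
rect_avg = sum(sum(row) for row in features) / 16  # 10 / 16 = 0.625
```

In this toy case, 37.5% of the memory accesses touch pixels outside the object, and the average over the rectangle is 0.625 although every object pixel has the value 1.0.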