In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural network that has successfully been applied to analyzing visual imagery.
A CNN-based object detector may (i) instruct one or more convolutional layers to apply convolution operations to an input image, to thereby generate a feature map corresponding to the input image, (ii) instruct an RPN (Region Proposal Network) to identify proposals corresponding to an object in the input image by using the feature map, (iii) instruct a pooling layer to apply at least one pooling operation to areas on the feature map corresponding to the identified proposals, to thereby generate one or more pooled feature maps, and (iv) instruct an FC (Fully Connected) layer to apply at least one fully connected operation to the acquired pooled feature maps to output class information and regression information for the object, to thereby detect the object on the input image.
However, since the CNN-based object detector uses the feature map whose size is reduced from a size of the input image by the convolutional layer, although large-sized objects in the input image can be easily detected, it is difficult to detect a small-sized object in the input image.
As another example, it is possible to detect the small-sized object by using a resized image obtained by enlarging the input image. In this case, however, the amount of computation by the object detector tremendously increases, thereby deteriorating the performance of the object detector.
Accordingly, the inventors of the present disclosure propose a learning method, a learning device for efficiently detecting objects with various sizes in the input image with less computational time, and a testing method and a testing device using the same are disclosed herein as well.