Object detection is one of the most fundamental problems in computer vision. The goal of object detection is to detect and localize instances of predefined object classes in given input images in the form of, e.g., bounding boxes with confidence values. An object detection problem can be converted to an object classification problem by a scanning window technique. However, the scanning window technique is inefficient because a classification step must be performed for every candidate image region across all locations, scales, and aspect ratios.
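The cost of the scanning window technique can be illustrated with a minimal sketch that only enumerates the candidate regions a classifier would have to evaluate. The function name `scanning_windows` and the parameters `base_size`, `scales`, `aspect_ratios`, and `stride` are hypothetical choices made for illustration and are not taken from any particular detector.

```python
from typing import Iterator, Sequence, Tuple

# A candidate region as (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def scanning_windows(
    image_w: int,
    image_h: int,
    base_size: int = 64,                         # assumed base window side length
    scales: Sequence[float] = (1.0, 2.0),        # assumed scale factors
    aspect_ratios: Sequence[float] = (0.5, 1.0, 2.0),  # assumed width/height ratios
    stride: int = 16,                            # assumed step between windows
) -> Iterator[Box]:
    """Yield every window over all locations, scales, and aspect ratios."""
    for scale in scales:
        for ratio in aspect_ratios:
            # Keep the window area close to (base_size * scale) ** 2
            # while setting width / height to the requested ratio.
            w = int(round(base_size * scale * ratio ** 0.5))
            h = int(round(base_size * scale / ratio ** 0.5))
            if w > image_w or h > image_h:
                continue  # window does not fit in the image
            for y in range(0, image_h - h + 1, stride):
                for x in range(0, image_w - w + 1, stride):
                    yield (x, y, w, h)


def count_windows(image_w: int, image_h: int, **kwargs) -> int:
    """Count the classifier evaluations a scanning window search would need."""
    return sum(1 for _ in scanning_windows(image_w, image_h, **kwargs))
```

Each yielded window corresponds to one classification step, so the count grows with the product of the number of locations, scales, and aspect ratios. Reducing `stride` or adding scales increases the count quickly, which is the inefficiency noted above.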
The region-based convolutional neural network (R-CNN) performs a two-stage approach, in which a set of object proposals is first generated as regions of interest (ROIs) using a proposal generator, and the existence of an object and its class in each ROI are then determined using a deep neural network. However, the detection accuracy of the R-CNN is insufficient in some cases. Accordingly, another approach is required to further improve object detection performance.