Technical Field
The present invention relates to image processing and, more particularly but not exclusively, to end-to-end fully convolutional feature learning for geometric and semantic correspondences.
Description of the Related Art
In a visual correspondence problem, one is given a set of images that contain an overlapping 3D region and is asked to find, in each image, the locations onto which the same 3D points project. This problem arises in a number of computer vision applications, including stereo disparity, structure from motion, panorama stitching, image representation, and image retrieval, as well as in more complicated tasks such as classification and detection.
To solve the visual correspondence problem, many hand-designed features have been proposed. Recently, with the advent of powerful convolutional neural networks (CNNs), many researchers have returned to the problem with this new tool. Rather than learning features, a CNN can perform end-to-end classification of patch similarity.
Once the CNN is trained, its intermediate convolution layer features are used as low-dimensional features. However, intermediate convolution features are not optimized for the visual correspondence task. The features are trained for a surrogate objective function (patch similarity), and intermediate features do not necessarily form a metric space conducive to performing visual correspondence. In addition, patch similarity is inherently inefficient and slow. Since it is a patch-based method, features have to be extracted again even for regions where patches overlap. Also, it requires O(n²) feed-forward passes to compare each of n patches with n other patches in a different image. Still, patch-based similarity has been a preferred method for several reasons. First, since all the benchmarks only require image patch similarity, optimizing the system for patch similarity (classification) would yield better results than learning a metric space (metric learning). Second, since a neural network is good at abstracting fine details, a CNN is an appropriate tool for measuring global similarity.
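The cost argument above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: ToyNet is a hypothetical stand-in for a trained network that merely counts feed-forward passes, patches are reduced to single numbers, and the feature-based path is a generic contrast rather than any particular prior-art system or the method of the invention.

```python
class ToyNet:
    """Hypothetical stand-in for a trained CNN; it only counts forward passes."""

    def __init__(self):
        self.passes = 0

    def similarity(self, patch_a, patch_b):
        # Patch-pair classification: one forward pass scores one pair.
        self.passes += 1
        return -abs(patch_a - patch_b)  # toy score, higher means more similar

    def embed(self, patch):
        # Feature extraction: one forward pass maps one patch to a feature.
        self.passes += 1
        return float(patch)


def match_by_pair_scoring(net, patches_a, patches_b):
    """Compare every patch in A with every patch in B: n * n forward passes."""
    matches = []
    for a in patches_a:
        scores = [net.similarity(a, b) for b in patches_b]
        matches.append(scores.index(max(scores)))
    return matches


def match_by_features(net, patches_a, patches_b):
    """Embed each patch once (n + n passes), then compare features cheaply."""
    feats_a = [net.embed(a) for a in patches_a]
    feats_b = [net.embed(b) for b in patches_b]
    matches = []
    for fa in feats_a:
        # Distance between features needs no further network evaluation.
        dists = [abs(fa - fb) for fb in feats_b]
        matches.append(dists.index(min(dists)))
    return matches


# Toy "patches" (assumed values), n = 4 per image.
patches_a = [0.1, 0.5, 0.9, 0.3]
patches_b = [0.32, 0.88, 0.12, 0.51]

pair_net, feat_net = ToyNet(), ToyNet()
pair_matches = match_by_pair_scoring(pair_net, patches_a, patches_b)
feat_matches = match_by_features(feat_net, patches_a, patches_b)

print(pair_matches, pair_net.passes)  # 16 passes for n = 4
print(feat_matches, feat_net.passes)  # 8 passes for n = 4
```

In this toy setting both routines return the same matches, but the pair-scoring route grows quadratically with the number of patches, while the feature route grows linearly and defers the comparison to an inexpensive distance computation.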