Technology for comparing an image with another image, or an image with a video, and determining whether the two are identical or whether one is included in the other has been proposed in various forms in the field of computer vision, for example in image matching and object tracking. Such technology chiefly extracts feature points from an image or from the frame-based images constituting a video, places the extracted feature points in correspondence with each other, and compares the corresponding feature points. It aims to present exact comparison results more quickly through the choice of the feature point extraction scheme and of the specific algorithm used to compare the corresponding feature points. As is well known in the art, feature points (or interest points) are points capable of representing the features of an image, and denote a point or a set of points capable of describing those features well regardless of variations in the scale, rotation, or distortion of the image. Several thousand to several tens of thousands of feature points may be extracted per picture, for example, although the number differs depending on the size and content of the given image and on the type of feature point extraction/determination method. Such feature points are widely used in the fields of image processing and computer vision for various tasks, such as object recognition, motion tracking, and determination of identicalness between images, for example by extracting feature points and searching two images for corresponding parts using the feature data of the extracted feature points.
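The correspondence step described above can be illustrated with a minimal sketch. It assumes that each feature point has already been reduced to a numeric descriptor vector, and it uses plain brute-force nearest-neighbor search with a distance threshold. The function name `match_descriptors`, the `max_distance` parameter, and the four-dimensional toy descriptors are hypothetical and chosen only for illustration; they do not come from any particular conventional method.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def match_descriptors(desc_a, desc_b, max_distance):
    """Brute-force matching: for each descriptor in desc_a, find the nearest
    descriptor in desc_b and keep the pair only if it is close enough.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    matches = []
    for i, a in enumerate(desc_a):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(desc_b):
            d = dist(a, b)
            if d < best_d:
                best_j, best_d = j, d
        # Reject weak correspondences (assumed threshold, tuned per descriptor type).
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j, best_d))
    return matches


# Toy descriptors (hypothetical values): two of the three points in image A
# have a close counterpart in image B, the third does not.
image_a = [(0.0, 0.0, 1.0, 1.0), (5.0, 5.0, 5.0, 5.0), (9.0, 0.0, 9.0, 0.0)]
image_b = [(5.0, 5.0, 5.0, 5.1), (0.1, 0.0, 1.0, 1.0)]

matches = match_descriptors(image_a, image_b, max_distance=0.5)
matched_pairs = [(i, j) for i, j, _ in matches]
```

Because every descriptor of one image is compared with every descriptor of the other, the work in this sketch grows with the product of the two feature point counts.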
However, with such conventional feature point extraction/determination methods, an excessively large number of feature points is often acquired from a given image. As a result, the amount of data to be processed in the post-processing procedure that uses the feature points for image comparison, object tracking, etc. becomes excessive, and operation time is greatly lengthened. For example, various methods of extracting feature points from an image and forming feature data of the extracted feature points have been proposed, such as the Scale-Invariant Feature Transform (SIFT) algorithm disclosed in U.S. Pat. No. 6,711,293 (by David G. Lowe) and the Speeded Up Robust Features (SURF) algorithm (H. Bay, T. Tuytelaars and L. van Gool (2006), “SURF: Speeded Up Robust Features”, Proceedings of the 9th European Conference on Computer Vision, Springer LNCS volume 3951, part 1, pp. 404–417). However, since such conventional technology requires approximately several thousand feature vectors per image, each having several tens of dimensions, the operation process is complicated and the amount of data to be processed is large, so that an excessively long computation time is required. This causes many problems when a large amount of data must be processed. Therefore, the development of technology capable of providing exact results while reducing operation time and the amount of data to be processed by using a smaller number of feature points is required.
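The scale of this problem can be made concrete with a rough estimate. The sketch below assumes brute-force comparison of every feature vector of one image with every feature vector of another; the figures of 2,000 feature points per image, 64 dimensions, and 4 bytes per value are illustrative assumptions consistent with the orders of magnitude given above, not measurements, and the function name `matching_cost` is hypothetical.

```python
def matching_cost(n_a, n_b, dims, bytes_per_value=4):
    """Rough cost of brute-force matching between two images.

    n_a, n_b        -- number of feature vectors in each image
    dims            -- dimensionality of each feature vector
    bytes_per_value -- storage per vector component (assumed 32-bit values)
    """
    comparisons = n_a * n_b                 # one distance per pair of vectors
    multiply_adds = comparisons * dims      # one multiply-add per dimension per pair
    storage_bytes = (n_a + n_b) * dims * bytes_per_value
    return comparisons, multiply_adds, storage_bytes


# Assumed example: two images, 2,000 feature points each, 64-dimensional vectors.
comparisons, multiply_adds, storage_bytes = matching_cost(2000, 2000, 64)

# Halving the number of feature points per image quarters the pairwise work.
reduced_comparisons, _, _ = matching_cost(1000, 1000, 64)
```

Under these assumptions a single image pair already requires millions of vector comparisons, and because the count is the product of the two feature point counts, using fewer feature points reduces the work more than proportionally.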
In particular, with the recent improvement in network transfer rates and the development of the Internet and of mobile technology and environments, the consumption of multimedia data such as videos and images has increased remarkably. For example, websites on which video data such as dramas or movies can be watched have come into wide use, and the number of video community sites offering services that allow users to personally upload, search, and share various types of video data has also rapidly increased. Further, multimedia services for images and videos are provided through various channels, such as Internet portal sites, User Generated Content (UGC) sites, blogs, cafes, and web-hard sites. Furthermore, with the recent development of the mobile environment, such as the popularization of smart phones and the spread of wireless Local Area Network (LAN) environments, the consumption of multimedia data in the mobile environment also tends to increase exponentially. Thus, as images and videos are used not only in specific fields but also widely in the web environment, there is a requirement for the development of technology that can more promptly and exactly determine relations between an image and another image or between an image and a video, and then use such relations for various types of additional services related to images or videos.