The embodiments relate to a high-speed searching method for large-scale image databases. More specifically, the embodiments relate to a technique for searching large-scale image databases for search objects at high speeds.
As smartphones and tablets have increased in popularity, there is growing demand for activities such as internet shopping in which information related to an object is retrieved instantly by simply holding a camera-equipped device in front of the object. For example, when a user comes across an interesting book displayed on a poster or subway advertisement, the user can hold the device up to the advertisement, and the device can search for information on the book and purchase the book over the internet. However, an object as it appears in a dictionary image set stored in an image database usually differs from the same object as it appears in an image of the target search object taken by a device, even when the object is simply a page in a book. These differences are due to conditions such as size, shooting angle, and lighting. As a result, practical search accuracy cannot be achieved simply by matching image bitmaps.
One technology used to match objects with a high identification rate, even when there are differences due to conditions such as size, shooting angle, and lighting, is an object recognition technique using rotation- and scale-invariant local image features called keypoints. In the technique known as Scale-Invariant Feature Transform (SIFT), differences in the output from Gaussian filters with adjacent scales are extracted, and image sets known as “Difference of Gaussians” (DoG) are obtained. Coordinates at which the absolute value in a DoG image reaches its maximum in both the spatial direction and the scale direction are called keypoints. A plurality of keypoints is usually detected in an image with gray scale patterns. The orientation of each keypoint is determined from the density gradient of the pixels surrounding the keypoint, and the scale at which the DoG reaches its maximum is used as the keypoint scale. The pixels surrounding each keypoint are divided into 16 blocks, and a histogram of the gray scale gradients of the pixels in each block is extracted for use as a feature value of the keypoint. In SIFT, feature values are expressed as 128-dimensional vectors of real number elements. SIFT has an established reputation as an object recognition technique that is robust to changes in rotation and scale. However, when a brute-force method is used to match the keypoints in a large-scale image database with the keypoints in a target search image, every pair of keypoints has to be compared. Because of the enormous number of calculations this requires, practical search times are difficult to achieve.
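The cost of brute-force matching described above can be illustrated with a minimal sketch. It is not an implementation of SIFT itself: the descriptors are random 128-dimensional vectors standing in for real feature values, and the function names (`brute_force_match`, `comparison_count`), the database size, and the perturbation used to build the query are all assumptions made for illustration only.

```python
import math
import random

DIM = 128  # descriptor length stated in the text for SIFT feature values


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_match(query, database):
    """Return (nearest database index, distance) for each query descriptor.

    Every query descriptor is compared with every database descriptor,
    so the number of distance evaluations is len(query) * len(database).
    """
    matches = []
    for q in query:
        best_index, best_dist = -1, math.inf
        for i, d in enumerate(database):
            dist = squared_distance(q, d)
            if dist < best_dist:
                best_index, best_dist = i, dist
        matches.append((best_index, math.sqrt(best_dist)))
    return matches


def comparison_count(num_query, num_database):
    """Number of descriptor-to-descriptor comparisons brute force performs."""
    return num_query * num_database


# Stand-in data (assumption): random vectors instead of real SIFT descriptors.
rng = random.Random(0)
database = [[rng.random() for _ in range(DIM)] for _ in range(200)]

# Hypothetical query: slightly perturbed copies of database entries 5, 17, 42,
# mimicking the same keypoints seen under slightly different conditions.
query = [[v + rng.uniform(-0.01, 0.01) for v in database[i]] for i in (5, 17, 42)]

matches = brute_force_match(query, database)
print([index for index, _ in matches])
print(comparison_count(len(query), len(database)))
```

With these stand-in values the sketch performs 3 × 200 = 600 distance evaluations. Under purely hypothetical figures of 1,000 keypoints per image and 1,000,000 database images, the same calculation gives `comparison_count(1000, 10**9)`, or 10^12 comparisons of 128-dimensional vectors for a single query image, which is the scale of work that makes practical search times difficult to achieve.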
It is an object of the embodiments described herein to provide a technique for searching large-scale image databases for objects at high speeds, and to overcome the problems associated with the prior art by using feature values in a way that reduces search times.