With the spread of digital cameras and camera-equipped mobile phones, there is increasing demand to use a camera not only for taking snapshots but also as an information input device. One way to meet this demand is to recognize an object captured by a camera and to perform information processing based on the recognition result.
It is still difficult to recognize an object when no constraints are placed on it; however, thanks to technical developments in recent years, object recognition has become practical when certain constraints can be placed on the object. For example, if the object can be assumed to be a pattern on a plane (a planar object) rather than a three-dimensional object, and if an instance of the object (e.g., whether or not a photograph shows a certain model of car captured at a given angle) is to be recognized instead of a class of the object (e.g., whether or not the object in the photograph belongs to the category of cars), then object recognition is already serviceable. Known examples include a service provided by Dai Nippon Printing Co., Ltd., which adopts technology of Clementec Co., Ltd. (U.S. Patent No. 20040208372), a service provided by Olympus Corporation, and a service provided by NEC Corporation, which adopts technology of Evolution Robotics, Inc. If recognition of planar objects as described above is available, not only derivation of information from photographed posters or commodities, but also automatic indexing of existing images or videos can be achieved.
For object recognition, features need to be extracted from an image. The present invention focuses on the use of local descriptors to recognize a planar object. A local descriptor captures a local feature of an image and describes it as a multidimensional feature vector. Because its values are determined locally, the local descriptor is relatively robust against occlusion and distortion of the image. Here, the word “local” refers to a part of an image, and a “local descriptor” represents a partial feature of an image. In the present specification, the local descriptor is also referred to as a feature vector.
In an object recognition method using local descriptors, the basic operation is to calculate distances between the feature vectors obtained from two images and to match each vector with its nearest vector. Each feature vector in an image captured by a camera is matched against the feature vectors of a large number of images in a database, and a vote is cast for the database image to which the matched vector belongs. Finally, the label of the image having the largest number of votes is outputted as the “recognition result”. However, since the number of dimensions of a feature vector ranges from several dozen to several hundred, and the number of feature vectors ranges from several hundred to several thousand per image, it is obviously not practical to simply calculate the distances for all combinations.
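The basic matching-and-voting operation described above can be sketched as follows. This is a minimal illustration only, not the method of the present invention: the function name `recognize` and the layout of the database as a list of (label, feature vector) pairs are assumptions introduced for this example, and the distance is taken to be Euclidean.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def recognize(query_vectors, database):
    """Vote for database images by brute-force nearest-neighbor matching.

    query_vectors: feature vectors extracted from the captured image.
    database: list of (label, feature_vector) pairs (hypothetical layout).
    Returns (winning label or None, vote counts).
    """
    votes = Counter()
    for q in query_vectors:
        # Brute force: compare the query vector with every stored vector.
        label, _ = min(database, key=lambda entry: dist(q, entry[1]))
        votes[label] += 1
    if not votes:
        return None, votes
    # The label with the most votes is the recognition result.
    return votes.most_common(1)[0][0], votes
```

The cost of this sketch is proportional to the number of query vectors multiplied by the number of database vectors, which is the impracticality noted above for large databases.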
Thanks to the development of nearest neighbor search techniques in recent years, however, a vast number of feature vectors can now be searched in a short time (e.g., see non-patent documents 1, 2). In particular, ANN (Approximate Nearest Neighbor) (e.g., see non-patent document 3) and LSH (Locality Sensitive Hashing) (e.g., see non-patent document 4) perform approximate nearest neighbor search by using a tree structure and a hash table, respectively, and thereby realize fast retrieval. In Japan, in addition to the SR-Tree for accurate nearest neighbor search (e.g., see non-patent document 5), distributed coding disclosed by Kobayashi et al. can be cited as an approximate nearest neighbor search technique (e.g., see non-patent document 6).
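As an illustration of how a hash table can restrict the search to a few candidates, the following is a minimal single-table sketch in the spirit of hashing with random projections, where each hash value is floor((a . v + b) / w). The class name `LSHTable` and the parameters `num_hashes`, `bucket_width`, and `seed` are assumptions introduced for this example; practical schemes use several tables and tuned parameters.

```python
import random
from collections import defaultdict
from math import dist, floor


class LSHTable:
    """Single hash table keyed by quantized random projections (a sketch)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each hash function uses a Gaussian direction a and a random offset b.
        self.projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(num_hashes)
        ]
        self.buckets = defaultdict(list)

    def _key(self, v):
        # Concatenate the quantized projections into one bucket key.
        return tuple(
            floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.projections
        )

    def insert(self, label, v):
        self.buckets[self._key(v)].append((label, v))

    def query(self, q):
        # Only vectors in the same bucket as q are compared exactly.
        candidates = self.buckets.get(self._key(q), [])
        if not candidates:
            return None  # approximate search may miss the true neighbor
        return min(candidates, key=lambda entry: dist(q, entry[1]))
```

Because only one bucket is inspected, a query can return None or a vector that is not the true nearest neighbor; this is the accuracy that approximate search trades for speed.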
Further, from the viewpoint of object recognition, Wada et al. have proposed the notion of a nearest neighbor classifier (e.g., see non-patent document 7), and a technique called KDDT which embodies the notion (e.g., see non-patent document 8). Suppose a case where each object corresponds to one feature vector and the category of the object is to be recognized. In this case, only the category that includes the feature vector nearest to the feature vector obtained from the object needs to be identified; the nearest neighbor feature vector itself need not be obtained. Accordingly, compared to a case where accurate nearest neighbor search is used, processing speed can be improved by a factor of several to several hundred.
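The point that the category can be identified without obtaining the nearest vector can be illustrated in one dimension. The sketch below is not the KDDT of non-patent document 8; it is a simplified example, with the hypothetical function names `build_boundaries` and `classify`, that assumes scalar features with distinct values. It keeps only the midpoints between adjacent stored values of different categories, so that classification becomes a lookup among those boundaries.

```python
from bisect import bisect_right


def build_boundaries(prototypes):
    """prototypes: list of (value, category) pairs with distinct values.

    Returns (boundaries, categories), where categories[i] is the category
    of the i-th region delimited by the sorted boundaries.
    """
    ordered = sorted(prototypes)
    boundaries, categories = [], [ordered[0][1]]
    for (x0, c0), (x1, c1) in zip(ordered, ordered[1:]):
        if c0 != c1:
            # The nearest-neighbor category changes only at this midpoint.
            boundaries.append((x0 + x1) / 2.0)
            categories.append(c1)
    return boundaries, categories


def classify(x, boundaries, categories):
    # Binary search over region boundaries; no stored vector is visited.
    return categories[bisect_right(boundaries, x)]
```

For example, with stored values 1.0 and 2.0 of category "car", 5.0 and 6.0 of category "cup", and 9.0 of category "car", only the two boundaries 3.5 and 7.5 are kept, and a query is classified by locating it among them.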
Further, a method for extracting features that is adaptable to indexing of document images, and a search algorithm adaptable to those features, are known (e.g., see patent document 1).
Patent document 1: International publication No. 2006/092957
Non-patent document 1: P. Indyk, Nearest neighbors in high-dimensional spaces, Handbook of discrete and computational geometry (Eds. by J. E. Goodman and J. O'Rourke), Chapman & Hall/CRC, pp. 877-892, 2004.
Non-patent document 2: G. Shakhnarovich, T. Darrell and P. Indyk Eds., Nearest-neighbor methods in learning and vision, The MIT Press, 2005.
Non-patent document 3: S. Arya, D. M. Mount, R. Silverman and A. Y. Wu, “An optimal algorithm for approximate nearest neighbor searching,” Journal of the ACM, vol. 45, no. 6, pp. 891-923, 1998.
Non-patent document 4: M. Datar, N. Immorlica, P. Indyk and V. S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proc. of the 20th annual symposium on Computational Geometry, pp. 253-262, 2004.
Non-patent document 5: Katayama Norio, Sato Shinichi, “Indexing Technique for Similarity Retrieval”, IPSJ Journal of Information Processing Society of Japan, vol. 42, no. 10, pp. 958-964, Oct. 2001.
Non-patent document 6: Kobayashi Takao, Nakagawa Masaki, “Higher-dimensional Nearest Neighbor Search by Distributed Coding”, IEICE Technical Report PRMU2006-41, Jun. 2006.
Non-patent document 7: Wada Toshikazu, “Acceleration Method for Nearest Neighbor Classification based on Space Decomposition”, IPSJ Journal, vol. 46, no. 8, pp. 912-918, Aug. 2005.
Non-patent document 8: Shibata Tomoyuki, Kato Takekazu, Wada Toshikazu, “K-D Decision tree: An Accelerated and Memory Efficient Nearest Neighbor Classifier”, IEICE Transactions (D-II), vol. J88-D-II, no. 8, pp. 1367-1377, Aug. 2005.