With improvements in the processing capability of server devices and of terminal devices such as personal computers, PDAs and mobile phones, and with progress in computer networks and the Internet environment, the ways of providing visual information such as moving images and still images have become more diversified. For example, on the Internet, the concept of linking has evolved: by clicking a text or an image arranged on a Web page in a linkable manner, a user can move to another related Web page or download a related file. When the link destination is an audio file or a moving image file, it can be reproduced in real time (e.g., by streaming playback) as needed.
In recent years, map information providing services for providing information about a geographical position have become more advanced. For example, there is a map information providing service disclosed in the following Patent Document 1. This service is based on an apparatus for use in displaying a digital map. The apparatus comprises: means for creating a set of map tiles relating to a map prepared from digital map data; means for interpreting candidate location data received from a client, wherein the candidate location data includes location information; means for determining the location information from the candidate location data; and means for providing, to the client, a requested map tile relating to the location information.
In 2005, the applicant of the Patent Document 1 started to provide a virtual globe Web service, so-called “Google Earth” (in Japan, http://earth.google.co.jp/). This service is intended to synthesize, on a server, photographic images taken from satellites orbiting the earth, and provide the synthesized image to a user, wherein a user accessing the service from a terminal device can view satellite images from around the world as if she/he were looking at a globe while rotating it. In an area where high-resolution images are available, it is possible to descend (zoom in) to an altitude of several meters, which gives the user an experience similar to browsing aerial photographs. In 2007, Google Inc. also started a Web service, so-called “Google Street View” (in Japan, http://www.google.co.jp/help/maps/streetview/). This service is intended to synthesize, on a server, 360-degree panoramic images collected by running, around a town, a large number of automobiles each having a camera mounted on a roof thereof, and provide the synthesized image to a user. A user accessing the service from a terminal device can move back and forth along a street and look around using the panoramic image, and have a simulated experience as if she/he were driving an automobile around a town. The user can change the field of view from right to left or up and down, and can zoom in and zoom out. The “Google Street View” is configured to cooperate with the “Google Maps” and “Google Earth” previously provided by Google Inc.
Meanwhile, image recognition technology is one of the research themes that has been actively pursued along with progress in computer technologies. Research on image recognition using a computer started in the 1960s. Since then, along with progress in high-speed processing technologies for computers and in machine learning technologies, research was promoted on line-drawing interpretation (1970s), and on cognitive models based on a knowledge database constructed using manually formulated rules and/or geometric models, and three-dimensional model representation (1980s). In the 1990s, research on facial image recognition and learning-based recognition in particular became active.
Research on image recognition evolved from facial image recognition into generic object recognition. In the 2000s, further enhanced computer capabilities made it possible to handle the enormous amount of calculation required for statistical processing and learning processing, and research on generic object recognition was thereby promoted. The term “generic object recognition” means a technology of causing a computer to recognize, by its generic name, an object included in an image acquired from a real-world scene. In the 1980s, rules and models were formulated manually, whereas in the 2000s, when it became possible to process a great deal of data at high speed, great interest was shown in approaches based on statistical machine learning, which triggered the recent boom in generic object recognition. The generic object recognition technology makes it possible to automatically assign a keyword to an image, and to classify and retrieve the image in accordance with its semantic content. The ultimate goal is to realize human cognitive functions on a computer (Non-Patent Publication 1).
The generic object recognition technology has progressed through approaches based on image databases and the introduction of statistical probability methods. Pioneering research in the course of this progress includes a method of performing object recognition by learning a correlation from data created by manually assigning keywords to images (Non-Patent Publication 2), and a method based on local feature values (Non-Patent Publication 3). Further, research on generic object recognition based on local feature values includes the SIFT (Scale-Invariant Feature Transform) method (Non-Patent Publication 4), and the Video Google (Non-Patent Publication 5).
In 2004, the so-called “Bag-of-Keypoints” or “Bag-of-Features” method was proposed. This method is intended to express an image by a histogram representing the frequency of appearance of typical local pattern image pieces, so-called “visual words”. More specifically, a histogram is created for each image by extracting feature points based on the SIFT method, and vector-quantizing each SIFT feature vector based on a plurality of pre-obtained visual words. The histogram created in this manner takes the form of a sparse vector, typically of several hundred to several thousand dimensions. Image recognition is then performed by handling these vectors as a classification problem of multidimensional vectors (Non-Patent Publication 6).
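By way of illustration only, the vector quantization step described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Non-Patent Publication 6: the function names (squared_distance, nearest_word, bag_of_features) are hypothetical, the toy two-dimensional descriptors stand in for actual SIFT feature vectors (which are 128-dimensional), the codebook of visual words is assumed to have been obtained beforehand (for example, by clustering), and nearest-neighbor assignment by squared Euclidean distance is assumed as the quantization rule.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Index of the visual word closest to the descriptor (assumed rule)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_features(descriptors: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]]) -> List[int]:
    """Histogram counting how often each visual word appears in one image."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1
    return histogram


# Toy data (hypothetical): three visual words and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]

histogram = bag_of_features(descriptors, codebook)
print(histogram)  # [1, 2, 1]
```

With a realistic codebook of several hundred to several thousand visual words, most bins of such a histogram remain zero for any single image, which is why the resulting vector is sparse. The histogram may then be supplied to any classifier for multidimensional vectors.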