When traveling to an unfamiliar place, it is not unusual to be unsure of one's exact location. In recent years, with the availability of the global positioning system (GPS), small handheld GPS receivers have appeared in the consumer market to help one find one's location while visiting an unfamiliar place. Unfortunately, unless one is skilled in using a geographical map, a GPS receiver is not always user-friendly, especially in crowded downtown environments. Furthermore, one may know his or her general location, but may be interested in a specific object in his or her field of view.
A deictic (pointing) gesture together with an inquiring utterance of the form “What's that?” is a common conversational act utilized by a person when visiting a new place with an accompanying host. When alone, however, one must resort to maps, guidebooks, signs, or intuition to infer the answer. It would be desirable to have a user-friendly device to help one know his or her location and further help one learn about an object in his or her field of view.
It has been observed that maps and tour books often lack detailed information and that most people do not use them in everyday life, although most people carry a map when traveling to a new location. One interesting observation is the tendency of people to overstate the usefulness of a street map, only to realize that they actually wanted to know more than a map could provide, such as specific details about the buildings and artifacts they were seeing around them. Typically, individuals ask many specific questions, including requests for historic information and events, names of buildings, and makers of public artworks. It has been observed that two commonly asked questions are “where can I find xxx” and “what is this.” Often, these questions are followed by requests for time-related information such as business hours and bus schedules. It should be appreciated that the information is needed “right here” and “right now”, or it is not worth the effort. Even when a mobile phone was available, it was unlikely to be used to call someone to ask for information. An exception was having an appointment to meet someone and needing to get directions to the meeting location. It should be appreciated that a location-based information service that provides access to a generic information service such as the world wide web, and that is initiated by a real-time query (e.g., “What is this place”) followed by a browsing step, would complement the users' experience in an unfamiliar setting and meet their needs for location-based information.
Web resources exhibit a high correlation between semantic relevancy and spatial proximity, an observation that has been noted and widely exploited by existing search technologies. Pieces of knowledge that are close together in cyberspace also tend to be mutually relevant in meaning. An intuitive reason is that web developers tend to include both text and images when authoring pages meant to introduce certain information. In practice, current web-image search engines, such as Google, use keywords to find relevant images by analyzing neighboring textual information such as the caption, URL, and title. Most commercially successful image search engines are text-based. The web site “www.corbis.com” (Corbis) features a private database of millions of high-quality photographs or artworks that are manually tagged with keywords and organized into categories. The web site “www.google.com” (Google) has indexed more than 425 million web pages and inferred their content in the form of keywords by analyzing the text on the page adjacent to the image, the image caption, and other text features. In both cases, the image search engine searches for images based on text keywords. Since the visual content of the image is ignored, images that are visually unrelated can be returned in the search result. However, this approach has the advantages of text search: it is semantically intuitive, fast, and comprehensive. Keyword-based search engines (e.g., Google) have established themselves as the standard tool for this purpose when working in known environments. However, formulating the right set of keywords can be frustrating in certain situations. For instance, when the user visits a place for the first time or is presented with a never-before-seen object, the obvious keyword, the name, is unknown and cannot be used as the query. One has to rely on a physical description, which can translate into a long string of words and yet be imprecise.
The amount of linguistic effort for such verbal-based deixis can be too involved and tedious to be practical. It should be appreciated that an image-based deixis is desirable in this situation. The intent to inquire about something is often inspired by one's very encounter with it, and the very place in question is conveniently situated right there.
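The text-based image search described above, in which keywords are inferred from the caption, URL, and title near an image, can be illustrated with a minimal sketch. It is not the method used by Google or Corbis; the function names (tokenize, build_index, search), the record layout, and the example.com URLs are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(images):
    """Map each keyword to the ids of images whose nearby text contains it."""
    index = defaultdict(set)
    for image_id, fields in images.items():
        # Only the neighboring text is examined; pixel content is ignored.
        for field in ("caption", "url", "title"):
            for token in tokenize(fields.get(field, "")):
                index[token].add(image_id)
    return index


def search(index, query):
    """Return ids of images whose nearby text contains every query keyword."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical image records (assumed data, not taken from any real site).
images = {
    "img1": {
        "caption": "City hall at dusk",
        "url": "http://example.com/photos/city-hall.jpg",
        "title": "Downtown landmarks",
    },
    "img2": {
        "caption": "Bronze statue in the park",
        "url": "http://example.com/photos/statue.jpg",
        "title": "Public artworks",
    },
    "img3": {
        "caption": "Concert poster",
        "url": "http://example.com/events/city-hall-concert.jpg",
        "title": "Events at City Hall",
    },
}

index = build_index(images)
print(sorted(search(index, "city hall")))  # ['img1', 'img3']
```

In this sketch the query “city hall” returns img3, a concert poster, alongside the photograph of the building, because only the adjacent text is matched. A visitor who does not know the name of the building has no keyword to supply at all, which is the difficulty noted above.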