A dataset is a collection of information. Some datasets associate the collected information with location information, and may be referred to herein as location-based datasets. One example of a location-based dataset is a collection of digital photographs publicly available at a web site such as flickr.com, which is a public photo sharing site. At such web sites, persons who submit photos can add short contextual labels or “tags” to their photos, such as the place at which the photo was taken, in addition to any device-generated context information. These tags help others search for and locate the photo, and are often related to the photo in some way, such as by subject, time, location (geo-reference information), etc. Another example of a dataset is a collection of words used in Short Message Service (SMS) messages (i.e., cellular telephone text messages), perhaps those sent or received from a certain geographic area. Other examples of datasets include restaurants searched by users on Yahoo! Local, search results or search terms, news reports and other location-annotated information. In general, although different datasets may include diverse types of information, each item in any dataset shares some commonality with other elements or information in the dataset.
Mobile devices, including, but not limited to, mobile phones and digital cameras, represent just one example of devices capable of generating location-dependent datasets. Such mobile devices can determine or capture information that at least partly defines the context of the environment in which they are used. For example, in addition to temporal information such as time and date, location services may be used to supply spatial information about the device, such as latitude and longitude from a Global Positioning System (GPS) device. Such spatial information may be associated with messages, digital photographs or other content sent, received or otherwise captured by a mobile device, thereby providing spatial context for the content.
Some context information, typically time and date information, may be automatically generated by a device (referred to generally as device-generated data) and associated with the content. For example, a mobile phone may automatically generate time and date metadata associated with the capture of a digital photograph or the transmission of a Short Message Service (SMS) text message. Additionally, subjective semantic information may be entered by a user at the time of capture (or later) and associated with the content.
While location-based datasets have the potential to provide useful information, the raw information in a dataset is often too voluminous, complex and disorganized to understand and grasp as a whole. Therefore, it would be generally beneficial to derive only the information of most interest to the user from a dataset, by identifying or deriving keywords from the dataset selected in accordance with some metric or relevance parameter representing, for example, the most popular or frequently occurring tags associated with the information in the dataset.
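By way of non-limiting illustration only, selecting keywords by frequency of occurrence might be sketched as follows. The function name `top_tags`, the dictionary layout of each item, the case-insensitive matching and the sample tags are all assumptions made for this sketch; they are not taken from the referenced applications, and frequency is only one of the metrics contemplated above.

```python
from collections import Counter


def top_tags(items, k=3):
    """Return the k most frequent tags across all items as (tag, count) pairs.

    Tags are compared case-insensitively (an assumption of this sketch).
    """
    counts = Counter(tag.lower() for item in items for tag in item["tags"])
    return counts.most_common(k)


# Hypothetical sample data: each item carries a list of user-supplied tags.
photos = [
    {"tags": ["Golden Gate Bridge", "sunset"]},
    {"tags": ["golden gate bridge", "fog"]},
    {"tags": ["Alcatraz", "fog"]},
    {"tags": ["golden gate bridge"]},
]

print(top_tags(photos, k=2))
# [('golden gate bridge', 3), ('fog', 2)]
```

A different relevance parameter (for example, a recency-weighted count) could be substituted for the raw count without changing the overall shape of the sketch.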
U.S. application Ser. No. 11/437,344, filed on May 19, 2006 and entitled “Summarization of Media Object Collections,” and U.S. application Ser. No. 11/593,668, filed on Nov. 6, 2006 and entitled “Context Server for Associating Information Based on Context,” the contents of which are incorporated herein by reference, in some embodiments describe an algorithm for identifying or deriving keywords from a dataset in accordance with their popularity or some other metric. As described in these applications, a context server may receive stored datasets containing one or more of context information, media objects, and user-generated subjective characterization information associated with the context information from one or more users. The context information in these datasets may include metadata that relates to time, date, location, ambient conditions, aperture, shutter speed, and other settings (for digital photos), biometrics (e.g., relating to the user of the device), device/user identifications, geographical reference data (e.g., from a GPS location device, cell identification, or other location technology), or combinations thereof. The user-generated subjective characterization information may include one or more tags for a media object, such as a title, a description of the image, annotations, geographical location and comments.
The context server associates service information with the user-generated subjective characterization information based on the context information. The service information may include, but is not limited to, keywords identifying popular or heavily searched points of interest, locations (e.g. streets, towns, cities, and the like), restaurants, people, advertisements, popular words in context-based messages, sponsored search advertisements, or combinations thereof. Along with these keywords or labels, the service information may include other values associated with each label and obtained from the context information and the user-generated subjective characterization information, such as a latitude, longitude, a time or date or range for which the information is valid, and one or more associated values such as a value indicating the popularity of the label.
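For purposes of illustration only, one item of service information of the kind described above (a label together with a latitude, a longitude, a validity range and a popularity value) might be represented as the following record. The class name `LabelRecord`, its field names, the use of calendar dates for the validity range and the sample values are hypothetical and are not drawn from the referenced applications.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LabelRecord:
    """One keyword/label plus the values associated with it (hypothetical layout)."""

    label: str
    latitude: float
    longitude: float
    popularity: float                    # metric value; its scale is an assumption
    valid_from: Optional[date] = None    # None means no lower bound
    valid_to: Optional[date] = None      # None means no upper bound

    def is_valid_on(self, day: date) -> bool:
        """Return True if the record is valid on the given day (bounds inclusive)."""
        if self.valid_from is not None and day < self.valid_from:
            return False
        if self.valid_to is not None and day > self.valid_to:
            return False
        return True


# Hypothetical example values, chosen only to exercise the record.
record = LabelRecord(
    label="farmers market",
    latitude=37.7955,
    longitude=-122.3937,
    popularity=0.8,
    valid_from=date(2006, 5, 1),
    valid_to=date(2006, 10, 31),
)

print(record.is_valid_on(date(2006, 7, 4)))   # True
print(record.is_valid_on(date(2006, 12, 25)))  # False
```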
However, even with service information (e.g. popular keywords) derived from the dataset, it may still be difficult for a user to relate those keywords to geography and place them in context with his/her surroundings. In other words, consistent with the phrase “A picture is worth a thousand words,” users may derive a certain level of benefit and comfort from being able to compartmentalize the keywords by seeing a representation of a location associated with these keywords and how the keywords physically relate to the world, thereby gaining a geographic context to aid in understanding them.
Therefore, it would be advantageous to visualize keywords derived from datasets, including “arbitrary” keywords normally thought of as being not intrinsically linked to any particular geographical feature (e.g. movies, people, words in SMS messages), by enabling a user to see those keywords on a map with some further indication of the extent of their popularity, relevance or relationship to some metric.
Conventional systems for relating keywords to geographic locations include conventional maps, which show larger cities in larger text. FIG. 1 is an illustration of an exemplary conventional map of San Francisco, Calif. and surrounding areas, showing a few landmarks and cities whose label text varies in size according to their size (e.g. Oakland is in a larger font than Alameda). Another conventional system is Gutenkarte (see http://www.gutenkarte.org/), which uses a map to visualize geographic locations mentioned in books, with the geographic location names being shown in various sizes according to the number of times they were mentioned. Stanley Milgram introduced the idea of displaying landmarks sized in accordance with their popularity in 1976. However, these conventional systems only display location-dependent objects (e.g. cities and/or landmarks) tied to a particular geographic region, and none of these systems display keywords not intrinsically linked to any particular geographical feature (e.g. movies, people, words in SMS messages).
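As a non-limiting illustration of sizing text in accordance with a count, a name's font size might be interpolated linearly between a minimum and a maximum point size. The linear mapping, the function name `font_size`, the point-size limits and the sample counts are assumptions of this sketch; the actual scaling used by the conventional systems mentioned above is not described herein.

```python
def font_size(count, min_count, max_count, min_pt=8.0, max_pt=24.0):
    """Map a mention count onto a font size by linear interpolation.

    When every name has the same count, the midpoint size is returned.
    """
    if max_count == min_count:
        return (min_pt + max_pt) / 2
    fraction = (count - min_count) / (max_count - min_count)
    return min_pt + fraction * (max_pt - min_pt)


# Hypothetical mention counts for three place names.
mentions = {"London": 40, "Paris": 20, "Dover": 0}
low, high = min(mentions.values()), max(mentions.values())

sizes = {name: font_size(n, low, high) for name, n in mentions.items()}
print(sizes)
# {'London': 24.0, 'Paris': 16.0, 'Dover': 8.0}
```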