Social media platforms (e.g., Twitter) are popular for sharing activities, thoughts, and opinions. Geotagging of social media messages (e.g., associating a physical location or venue with a tweet) enables applications to personalize a user's experience based on location information. However, because of privacy concerns, only a small percentage of users choose to publicize their location when they post social media messages, and others reveal the locations of their messages only occasionally.
Only a small proportion of social media messages are explicitly geotagged; according to one study, fewer than 1% of tweets are. Inferring the locations of the remaining messages from other information (e.g., the content of the messages) can therefore be useful. For non-geotagged messages, some applications infer location from the textual content of messages. However, messages can mix a variety of daily activities (e.g., food, sports, emotions, opinions) without clear location signals. In addition, many social media messages (e.g., tweets) are short and informal, so clear geographic terms may not appear in the content at all. Even if proper place names are included, it can still be difficult to identify a specific location, especially for chain stores. For example, there may be no significant difference between the content of tweets associated with a Starbucks in Berkeley and the content of tweets associated with a Starbucks at Stanford. It is therefore difficult to tell from the content of a tweet which branch it was posted from.
Inferring the location of non-geotagged social media messages can facilitate a better understanding of a user's geographic context, which can enable better inference of geographic intent in search queries, more appropriate placement of advertisements, and display of information about events, points of interest, and people in the user's geographic vicinity. Conventional systems and methods for identifying the geographic locations of social media messages can be roughly divided into two groups based on the geolocation technique used: (1) content analysis of the social media messages; and (2) inference based on the social relations of users. Some systems focus on inferring the locations of users, whereas others focus on inferring the locations associated with individual social media messages.
One problem with location inference is that not all social media messages are associated with a location or venue. Given a social media message that is not geotagged, some applications compute a probability for each of a plurality of venues and select the venue (or venues) with the highest probability as the estimate. Because this technique always selects some venue, it can incorrectly associate a social media message with a venue even when the message should not be linked to any venue at all.
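The following minimal Python sketch illustrates the highest-probability (argmax) selection described above and its failure mode. It assumes the per-venue probabilities have already been computed by some model; the function name `most_likely_venue`, the venue names, and the scores are hypothetical and chosen only for illustration.

```python
from typing import Dict


def most_likely_venue(venue_probs: Dict[str, float]) -> str:
    """Return the venue with the highest probability (argmax selection).

    Hypothetical helper: it always returns some venue, even when every
    score is low or the scores are nearly uniform, so a message with no
    real location signal is still assigned to a venue.
    """
    if not venue_probs:
        raise ValueError("need at least one candidate venue")
    # max() over the dict keys, ranked by each key's probability.
    return max(venue_probs, key=venue_probs.get)


# Hypothetical scores for a message such as "so tired today", which has
# no location signal: the distribution is nearly uniform, yet argmax
# still picks a venue.
probs = {
    "Starbucks (Berkeley)": 0.34,
    "Starbucks (Stanford)": 0.33,
    "City Library": 0.33,
}
print(most_likely_venue(probs))
```

Running the example prints `Starbucks (Berkeley)`, even though the scores give almost no evidence that the message was posted from any of these venues.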