Social platforms (e.g., Twitter) are popular for sharing activities, thoughts, and opinions. Geotagging of social messages (e.g., tweets) enables applications to personalize a user's experience based on location information. However, due to privacy concerns, only a small percentage of users choose to publicize their location when they post social messages, and others only occasionally reveal the location of some of their social messages.
Inferring the location of social messages (e.g., tweets) has emerged as a critical and interesting issue in social media, since the proportion of geotagged social messages is relatively low and messages associated with specific venues are even sparser. This sparse usage of geo-enabled features makes the problem challenging. For example, according to one study, less than 1% of tweets are geotagged. For non-geotagged tweets, the most explicit information that can be used for location inference is the textual content of the tweets, which can mix a variety of everyday topics (e.g., food, sports, emotions, opinions) without clear location signals. Tweets are usually short and informal, so traditional gazetteer terms may not appear in their vocabulary at all. Even when a tweet contains a proper place name, the problem can remain difficult, especially for chain stores. For example, there may not be a significant difference between the content of tweets associated with the Starbucks at Berkeley and the content of tweets associated with the Starbucks at Stanford. Therefore, it is not easy to tell from the content of a tweet which branch the tweet was posted from.
Inferring the location of non-geotagged social messages (e.g., tweets) can facilitate a better understanding of users' geographic context, which can enable better inference of geographic intent in search queries, more appropriate placement of advertisements, and display of information about events, points of interest, and people in the geographic vicinity of the user. Conventional systems and methods for modeling locations in social networks can be roughly categorized into two groups based on the technique used for geo-locating: content analysis of social messages (e.g., tweets), and inference via the social relations of users. Depending on the object being predicted, these systems and methods focus on inferring the locations of either users or individual social messages (e.g., tweets).
Another inadequacy of conventional systems and methods is that most of them infer the location of a user or a social message (e.g., tweet) at a coarse level of granularity, ranging from the country level to the state or city level, which may not be precise enough to identify potential recipients for location-driven advertising. Thus, there is a need to identify the location of a social message (e.g., tweet) at a finer level of granularity.
However, inferring location at a finer level (e.g., at the level of geographic venues) for social messages (e.g., tweets) is a difficult and challenging task. Unlike location-based services (e.g., Foursquare) that explicitly let users choose a point of interest (POI) or venue for their check-ins, most social media applications on mobile devices (e.g., Twitter or Instagram) provide geotagging only in the form of a latitude-longitude pair associated with a social message (e.g., tweet) and/or a photo.
Additionally, geotagging in the form of coordinates may not always be very precise, especially within a confined geographic area. For example, it can be difficult to determine from a geotag whether a social message (e.g., tweet) was posted at an Apple Store or at a Starbucks next door. Hence, creating a one-to-one correspondence between latitude-longitude pairs and POIs/venues is not trivial. The problem becomes even harder in scenarios where users post a social message (e.g., a tweet or Facebook post) about food on the way home after they have visited a good restaurant, although it would be desirable to associate such social messages with the restaurant. Therefore, geotags for some social messages (e.g., tweets) are inherently noisy in terms of their practical usability.
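The ambiguity between adjacent venues can be illustrated with a minimal sketch. The venue names, coordinates, the geotag, and the 30-meter uncertainty radius below are all hypothetical values chosen for illustration; they do not come from any real dataset or from any particular system. The sketch computes the great-circle (haversine) distance from a geotag to each venue and lists every venue that falls within the assumed uncertainty radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude-longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical venues located next door to each other.
VENUES = {
    "Apple Store (hypothetical)": (37.44470, -122.16100),
    "Starbucks (hypothetical)": (37.44470, -122.16080),
}


def candidate_venues(lat, lon, error_radius_m):
    """Return (name, distance) pairs for venues within the radius, nearest first."""
    hits = []
    for name, (vlat, vlon) in VENUES.items():
        d = haversine_m(lat, lon, vlat, vlon)
        if d <= error_radius_m:
            hits.append((name, d))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical geotag between the two venues; 30 m is an assumed
# positioning uncertainty, not a measured value.
geotag = (37.44470, -122.16091)
candidates = candidate_venues(*geotag, error_radius_m=30.0)

for name, dist in candidates:
    print(f"{name}: {dist:.1f} m")
```

Under these assumptions both venues are returned as candidates, so the coordinates alone do not determine a single venue, and picking the nearest one would rest on a difference of only a few meters.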