Geo-location detection from text is a difficult task. Detecting geo-locations from social data is further complicated by the prominence of hashtags, platform-specific lingo, and the lack of punctuation, capitalization, and proper grammar. The main challenges in identifying locations accurately in social media postings include the following:
1) Lack of proper standards or heuristics: There are no definitive strategies for identifying locations in text, since they can be expressed in a variety of ways.
2) Ambiguous words: Ambiguous words, for instance names of locations that can also be names of people, are common.
3) Lack of standard grammar: Many social media users write informally and ungrammatically, and many social media outlets have their own lingo. As a result, models trained on standard English tend to perform poorly on social data.
4) Prominence of hashtags: Hashtags are used across many social platforms to indicate metadata related to a message, e.g. its topic. Over years of usage on social media, hashtags have taken on a life of their own, appearing within or after a message as witty or creative tokens. Users often combine more than one word into a composite hashtag, or express the location of an event via a trailing hashtag. In these instances, automated parsers are often unable to break the hashtags down properly.
5) Consistency of self-identified user locations: Users can often choose to state their location in their profile. On many social media platforms, this location is not validated and can be entered as free text. This has led to a large number of creative locations that cannot be mapped to a real place.
6) Granularity of information: Some disaster-response teams, police departments, and fire departments set up official social media accounts to report emergencies in real time. The locations they mention in their messages are often specific to the area they serve. For instance, “Injury wreck being reported on Hwy 183 NB at Loyola Ln. Back-ups toward MLK” includes a granular description of the address of an accident, which might be difficult to parse. Moreover, the address might be difficult to locate, since a similar address or intersection might exist in many different cities.
7) Identifying the correct geo-coordinates: Even if words that refer to locations are accurately identified, they can sometimes map to several different geo-coordinates. For instance, there are several cities named “Orlando” in the United States (e.g., in Florida, Oklahoma, West Virginia, New York, Virginia, Kentucky, North Carolina, and Arkansas).
8) Identifying the primary location of an event: Consider the message “Rebel Groups Supported By Turkey & US Reportedly Clash W/US-Backed Kurdish Group In Syria”, which mentions three countries. It can be important to determine which of the mentioned locations is where the event took place (here, Syria).
9) Timeliness & sustainability requirements: Even though machine learning models might yield good precision and recall, they are often too slow to be applied in real time. In addition, since many of these models are trained on static training data, they require periodic updates and adjustments.
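To make challenge 4 concrete, the following is a minimal sketch of how a composite hashtag might be split into words using a dictionary. It is an illustration only, not the method of the system described here: the function name `segment_hashtag` and the toy vocabulary `VOCAB` are hypothetical, and a practical system would need a far larger lexicon that includes place names.

```python
from functools import lru_cache

# Hypothetical toy vocabulary (an assumption for illustration only).
# A real system would need a much larger lexicon plus a gazetteer.
VOCAB = {"pray", "for", "paris", "flood", "in", "austin", "texas"}


def segment_hashtag(tag, vocab=VOCAB):
    """Split a composite hashtag into known words.

    Returns a list of words, or None if the hashtag cannot be fully
    covered by words in the vocabulary. When several segmentations
    exist, the one with the fewest words is preferred.
    """
    text = tag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best segmentation of text[i:], as a tuple of words, or None.
        if i == len(text):
            return ()
        candidates = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in vocab:
                rest = best(j)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None
```

With this toy vocabulary, `segment_hashtag("#PrayForParis")` yields the three words "pray", "for", and "paris", while a hashtag containing any out-of-vocabulary token yields `None`. The latter case is the failure mode described above.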
Therefore, a system is desired that addresses all of the above challenges and provides a model that is validated against other geo-location services.
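The ambiguity described in challenge 7 can be sketched in a few lines. The example below is a hedged illustration under stated assumptions, not the disclosed method: `GAZETTEER` is a hypothetical mini-gazetteer holding only a subset of the states from the "Orlando" example, and `resolve_state` is a hypothetical helper that resolves a place name only when the message itself names exactly one candidate state.

```python
# Hypothetical mini-gazetteer (assumption): place name -> states that
# contain a place of that name. Only a subset of the "Orlando" example.
GAZETTEER = {
    "orlando": ["Florida", "Oklahoma", "West Virginia", "Kentucky"],
}


def resolve_state(place, message, gazetteer=GAZETTEER):
    """Pick the state for an ambiguous place name using message context.

    Returns the state name if the gazetteer has a single candidate, or if
    exactly one candidate state is mentioned in the message. Returns None
    when the place is unknown or remains ambiguous.
    """
    candidates = gazetteer.get(place.lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    text = message.lower()
    supported = [state for state in candidates if state.lower() in text]
    return supported[0] if len(supported) == 1 else None
```

A message such as "Tornado warning near Orlando, Oklahoma" resolves to Oklahoma, whereas "Traffic is terrible in Orlando" stays unresolved. Simple substring matching of this kind would also confuse overlapping names such as "Virginia" and "West Virginia", which is one reason a more robust approach is needed.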