An unambiguous location reference identifies exactly one location, whereas an ambiguous location reference corresponds to more than one location. For example, the term “Springfield” by itself can refer to 30 different cities in the USA. While “Springfield” is an ambiguous location reference, “Springfield, IL, USA” is an unambiguous location reference.
The location disambiguation problem has been studied in the context of large textual documents. Because a broader context can be derived from the neighboring text, prior methods for location disambiguation in such documents often use ad-hoc rules.
For example, H. Li, R. K. Srihari, C. Niu, and W. Li (“InfoXtract Location Normalization: A Hybrid Approach to Geographic References in Information Extraction,” Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Volume 1, Association for Computational Linguistics, 2003, pp. 39-44) assume “one sense per discourse,” in that subsequent mentions of a location reference are identified with the first unambiguous location reference.
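The “one sense per discourse” rule can be illustrated with a minimal sketch. This is not the implementation from the cited work. It assumes, purely for illustration, that a mention containing a comma (such as “Springfield, IL, USA”) is unambiguous and that the text before the first comma is the name used by later bare mentions. The function name `resolve_one_sense_per_discourse` is hypothetical.

```python
from typing import Dict, List, Optional


def resolve_one_sense_per_discourse(mentions: List[str]) -> List[Optional[str]]:
    """Resolve bare mentions to the first unambiguous mention with the same name.

    Assumption (illustrative only): a mention that contains a comma is
    treated as unambiguous, and its head name is the text before the comma.
    """
    first_sense: Dict[str, str] = {}
    resolved: List[Optional[str]] = []
    for mention in mentions:
        head = mention.split(",")[0].strip()
        if "," in mention:
            # Remember only the first unambiguous sense seen for this name.
            first_sense.setdefault(head, mention)
            resolved.append(mention)
        else:
            # A bare mention inherits the first sense, if one was seen earlier.
            resolved.append(first_sense.get(head))
    return resolved


# Mentions in document order.
example = resolve_one_sense_per_discourse(
    ["Springfield, IL, USA", "Chicago", "Springfield"]
)
```

In this sketch, the bare “Springfield” is resolved to “Springfield, IL, USA,” while “Chicago” stays unresolved because no unambiguous mention of it precedes it.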
Similarly, G. Andogah, G. Bouma, J. Nerbonne, and E. Koster (“Geographical Scope Resolution,” Methodologies and Resources for Processing Spatial Language, 2008) use spatial proximity as a criterion for disambiguating ambiguous location references, which assumes that “places of the same type or under the same administrative jurisdiction or near/adjacent to each other are more likely to be mentioned in a given discourse.”
Alternatives to these rule-based approaches are topology-based and ontology-based approaches, which exploit certain inherent properties of an unambiguous location reference.
For example, an unambiguous location reference is a composite of various geographic entities (e.g., city, state, country, etc.) that are present at different geographic scales. In the example of “Springfield, IL, USA,” there are three geographic entities, which are “Springfield,” “IL,” and “USA.” These entities exhibit a containment relationship, in that the city Springfield is located in the state of IL and IL is located in the USA.
Based on this insight, V. Sengar, T. Joshi, J. Joy, S. Prakash, and K. Toyama (“Robust Location Search from Text Queries,” Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, ACM, 2007, pp. 1-8) formalize the problem of location disambiguation as the challenge of finding a region that is a spatial intersection of all the geographic entities present in a given location reference. For example, consider the location reference of “Mission Street, San Francisco.” This location reference has two parts, i.e., “Mission Street” and “San Francisco,” which have separate representations in a geographic database. Each of these parts may match many candidate geographic entities in the database. However, in the best-case scenario, the spatial intersection of these candidates yields only a single intended region.
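The spatial-intersection formulation can be sketched as follows. This is a simplified illustration, not the method of the cited work: regions are approximated by axis-aligned bounding boxes, and the gazetteer entries, candidate labels, and coordinates are invented for the example.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    box = Box(max(a.min_x, b.min_x), max(a.min_y, b.min_y),
              min(a.max_x, b.max_x), min(a.max_y, b.max_y))
    if box.min_x > box.max_x or box.min_y > box.max_y:
        return None
    return box


Candidate = Tuple[str, Box]


def disambiguate(parts: List[str],
                 gazetteer: Dict[str, List[Candidate]]
                 ) -> List[Tuple[Tuple[str, ...], Box]]:
    """Keep every combination of candidates whose regions all intersect."""
    results = []
    for combo in product(*(gazetteer[p] for p in parts)):
        region: Optional[Box] = combo[0][1]
        for _, box in combo[1:]:
            region = intersect(region, box)
            if region is None:
                break
        if region is not None:
            results.append((tuple(label for label, _ in combo), region))
    return results


# Hypothetical gazetteer with made-up coordinates in arbitrary units.
GAZETTEER = {
    "Mission Street": [("Mission Street (candidate 1)", Box(2, 2, 3, 8)),
                       ("Mission Street (candidate 2)", Box(20, 0, 21, 4))],
    "San Francisco": [("San Francisco (candidate 1)", Box(0, 0, 10, 10)),
                      ("San Francisco (candidate 2)", Box(100, 100, 110, 110))],
}

matches = disambiguate(["Mission Street", "San Francisco"], GAZETTEER)
```

With this toy data, four candidate combinations are examined and only one has a non-empty intersection, which corresponds to the best-case scenario described above. A production system would likely use real geometries and a spatial index rather than exhaustive enumeration.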
B. Martins, M. J. Silva, S. Freitas, and A. P. Afonso (“Handling Locations in Search Engine Queries,” Proceedings of the 3rd ACM Workshop on Geographic Information Retrieval, 2006) and R. Volz, J. Kleb, and W. Mueller (“Towards Ontology-Based Disambiguation of Geographical Identifiers,” Proceedings of 16th International Conference on World Wide Web, 2007) model the location disambiguation problem as one of finding a single branch that contains all the geographic entities mentioned in a given location reference, and thus use a geographical ontology, instead of topological operations, to find the intended location. G. Fu, C. B. Jones, and A. I. Abdelmoty (“Ontology-Based Spatial Query Expansion in Information Retrieval,” Lecture Notes in Computer Science, Volume 3761, On the Move to Meaningful Internet Systems: ODBASE, 2005, pp. 1466-1482) and T. Kauppinen, R. Henriksson, R. Sinkkilä, R. Lindroos, J. Väätäinen, and E. Hyvönen (“Ontology-Based Disambiguation of Spatiotemporal Locations,” Proceedings of the 1st International Workshop on Identity and Reference on the Semantic Web (IRSW2008), 5th European Semantic Web Conference, 2008) provide further examples.
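The branch-finding idea can be sketched with a toy containment hierarchy. The sketch below is an assumption-laden illustration rather than any cited system: the ontology is a hand-written child-to-parent map, the node identifiers and name sets are hypothetical, and a branch is taken to be the path from a node up to the root.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy ontology: each node points to its containing node.
PARENT: Dict[str, Optional[str]] = {
    "springfield_il": "illinois",
    "springfield_mo": "missouri",
    "illinois": "usa",
    "missouri": "usa",
    "usa": None,
}

# Surface names by which each node may be mentioned (illustrative only).
NAMES: Dict[str, Set[str]] = {
    "springfield_il": {"Springfield"},
    "springfield_mo": {"Springfield"},
    "illinois": {"IL", "Illinois"},
    "missouri": {"MO", "Missouri"},
    "usa": {"USA"},
}


def branch(node: str) -> List[str]:
    """Return the path from a node up to the root of the ontology."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def matching_branches(entities: List[str]) -> List[List[str]]:
    """Find branches that start at a mentioned node and cover all entities."""
    wanted = set(entities)
    results = []
    for node in PARENT:
        path = branch(node)
        names_on_path = set().union(*(NAMES[n] for n in path))
        # The most specific node must itself be mentioned, and every
        # mentioned entity must name some node on the branch.
        if NAMES[node] & wanted and wanted <= names_on_path:
            results.append(path)
    return results


full = matching_branches(["Springfield", "IL", "USA"])
bare = matching_branches(["Springfield"])
```

In this toy ontology, the full reference selects a single branch, whereas the bare name “Springfield” matches two branches and therefore stays ambiguous.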
There are several challenges with the topological and ontological approaches. One challenge is that breaking a location reference into its constituent geographic entities is subject to combinatorial explosion, because the number of candidate segmentations grows exponentially with the length of the reference. I. Jenhani, N. B. Amor, and Z. Elouedi (“Decision Trees as Possibilistic Classifiers,” International Journal of Approximate Reasoning, Vol. 48, No. 3, 2008, pp. 784-807) address this challenge to some extent.
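The growth in the number of ways to split a location reference can be made concrete with a small sketch. It assumes, for illustration only, that a reference is a sequence of whitespace-separated tokens and that each geographic entity is a contiguous run of tokens. Under that assumption, a reference with n tokens has 2^(n-1) possible segmentations. The function name `segmentations` is hypothetical.

```python
from typing import Iterator, List


def segmentations(tokens: List[str]) -> Iterator[List[str]]:
    """Yield every way to split tokens into contiguous multi-word entities."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        # Recurse on the remaining tokens for each choice of first entity.
        for rest in segmentations(tokens[i:]):
            yield [head] + rest


all_splits = list(segmentations("Mission Street San Francisco".split()))
```

For the four-token reference above there are already eight candidate segmentations, only one of which is the intended split into “Mission Street” and “San Francisco.” Each candidate entity would then have to be looked up in the geographic database.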
Another challenge arises when location references are incomplete (such as “San Jose” or “Springfield”), such that there are multiple spatial regions or ontological branches corresponding to a location reference. Some approaches use ad-hoc heuristic rules to rank these competing locations, such as in R. Volz, J. Kleb, and W. Mueller (“Towards Ontology-Based Disambiguation of Geographical Identifiers,” Proceedings of 16th International Conference on World Wide Web, 2007), which ranks competing ontological branches using empirically chosen weights on structural features (such as feature class, population, etc.) that are available in a geographic gazetteer.
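A weighted ranking of this kind might look like the sketch below. The scoring formula, the weight values, the feature classes, and the candidate records are all assumptions made up for illustration. They are not the weights or features used in the cited work.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    feature_class: str
    population: int


# Hypothetical, hand-picked weights (the "empirically decided" part).
FEATURE_CLASS_WEIGHT = {"capital": 3.0, "city": 2.0, "village": 1.0}
POPULATION_WEIGHT = 0.5


def score(candidate: Candidate) -> float:
    """Combine feature class and log-scaled population into one score."""
    class_score = FEATURE_CLASS_WEIGHT.get(candidate.feature_class, 0.0)
    population_score = POPULATION_WEIGHT * math.log10(max(candidate.population, 1))
    return class_score + population_score


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order competing candidates from most to least likely under the weights."""
    return sorted(candidates, key=score, reverse=True)


# Invented candidates for an incomplete reference such as "San Jose".
ranked = rank([
    Candidate("San Jose (candidate A)", "city", 1_000_000),
    Candidate("San Jose (candidate B)", "capital", 300_000),
    Candidate("San Jose (candidate C)", "village", 5_000),
])
```

Because the weights are set by hand, changing them can reorder the candidates, which is one reason such rules are described as ad hoc.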
An alternative to such rule-based approaches is in D. A. Smith and G. S. Mann (“Bootstrapping Toponym Classifiers,” Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Volume 1, Association for Computational Linguistics, 2003, pp. 45-49), which proposes a data-driven place name classifier that is trained and tested on news articles and historical documents.