As the amount of available textual information grows, efficient and accurate methods for filtering and organizing that information become necessary. Geographic data is of particular interest to individuals who care about matters related to the places where they live, work, or travel. Estimates indicate that 70% of documents include references to geographic locations.
However, the geographic data provided in these documents is often ambiguous due to the flexibility of natural language, and therefore cannot be accurately geocoded. Two common types of ambiguity are geographic/non-geographic ambiguity and geographic/geographic ambiguity. Geographic/non-geographic ambiguity arises when a word can represent either a location or an ordinary dictionary word or name. For example, “Washington” can represent the state or a person's name. Geographic/geographic ambiguity arises when a word names a location, such as “San Francisco,” but the text fails to indicate which particular “San Francisco” is referenced, such as San Francisco, Calif. or San Francisco, Nayarit, Mexico. Studies show that a single location name has 4.4 different meanings on average, while 11.5% of the nouns in WordNet can have geographic interpretations. A reader's own interpretation of a written word, such as a location name, can further exacerbate the ambiguity problem. For instance, a reader from Texas who reads a newspaper article about Paris may conclude that the term “Paris” refers to Paris, Tex., rather than Paris, France.
Therefore, there is a need for a system and method for distinguishing locations and resolving such ambiguities, so that text phrases can be efficiently and accurately mapped to geographical locations and coordinates can be provided for those locations.