Among services provided over the Internet, such as keyword search and the provision of aggregated news and information, document content may be provided according to the geographical information found in the documents. However, when categorizing a document according to its geographical information, existing technologies merely extract whatever geographical information appears within the document. Several different pieces of geographical information may appear in the same document, and all of them are extracted without differentiation. In reality, the descriptive content of a document generally contains core geographical information, that is, the location the document is chiefly about. For example, from a news report on the Sichuan earthquake, the location name “Sichuan” may be extracted as geographical information. The same report may also mention donations to Sichuan from other provinces and cities, so an existing method may also extract geographical information such as “Guangdong” and “Beijing”. Judging merely from the geographical information extracted this way, the reported event might appear to have happened in another place, such as Beijing or Guangdong, whereas the core geographical information of the report should be recognized as “Sichuan”.
In other words, because existing technologies take the geographical information that appears within a document at face value, they may extract multiple pieces of geographical information without discerning which of them is the true core geographical information of the document. This may lead to inaccurate results for services that rely on the extracted geographical information, such as search-based content provision and geographically aggregated news and information.
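The shortcoming described above can be illustrated with a minimal sketch. It is not taken from any particular existing system: the gazetteer `GAZETTEER`, the function `extract_place_names`, and the sample report text are all hypothetical and are assumed here only to show how undifferentiated extraction treats every location name alike.

```python
import re

# Hypothetical gazetteer of known location names (illustrative assumption).
GAZETTEER = ("Sichuan", "Guangdong", "Beijing")


def extract_place_names(text, gazetteer=GAZETTEER):
    """Return every gazetteer name found in text, in order of first appearance.

    This mimics undifferentiated extraction: each name is reported once,
    with no indication of which one the document is chiefly about.
    """
    found = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        if word in gazetteer and word not in found:
            found.append(word)
    return found


# Hypothetical report text modeled on the example above.
report = (
    "A strong earthquake struck Sichuan. "
    "Donations for Sichuan arrived from Guangdong and Beijing."
)

places = extract_place_names(report)
print(places)  # ['Sichuan', 'Guangdong', 'Beijing']
```

In this sketch the three location names are returned on an equal footing, so a service relying on the result alone has no basis for treating “Sichuan” as the core geographical information of the report.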