To provide more relevant search results to users, it is useful to determine the geographical area to which a particular publication or document is relevant. When crawling the Internet in a targeted manner for locally relevant content, this determination is a key qualifier. Because the Internet holds an enormous amount of content, most of which is non-local, a targeted web crawler that does not make such a determination will waste correspondingly enormous amounts of resources crawling non-local content.
Current solutions to this problem often rely on a comprehensive toponymic database and attempt to identify and disambiguate references to places with respect to that database. This is a very difficult task, given the complexity of and variation among natural language place references, and it can require significant processing. In the context of a web crawl, it is important to minimize the amount of processing required for each document, since a crawler will necessarily encounter a very large number of documents. Through applied effort, ingenuity, and innovation, solutions to improve such methods have been realized and are described in connection with embodiments of the present invention.