A. Field of the Invention
Systems and methods described herein relate to search engines and, more particularly, to techniques for classifying text as relevant to geographic regions.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web pages in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide the user with links to high-quality, relevant results (e.g., web pages) based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web pages. Web pages that contain the user's search terms are “hits” and are returned to the user as links.
To increase the relevance and quality of the web pages returned to the user, a search engine may attempt to sort the list of hits so that the most relevant and/or highest-quality pages appear at the top of the list. For example, the search engine may assign a rank or score to each hit, where the score is designed to correspond to the relevance or importance of the web page.
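By way of illustration only, the matching and ranking steps described above may be sketched as follows. The corpus, the URLs, the function name find_hits, and the term-frequency score are all assumptions introduced for this example; actual search engines may use any of a wide range of scoring techniques.

```python
from collections import Counter

# Hypothetical toy corpus; a real search engine stores far more pages.
CORPUS = {
    "http://example.com/a": "pizza restaurant in new york city",
    "http://example.com/b": "history of pizza and pizza recipes",
    "http://example.com/c": "weather forecast for new york",
}

def find_hits(query, corpus):
    """Return (url, score) pairs for pages containing every query term,
    sorted so that the highest-scoring hits come first."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for url, text in corpus.items():
        counts = Counter(text.lower().split())
        # A page is a "hit" only if it contains all of the search terms.
        if all(counts[term] > 0 for term in terms):
            # Assumed score: total occurrences of the query terms on the page.
            score = sum(counts[term] for term in terms)
            hits.append((url, score))
    # Sort by descending score; ties keep their original corpus order.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

if __name__ == "__main__":
    for url, score in find_hits("pizza", CORPUS):
        print(url, score)
```

In this sketch, a query for "pizza" returns two hits, and the page that mentions the term more often is listed first. A simple occurrence count is used here only because it is easy to follow; it is not intended to represent the scoring of any particular search engine.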
Local search engines are search engines that attempt to return relevant web pages within a specific geographic region. When indexing documents for a local search engine, it is desirable, when appropriate, to automatically associate documents, or sections of documents, with specific geographic regions. For example, a web page about a restaurant in New York City should be associated with New York City. In many cases, geographically specific web pages include postal addresses or other geographic information that unambiguously associates the web page with the geographic region. In other cases, however, the web page may be related to a specific geographic region yet include only partial postal address information, or include other terms that are not easily recognized as being associated with a specific geographic location. In such cases, it is difficult to determine the geographic region with which the web page is associated.
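The difficulty noted above may be illustrated with a minimal sketch that recognizes only complete postal addresses. The regular expression, the assumed U.S.-style address layout ("number street, city, two-letter state, five-digit ZIP code"), and the function name region_from_address are hypothetical and are introduced solely for this example.

```python
import re

# Assumed layout: "<number> <street>, <city>, <ST> <ZIP>".
# This pattern is illustrative and does not cover real-world address variety.
FULL_ADDRESS = re.compile(
    r"\b\d+\s+[A-Za-z0-9 .]+,\s*"      # street number and street name
    r"(?P<city>[A-Za-z .]+),\s*"       # city
    r"(?P<state>[A-Z]{2})\s+"          # two-letter state abbreviation
    r"(?P<zip>\d{5})\b"                # five-digit ZIP code
)

def region_from_address(text):
    """Return (city, state, zip) if the text contains a complete postal
    address in the assumed layout; otherwise return None."""
    match = FULL_ADDRESS.search(text)
    if match is None:
        return None
    return (match.group("city").strip(), match.group("state"), match.group("zip"))

if __name__ == "__main__":
    full = "Visit us at 123 Main St., New York, NY 10001 for dinner."
    partial = "Our bistro is two blocks from Main St. in the Village."
    print(region_from_address(full))
    print(region_from_address(partial))
```

In this sketch, the page containing a complete address is associated with a city, state, and ZIP code, whereas the page containing only a street name and a neighborhood reference yields no region at all, even though a human reader might recognize it as relating to a specific location.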