A. Field of the Invention
The present invention relates generally to web documents, and more particularly, to the geographical relevance of web documents.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web documents in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide links to high-quality, relevant results to the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web pages. Web documents that contain the user's search terms are “hits” and are returned to the user.
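The term-matching step described above can be illustrated with a minimal sketch. It assumes conjunctive matching (a document is a hit only if it contains every query term) and simple whitespace tokenization; the function name `find_hits` and the sample corpus are hypothetical. Practical search engines typically use inverted indexes and rank the hits rather than scanning every page.

```python
def find_hits(query: str, corpus: dict[str, str]) -> list[str]:
    """Return IDs of documents that contain every query term (case-insensitive).

    Hypothetical illustration: whitespace tokenization, AND semantics, no ranking.
    """
    terms = query.lower().split()
    hits = []
    for doc_id, text in corpus.items():
        words = set(text.lower().split())  # tokens present in this document
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


# Hypothetical corpus of pre-stored pages, keyed by document ID.
corpus = {
    "doc1": "Local news and weather for Springfield",
    "doc2": "National weather forecasts",
    "doc3": "Springfield hardware store hours",
}

print(find_hits("springfield weather", corpus))  # ['doc1']
```

Note that nothing in this matching step considers where a document is geographically relevant; a query for "weather" returns the local and national pages alike.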
Some web documents may be of particular interest to users who reside in certain geographical areas. For example, web documents associated with an on-line newspaper may be most relevant to the geographical area covered by the newspaper. Web documents associated with local businesses or organizations are additional examples of web documents that may be of particular interest to a geographical area. Thus, it can be desirable for a search engine to know whether a web document has geographical significance and, when it does, the geographical locations associated with the web document.
One known approach to determining geographical relevance is to have humans manually classify web pages. For a large set of web documents, however, this approach can be labor-intensive and expensive. Another known approach is to construct an automated parser to analyze the text associated with web documents. The parser may look for geographic terms, such as zip codes or telephone area codes, in order to associate the web document with one or more geographic locations. This approach can be problematic, however, because geographic terms often appear in web documents that are not necessarily relevant to a particular geographic area. For example, a national on-line retailer may have a specific mailing address but nevertheless be equally relevant to all geographical locations. Additionally, automated parsers can have difficulty finding geographic terms and distinguishing them from other text.
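A minimal sketch of such a parser is shown below, assuming US-style 5-digit zip codes (optionally followed by a 4-digit extension) and telephone numbers of the form "(NNN) NNN-NNNN" or "NNN-NNN-NNNN". The regular expressions, the function `extract_geo_terms`, and the sample page are hypothetical. The sample page belongs to a retailer that ships nationwide, and it illustrates both weaknesses noted above: the headquarters address ties the page to one location, and a 5-digit item number is mistaken for a zip code.

```python
import re

# Hypothetical patterns: 5-digit zip code with optional "-NNNN" extension,
# and a 3-digit area code (optionally in parentheses) followed by a 7-digit number.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")
AREA_CODE_RE = re.compile(r"\(?(\d{3})\)?[-.\s]\d{3}[-.]\d{4}\b")


def extract_geo_terms(text: str) -> dict[str, list[str]]:
    """Collect candidate zip codes and area codes found anywhere in the text."""
    return {
        "zip_codes": ZIP_RE.findall(text),
        "area_codes": AREA_CODE_RE.findall(text),
    }


# Hypothetical page from a retailer that is equally relevant everywhere.
page = (
    "Nationwide shipping on item 48213. "
    "Headquarters: 100 Main St, Springfield, IL 62701. "
    "Call (217) 555-0100."
)

print(extract_geo_terms(page))
# {'zip_codes': ['48213', '62701'], 'area_codes': ['217']}
```

The parser cannot tell that "48213" is a product number rather than a zip code, nor that the mailing address says little about where the page is relevant.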
Yet another known approach to determining geographical relevance is to use the Internet Protocol (IP) address of the web server to locate the web document. A number of services are available for determining the location of a server based on its IP address. This technique, however, has the disadvantage that a web document may be hosted by a server located far from the geographic area to which the web document is relevant.
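The IP-based approach can be sketched as a longest-prefix lookup against a table of networks, using Python's standard `ipaddress` module. The table `GEO_TABLE` and the function `locate_server` are hypothetical stand-ins for a geolocation service, and the networks are RFC 5737 documentation ranges rather than real allocations. The sketch returns where the server sits, which is exactly the limitation noted above: a Springfield newspaper hosted in a Dallas data center would be reported as Dallas.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-to-location table (RFC 5737 documentation ranges).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Chicago, IL",
    ipaddress.ip_network("198.51.100.0/24"): "Dallas, TX",
    ipaddress.ip_network("198.51.100.128/25"): "Austin, TX",
}


def locate_server(ip: str) -> Optional[str]:
    """Return the location of the most specific network containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in GEO_TABLE if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the smallest (most specific) network wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return GEO_TABLE[best]


print(locate_server("198.51.100.10"))   # Dallas, TX
print(locate_server("198.51.100.200"))  # Austin, TX
print(locate_server("203.0.113.5"))     # None
```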
Accordingly, there is a need in the art for more effective ways to determine the geographical relevance and location(s) of web documents, such as web pages.