A. Field of the Invention
The present invention relates generally to networks, and more particularly, to geolocation information associated with resources on a network.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web documents in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide the user with links to high-quality, relevant results based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query against a corpus of pre-stored web pages. Web documents that contain the user's search terms are "hits" and are returned to the user.
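The term matching described above is commonly implemented with an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal illustration under several assumptions. It uses a small in-memory corpus keyed by hypothetical document IDs and splits text on whitespace after lowercasing. It also treats a document as a hit only if it contains every query term. These choices are illustrative and are not the method of any particular search engine.

```python
from collections import defaultdict


def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase, whitespace-delimited term to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def find_hits(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every term in the query (conjunctive matching)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest;
    # a term absent from the index yields an empty result.
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


corpus = {
    "doc1": "local pizza restaurant in Boston",
    "doc2": "pizza recipes",
    "doc3": "Boston news",
}
index = build_index(corpus)
```

With this corpus, the query "Boston pizza" matches only `doc1`, because it is the only document that contains both terms.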
Some web documents may be of particular interest to users who reside in certain geographical areas. For example, web documents associated with an online newspaper may be most relevant to the geographical area the newspaper covers. Web documents associated with local businesses or organizations are further examples of web documents that may be of particular interest to a geographical area. Thus, it can be desirable for a search engine to know whether a web document has geographical significance and, when it does, which geographical locations are associated with the web document.
Web documents often include postal addresses. In some situations, these postal addresses may help to define the geographical relevance of the web document. More specifically, a postal address can be converted to geographic coordinate values (e.g., latitude and longitude), and the coordinates of two locations can be used to calculate the distance between them. In the context of web documents and web searching, a web document associated with a geographically distant location may be determined to be less relevant than one associated with a closer location.
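The text does not specify how the distance between two coordinates is computed. One common choice is the haversine great-circle formula, which the sketch below assumes. It also assumes a spherical Earth with a mean radius of 6371 km. The `rank_by_distance` helper is hypothetical and orders documents nearest-first from a reference point, which illustrates the idea that a closer document may be treated as more relevant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rank_by_distance(
    origin: tuple[float, float],
    docs: list[tuple[str, tuple[float, float]]],
) -> list[tuple[str, tuple[float, float]]]:
    """Hypothetical helper: sort (doc_id, (lat, lon)) pairs nearest-first from origin."""
    return sorted(
        docs,
        key=lambda d: haversine_km(origin[0], origin[1], d[1][0], d[1][1]),
    )


ranked = rank_by_distance(
    (40.7128, -74.0060),  # reference point, e.g. a user's location
    [("la", (34.05, -118.24)), ("nyc", (40.71, -74.00)), ("boston", (42.36, -71.06))],
)
```

As a sanity check, one degree of latitude along a meridian comes out to roughly 111 km under these assumptions. An ellipsoidal model would be more accurate, but the spherical formula is usually adequate for relevance ranking.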
Extracting valid addresses from web documents and efficiently geocoding them (i.e., converting the address to geographic coordinate values) can be a difficult problem. Extracting postal addresses can be difficult because addresses can be written in a number of different formats and may not even be complete addresses. The zip code, for example, may be omitted from an address.
In addition to extracting valid postal address information, geocoding the postal addresses accurately and efficiently can be difficult. Ideally, geocoding should handle all postal addresses, produce geographic coordinates that are as close to the actual address location as possible, and generate the geocode information quickly.
Accordingly, there is a need in the art to associate documents with geographic locations by efficiently extracting and geocoding postal addresses.