Geocoding is the process of converting a named location to a set of geographic coordinates. The named location is usually an address such as “1 Oracle Drive, Nashua, N.H. 03062,” and the coordinates are usually longitude and latitude. Other examples of named locations are places of interest such as “Logan International Airport, Boston” or “Pheasant Lane Mall, Nashua.”
A typical geocoder includes a geographic base data source and algorithms that use this data source. The data source includes geographic names and street segment information. Geographic names are typically named locations, such as points of interest (e.g., names of businesses, shopping complexes, parking garages, government offices), and administrative or political areas (e.g., cities, postal codes, counties, or well-known areas such as Midtown Manhattan or Chinatown in San Francisco). The street segment information includes data such as the official and alternate street names (e.g., “Main Street”, “Daniel Webster Highway”, and “Rt. 3”, all of which designate the same road), the address number format (e.g., numeric as in “1 Oracle Drive”, or alphanumeric as in “256A D W Highway”), address ranges (e.g., odd numbers ranging from 1 through 99 on the left and even numbers ranging from 2 through 100 on the right), and the geographic coordinates for the end points of the street segment.
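The street segment record described above can be sketched as a small data structure. The following Python sketch is illustrative only: the names StreetSegment, side_of, and interpolate are hypothetical, the coordinates are placeholders, and the linear interpolation step is a commonly used technique that the text above does not itself describe. It also assumes that the low end of each address range lies at the first end point of the segment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


@dataclass
class StreetSegment:
    """Hypothetical record for one street segment."""
    names: Tuple[str, ...]        # official name first, then alternates
    left_range: Tuple[int, int]   # (low, high) house numbers on the left side
    right_range: Tuple[int, int]  # (low, high) house numbers on the right side
    start: Coordinate             # end point assumed to hold the low numbers
    end: Coordinate               # end point assumed to hold the high numbers


def side_of(segment: StreetSegment, number: int) -> Optional[str]:
    """Return "left" or "right" if the house number fits a range, else None."""
    ranges = (("left", segment.left_range), ("right", segment.right_range))
    for side, (low, high) in ranges:
        # The number must lie inside the range and share its odd/even parity.
        if low <= number <= high and number % 2 == low % 2:
            return side
    return None


def interpolate(segment: StreetSegment, number: int) -> Optional[Coordinate]:
    """Estimate a coordinate by linear interpolation along the segment.

    This is an assumption for illustration, not a method taken from the text.
    """
    side = side_of(segment, number)
    if side is None:
        return None
    low, high = segment.left_range if side == "left" else segment.right_range
    fraction = 0.5 if high == low else (number - low) / (high - low)
    lon = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    lat = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (lon, lat)


# Placeholder segment: the coordinates are invented, not real locations.
example = StreetSegment(
    names=("Main Street", "Daniel Webster Highway", "Rt. 3"),
    left_range=(1, 99),
    right_range=(2, 100),
    start=(0.0, 0.0),
    end=(1.0, 2.0),
)
```

With this placeholder segment, side_of(example, 51) falls in the odd left-hand range and side_of(example, 50) in the even right-hand range, while a number such as 101 matches neither range. Alphanumeric house numbers such as “256A” are outside the scope of this sketch.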
The data can also include details of the hierarchy of political or other administrative boundaries. The four-level hierarchy of country, state, county and city, for example, is valid for the U.S. In Europe, some countries have a five-level hierarchy, namely country, province, district, municipality, and settlement.
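One way to represent such hierarchies is as an ordered list of level names per profile. The following Python sketch is a hypothetical illustration: the profile keys "us" and "five_level" and the function label_admin_path are invented names, and the text does not say which European countries use the five-level form.

```python
from typing import Dict, List, Tuple

# Ordered administrative levels, from broadest to narrowest.
# The profile keys are hypothetical labels chosen for this sketch.
ADMIN_LEVELS: Dict[str, Tuple[str, ...]] = {
    "us": ("country", "state", "county", "city"),
    "five_level": (
        "country", "province", "district", "municipality", "settlement",
    ),
}


def label_admin_path(profile: str, values: List[str]) -> Dict[str, str]:
    """Pair each value with its level name, broadest level first."""
    levels = ADMIN_LEVELS[profile]
    if len(values) != len(levels):
        raise ValueError(
            f"profile {profile!r} expects {len(levels)} levels, "
            f"got {len(values)}"
        )
    return dict(zip(levels, values))


# Usage: a four-level U.S. path.
path = label_admin_path(
    "us", ["USA", "California", "San Mateo", "Redwood City"]
)
```

Keeping the level names in data rather than in code means a geocoder could support a new hierarchy by adding one entry instead of changing its search logic.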
Different data suppliers (e.g., TeleAtlas, NavTech, and Geographic Data Technologies) have different levels of detail and areas of coverage for their respective geographic databases. In addition, each country, and sometimes each region within a country, has its own guidelines for specifying postal addresses. As a result, currently available geocoding software is tightly bound to a specific geographic database and its proprietary format. This format typically includes specialized indexes that speed up the search and retrieval of street and place names from the database.
The search process is outlined in the following highly simplified example. To find “500 Oracle Pkwy, Redwood City, Calif.,” geocoding software must search a list of state names or abbreviations until it finds “CA”. Then, under “CA”, it must find the city “Redwood City”. Finally, it must search through street names until it finds the name “Oracle” and street type “Parkway”. Because so much data must be scanned at each step, the search depends on many highly efficient indexes.
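The state, city, and street steps of this example can be sketched with nested dictionaries standing in for the indexes. This Python sketch is a simplified illustration under stated assumptions: the alias tables, the INDEX contents, the identifier "segment-001", and the function find_street are all hypothetical, and the house number “500” is not resolved here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alias tables mapping input spellings to canonical forms.
STATE_ALIASES: Dict[str, str] = {
    "ca": "CA",
    "calif.": "CA",
    "california": "CA",
}
STREET_TYPE_ALIASES: Dict[str, str] = {
    "pkwy": "Parkway",
    "parkway": "Parkway",
    "dr": "Drive",
    "drive": "Drive",
}

# state -> city -> (street name, street type) -> hypothetical segment id.
# Real geocoders use specialized on-disk indexes; dicts stand in for them.
INDEX: Dict[str, Dict[str, Dict[Tuple[str, str], str]]] = {
    "CA": {
        "redwood city": {
            ("oracle", "Parkway"): "segment-001",
        },
    },
}


def find_street(state: str, city: str, street: str) -> Optional[str]:
    """Narrow the search by state, then city, then street name and type."""
    state_key = STATE_ALIASES.get(state.strip().lower())
    if state_key is None:
        return None
    cities = INDEX.get(state_key)
    if cities is None:
        return None
    streets = cities.get(city.strip().lower())
    if streets is None:
        return None
    # Treat the last word as the street type, e.g. "Oracle Pkwy".
    name, _, street_type = street.strip().rpartition(" ")
    type_key = STREET_TYPE_ALIASES.get(street_type.lower().rstrip("."))
    if type_key is None:
        return None
    return streets.get((name.lower(), type_key))


match = find_street("Calif.", "Redwood City", "Oracle Pkwy")
```

Each lookup narrows the candidate set before the next one runs, which is why the order of the indexes follows the administrative hierarchy from state down to street.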