A detailed description of the current state of the art in geocoding is provided in ‘The Geocode Encyclopedia: A Research for Geographic Entity Object code Systems and Evaluation Perspectives version 1.1’ by Naoki Ueda.
The proliferation of wireless portable devices in recent years has increased productivity and convenience. One such device is the portable satellite navigation system (satnav), which uses the Global Positioning System (GPS), electronic maps and databases stored within the satnav device, and mathematical algorithms to allow a user to identify their location and to determine navigation directions to a desired location.
Geocoding is the process of assigning a latitude/longitude pair to a textual description (frequently an address) of a location. The latitude/longitude pair that identifies a particular location is called a geocode. Geotagging is the process of assigning a code to a particular location (normally identified by a latitude/longitude pair of coordinates). The code and the latitude/longitude coordinates together are known as a geotag.
The best known system of identifying a specific location is the latitude-longitude coordinate system which identifies a location on the Earth by latitude and longitude coordinates. This system can specify the location of any point (a latitude and longitude coordinate specific to a particular location) on the Earth, to any degree of accuracy depending upon how precisely the coordinates are identified. Five decimal places of latitude or longitude translate to an accuracy of about 1 meter at the equator.
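The relationship between decimal places and ground distance can be checked with a short calculation. The following Python sketch is illustrative only: it assumes a spherical Earth with a mean radius of 6,371 km, and the function name decimal_place_resolution_m is hypothetical rather than part of any satnav system.

```python
import math
from typing import Tuple

# Assumption: spherical Earth with the commonly quoted mean radius.
EARTH_RADIUS_M = 6_371_000.0


def decimal_place_resolution_m(decimals: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Return (north_south_m, east_west_m): the ground distance covered by
    one unit in the last decimal place of a coordinate given in degrees."""
    step_deg = 10.0 ** -decimals
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = step_deg * meters_per_degree
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for places in range(3, 7):
    ns, _ = decimal_place_resolution_m(places)
    print(f"{places} decimal places: about {ns:.3f} m at the equator")
```

Under these assumptions, five decimal places correspond to roughly 1.1 meters at the equator, which agrees with the figure of about 1 meter quoted above. The east-west distance becomes smaller away from the equator.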
The majority of existing satnav systems locate an address in one of three ways:
by using a rough estimation of the exact location of the address (the least accurate method);
by using a calculated estimation of the exact location of the address (an intermediate accuracy method); or
by point addressing (the most accurate method).
As those skilled in the art are aware, point addressing is the identification of an actual set of latitude and longitude coordinates for a given specific location.
The method used to locate the address is determined by the level of detail provided by the map supplier. TeleAtlas (a Dutch company that is a wholly owned subsidiary of TomTom, a Dutch company located in Amsterdam, The Netherlands) and NAVTEQ (headquartered in Chicago, Ill., and a wholly owned subsidiary of Nokia Corp.) are the two major suppliers of maps for satnav systems. Point addressing data is becoming increasingly available.
The least accurate method to locate an address is by a rough estimation of the address. For some roads or sections of roads, the only information the map supplier has is the latitude and longitude of the starting point and the ending point of the road. In this case, regardless of which address is requested, the satnav system always directs the user to the mid-point of the road.
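The rough estimation method can be sketched in a few lines of Python. This is a minimal illustration, not any supplier's implementation: the function name locate_roughly and the endpoint coordinates are hypothetical, and the mid-point is taken as a simple average of the endpoint coordinates, which is only reasonable for a short, roughly straight road that does not cross the 180-degree meridian.

```python
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def locate_roughly(house_number: int, start: LatLon, end: LatLon) -> LatLon:
    """Return the mid-point of the road for any requested house number.

    The house number is deliberately ignored: with only the two endpoints
    known, every address on the road resolves to the same location.
    """
    mid_lat = (start[0] + end[0]) / 2.0
    mid_lon = (start[1] + end[1]) / 2.0
    return mid_lat, mid_lon


# Hypothetical endpoints of a short road.
road_start = (10.000, 20.000)
road_end = (10.002, 20.004)

# Two different house numbers yield the same estimated location.
print(locate_roughly(12, road_start, road_end))
print(locate_roughly(98, road_start, road_end))
```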
A method of intermediate accuracy is to locate an address by a calculated estimation of its exact location. A map supplier provides a map database that includes coordinates for roads and for critical points of variation in the road. The critical points include such items as start points, end points, radii of curvature, center points of radii of curvature, inflection points, and the like. The variations in the road include items such as where a cross street is intersected, where the road is straight, where the road curves, and the like. Each different segment of the road (straight, curved, and the like) is considered a separate section of the road. For each section of road, the map database includes a street number, the latitude and longitude of both endpoints of the section, and an identification of which side of the street carries even street numbers and which side carries odd street numbers.
For addresses located within a section of the road for which the map database does not have exact coordinates, the satnav system makes an estimate. As an example, suppose the desired address is 140 Main Street. The database does not contain the exact latitude and longitude of this address. However, the database contains the information that Main Street is 900 meters long, starts at Point A (specified by latitude and longitude) with house number 100, and ends at Point B (specified by latitude and longitude) with house number 160, 60 house numbers away. For a road 900 meters long with 60 house numbers, 30 assumed on each side of the road, it is assumed that each house number pair (corresponding even and odd numbers) is 30 meters apart.
In this example, the satnav system assumes that all house numbers are spaced equally along the road and that there are an equal number of house numbers on each side of the road. House number 140 is therefore calculated to be 66.7 percent of the way along this stretch of road. The stretch of road from Point A to Point B is 900 meters long. Thus, the address is calculated to be 66.7 percent of 900 meters along the road, that is, 600 meters down the road from Point A. The satnav system will guide the user desiring to reach number 140 to this location. This approach has obvious sources of error, the main one being that the house numbers are probably not evenly distributed along the road. This is why the satnav system does not always guide a user to exactly the “front door” of the address. However, in most cases the satnav system gets the user “close enough” for the user to be satisfied with the system.
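The calculation in this example can be written out as a short Python sketch. It is an illustration under the same assumptions as the text (evenly spaced house numbers, a single section treated as a straight line between its endpoints). The function name interpolate_address and the coordinates used for Point A and Point B are hypothetical.

```python
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def interpolate_address(
    house_number: int,
    first_number: int,
    last_number: int,
    start: LatLon,
    end: LatLon,
    length_m: float,
) -> Tuple[float, float, LatLon]:
    """Estimate where a house number lies along a road section.

    Returns (fraction along the section, distance from the start in
    meters, estimated (lat, lon)), assuming evenly spaced house numbers.
    """
    if last_number == first_number:
        raise ValueError("section must span more than one house number")
    fraction = (house_number - first_number) / (last_number - first_number)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number lies outside this section")
    distance_m = fraction * length_m
    # Linear interpolation between the endpoints (straight-section assumption).
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return fraction, distance_m, (lat, lon)


# Values from the example: numbers 100 to 160 over 900 meters, target 140.
point_a = (10.0000, 20.0000)  # hypothetical coordinates for Point A
point_b = (10.0000, 20.0081)  # hypothetical coordinates for Point B
fraction, distance_m, estimate = interpolate_address(140, 100, 160, point_a, point_b, 900.0)
print(f"{fraction:.1%} of the way, {distance_m:.0f} m from Point A")
```

A real map database would apply this per road section and then offset the result to the odd or even side of the street. That step is omitted here.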
The most accurate method to locate an address is “point addressing.” In this method, a map supplier provides the actual latitude and longitude of each address. This is more expensive than the estimation methods, as each address must be individually identified and located. In many cases this has to be done by actually driving along the road with a specially equipped vehicle and recording the latitude and longitude coordinates of each address as a plotter drives past it. This addressing method may be extended to entire building campuses. An example is the “micropoint” addresses provided by the map supplier NAVTEQ.
NAVTEQ typically provides two distinct locations for each address: a point on the road in front of the building located at the address, and a point that is the centroid of the roof of that building. Typically NAVTEQ addresses provide the latitude and longitude of the location to five decimal places, an accuracy of about 1 meter at the equator.
Understanding Administrative Divisions
For purposes of developing and using satnav systems, developers will consider national and subnational entities at various Administrative Levels, First Level, Second Level, and so forth. Administrative Divisions are grouped into various Administrative Levels which are normally established for the purpose of government. A subnational entity is a portion of a country or political jurisdiction.
Countries are divided into smaller Administrative Divisions to make managing their land and the affairs of their people easier. For example, a country may be divided into provinces, which in turn are divided into counties, which in turn may be divided in whole or in part into municipalities. The first division of a country into smaller subnational units, e.g. states in the United States of America or provinces in Canada, is considered to be a First Level Administrative Division. First Level Administrative Divisions are divided into Second Level Administrative Divisions, such as counties or municipalities, which are further subdivided into Third Level Administrative Divisions such as towns and hamlets.
For the purposes of brevity in this document a Second Level Administrative Division will be referred to as L2AD.
Each country in the world has its own system of naming and creating Administrative Levels. Even the most underdeveloped countries of the world have quite well defined Administrative Divisions. Normally this is a hold-over from when these countries were colonies of more developed nations.
For purposes of explanation a detailed examination of the three countries that make up North America (Canada, United States and Mexico) is shown below. Table 1 summarizes the different naming and division systems for these three countries.
Canada
Canada is divided into ten provinces and three territories.
Each province has a unique system of local government which may include upper-tier or rural jurisdictions such as counties (or Municipal districts), regional municipalities, regional districts or regional county municipalities, and lower-tier or urban jurisdictions such as cities, towns, villages, townships, and parishes. Cities in Quebec are further subdivided into arrondissements (“boroughs”).
Statistics Canada aggregates statistical census data into census divisions, which follow the boundaries of one or more large local government units. Municipalities (and in some cases, communities within municipalities) within census divisions may be considered census subdivisions.
United States of America
The 50 states (four of which have the official title of Commonwealth) are subdivided into counties (Louisiana uses the title “parish” and Alaska uses the title “borough”). The counties may be further subdivided into townships, or towns in New York State and most of New England. Urban areas of a state may be organized into incorporated cities, towns, villages, and other types of municipalities, and other autonomous or subordinate public authorities and institutions.
Although the District of Columbia is not a state, it is considered a First Level Administrative Division.
Mexico
The United Mexican States (Spanish: Estados Unidos Mexicanos) is a federal republic formed by 32 federal entities (Spanish: entidades federativas) (31 states and 1 federal district).
Within the federal states there are approximately 2,400 municipalities, and within the federal district (where Mexico City is located) there are 16 boroughs (delegaciones).
TABLE 1
Administrative Divisions

Country | Level 1 | Level 2 | Level 3
Canada | 10 provinces; 3 territories | Counties; Census divisions; regions (depending on the province or territory) | Municipalities; Parishes; County subdivisions (depending on province or territory)
United States of America | 50 states; 1 federal district (District of Columbia); unincorporated organized territories | 3,000+ counties; 40+ independent cities | For purposes of this document Public Use Microdata Areas (PUMAs) and Super-PUMAs will be used as Level 3 Administrative Divisions (L3AD). These are subdivisions of counties created and maintained by the US Census Bureau. PUMAs and Super-PUMAs can be used to divide counties (greater than 1,800,000). There are only 15 counties in USA with a population greater than 1,800,000.
Mexico | 31 federal states (estados); Mexico City | 2400+ municipalities (average population of 45,616); 16 boroughs (delegaciones) in Mexico City | Municipalities are divided into Boroughs (delegaciones)

Explanation of Geocoding Terminology
To explain geocoding, geotagging, and related concepts clearly, it is necessary to establish clear definitions for the use of these terms in this document. Many of the terms are frequently used in different ways, even by reputable and recognized organizations.
To geocode (verb) is to transform descriptive text into a set of latitude and longitude coordinates. Examples of descriptive texts are:
An address:
Jefferson Building
600 Delany Street
Alexandria, Va. 22314-5796
USA
A descriptive location: “the main entrance to the Jefferson Building at the USPTO complex in Alexandria Va.”
A geocoder (noun) is a set of inter-related components in the form of operations, algorithms, and data sources that work together to produce a geocode.
A geocode (noun) is a set of latitude and longitude coordinates for an address or other descriptive location. The length of the geocode may vary, and its accuracy depends on the number of decimal places.
For example: +38.802133, −77.063371 is the geocode for “the main entrance of the Jefferson Building, part of the USPTO, in Alexandria Va.” (example only). The latitude and longitude are given to 6 decimal places, an accuracy of approximately +/−8.5 cm.
The majority of addresses in first world countries have already been geocoded.
A geotag (noun) is a code and corresponding pair of latitude and longitude coordinates that can be used to identify and to locate a point on the earth's surface, typically an address or other textually described location. The code is frequently an alphanumeric string. This string may vary in length depending on the coding method used.
To geotag (verb) is to assign a code and a corresponding pair of latitude and longitude coordinates to a particular location, typically an address or a Point of Interest (e.g. the Statue of Liberty in New York City).
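The distinction between a geocode and a geotag can be made concrete with a small data structure. The Python sketch below is illustrative only. The class Geotag, the function geotag, the registry dictionary, and the code string "EXAMPLE1" are all hypothetical and do not represent any particular coding method. The coordinates are the example values used above for the Jefferson Building.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Geotag:
    """A code together with the latitude/longitude pair it identifies."""
    code: str
    latitude: float   # decimal degrees, positive north
    longitude: float  # decimal degrees, positive east

    def __post_init__(self) -> None:
        if not self.code:
            raise ValueError("code must be a non-empty string")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90 degrees")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180 degrees")


def geotag(code: str, latitude: float, longitude: float,
           registry: Dict[str, Geotag]) -> Geotag:
    """Assign a code to a location and record it so the code can be looked up."""
    tag = Geotag(code, latitude, longitude)
    registry[code] = tag
    return tag


registry: Dict[str, Geotag] = {}
geotag("EXAMPLE1", 38.802133, -77.063371, registry)  # hypothetical code

# Looking up the code recovers the geocode (the coordinate pair).
found = registry["EXAMPLE1"]
print(found.latitude, found.longitude)
```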
A geocode (a latitude and longitude pair) given to 5 decimal places (example: +40.12345) gives an accuracy of about 1 meter at the equator. This is the level of accuracy that is commonly found in the data provided by the major suppliers of maps and geocodes for satnav devices.
A latitude/longitude pair with this level of accuracy requires a minimum of 14 characters and a maximum of 18 characters, depending on where on the globe the point is located. Such large numbers of characters are difficult for humans to work with, prone to errors and are not “user friendly”.