A geocoding system is a software tool used to determine the geographic location of a particular address. A user inputs an address, and the system outputs the coordinates of the address or provides a map showing the vicinity of the address.
Sometimes an exact location for an address is known within the system. Other times, algorithms are applied to provide a sophisticated best estimate based on the available data. An example of a geocoding system is described in U.S. patent application Ser. No. 11/317,503, titled GEOCODING AND ADDRESS HYGIENE SYSTEM EMPLOYING POINT LEVEL AND CENTERLINE DATA SETS, filed Dec. 22, 2005, assigned to the assignee of the present application, and incorporated by reference herein.
For a geocoding system to do its job properly, it is important that the initial address input be properly understood by the system. Input text must be parsed, or “made sense of,” as an address before further analysis, such as matching the input to a reference database of addresses, scoring the address match, and outputting results, can occur. Parsing an input address means reducing the sequence of words composing an address line (like “123 Main Street”) into individual address elements (e.g., house number=“123”, street name=“Main”, and street type=“Street”). In different countries, and even within a single country, address lines differ by language, appearance of elements, order of elements, and delivery mode (such as P.O. Box, General Delivery, street address, intersection, etc.).
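By way of illustration only, the reduction of an address line into elements can be sketched as follows. The sketch assumes a single simple format (house number, then street name, then street type); the function name `parse_address_line` and the short `STREET_TYPES` list are hypothetical and are not part of any system described herein.

```python
from typing import Optional

# Hypothetical, deliberately small list of street types; a real system
# would consult a much larger reference table.
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd", "boulevard", "blvd"}


def parse_address_line(line: str) -> Optional[dict]:
    """Split an address line of the assumed form '<number> <name...> <type>'.

    Returns None when the line does not fit that one assumed format.
    """
    tokens = line.split()
    # Need at least a house number, one name word, and a street type.
    if len(tokens) < 3 or not tokens[0].isdigit():
        return None
    if tokens[-1].lower().rstrip(".") not in STREET_TYPES:
        return None
    return {
        "house_number": tokens[0],
        "street_name": " ".join(tokens[1:-1]),
        "street_type": tokens[-1],
    }


parsed = parse_address_line("123 Main Street")
```

Under these assumptions, “123 Main Street” yields house number “123”, street name “Main”, and street type “Street”, while a line in another delivery mode, such as a P.O. Box, is simply not recognized by this one format.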
The goal of accurate parsing is complicated by various factors, including the following: (1) there are many different valid address formats in a given country; (2) addresses can be written and abbreviated in many different ways; (3) written segments, such as directional and ordinal elements (north, east, south, west, 1st, second, 100, . . . ), may be applicable to different address components; (4) an input address may have errors or be incomplete; (5) depending on how it is parsed, an input address could refer to multiple actual addresses; (6) a single interpretation of an input address may refer to multiple actual addresses; and (7) differences between valid written addresses for two distinct locations may be small.
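Factors (3) and (5) can be illustrated with a minimal sketch in which one input line yields more than one candidate parse. The example assumes an input of the form house number, optional directional, street name, street type; the function name `candidate_parses` and the `DIRECTIONALS` set are hypothetical.

```python
# Hypothetical set of directional words and abbreviations.
DIRECTIONALS = {"north", "south", "east", "west", "n", "s", "e", "w"}


def candidate_parses(line: str) -> list:
    """Enumerate plausible interpretations of a simple address line."""
    tokens = line.split()
    if len(tokens) < 3:
        return []
    house_number, street_type = tokens[0], tokens[-1]
    middle = tokens[1:-1]

    # Interpretation A: every middle token belongs to the street name.
    candidates = [{
        "house_number": house_number,
        "pre_directional": None,
        "street_name": " ".join(middle),
        "street_type": street_type,
    }]

    # Interpretation B: a leading directional is its own element,
    # provided a street name still remains after removing it.
    if len(middle) > 1 and middle[0].lower() in DIRECTIONALS:
        candidates.append({
            "house_number": house_number,
            "pre_directional": middle[0],
            "street_name": " ".join(middle[1:]),
            "street_type": street_type,
        })
    return candidates


ambiguous = candidate_parses("100 North Main Street")
```

Here “North” may be read either as part of the street name (“North Main”) or as a separate directional preceding “Main”, so both candidates would need to be carried forward to matching and scoring.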
To allow a geocoding system to understand the address being input, it can employ an address parsing program to analyze the input address so that the component parts are recognized and interpreted. Once the input address has been parsed, the parsed address can be processed in view of the postal and street network geocoding data, which are themselves organized based on address component elements.
In a conventional international geocoding system, it is necessary to have multiple parsing engines. Since different regions and countries have different languages, different formats, and different rules for formulating addresses, it has been necessary to code a separate parsing engine for each region and country. For example, see U.S. Pat. No. 7,039,640 (incorporated by reference herein), which states that “In view of the diversity of address formats in the world, there is no generic address parser. Therefore, a suitable parser has to be created or instantiated for each country or jurisdiction(s) sharing a common addressing format.” (Col. 9, lines 4-8). Writing those separate parsers is time-consuming, redundant, inefficient, and error-prone.
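The conventional arrangement can be pictured, purely as a hypothetical sketch, as a table of separately written parsers keyed by country. The two parsers below assume only a difference in element order (house number first in a U.S.-style line, house number last in a German-style line); all names are illustrative and are not drawn from the cited patent.

```python
def parse_us(line: str) -> dict:
    # Assumed U.S.-style order: house number first, e.g. "123 Main Street".
    number, _, street = line.partition(" ")
    return {"house_number": number, "street": street}


def parse_de(line: str) -> dict:
    # Assumed German-style order: house number last, e.g. "Hauptstraße 123".
    street, _, number = line.rpartition(" ")
    return {"house_number": number, "street": street}


# One hand-written parser per country; each new country needs a new entry.
PARSERS = {"US": parse_us, "DE": parse_de}


def parse(line: str, country: str) -> dict:
    try:
        parser = PARSERS[country]
    except KeyError:
        raise ValueError(f"no parser registered for country {country!r}")
    return parser(line)
```

Even in this toy form, the two parsers duplicate nearly identical logic, and an unsupported country fails outright until yet another parser is written.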