The Internet has made a wide variety of data and services accessible. Websites offer free services to attract users and expose them to advertisements, which in turn provide revenue to the host vendor. One example is free access to online mapping services for geographic locations. Such services are used extensively not only online but also in paid offerings, for example in vehicles.
Geocoding is the process of assigning geographic identifiers, such as coordinates, to map features such as addresses, places, and other structures. The software that performs this assignment is called a geocoder. Geocoding applies not only to addresses and places, but also to events or data that have a geographic component.
Typically, conventional geocoding systems are based more on data retrieval techniques than on information retrieval techniques. A data retrieval system strives to retrieve all objects that satisfy clearly defined conditions, such as those in a regular expression or a relational algebra expression. Data retrieval deals with data that has well-defined structure and semantics. In contrast, an information retrieval system usually deals with natural language text that is not always well structured and can be semantically ambiguous. To be effective, the information retrieval system must interpret the query to determine the user intent, which requires extracting semantic and syntactic information, and then use this information to find relevant documents. The notion of relevance is central to information retrieval systems. Accordingly, the goal of the information retrieval system is to retrieve all documents that are relevant to a user query, while retrieving as few non-relevant documents as possible.
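The contrast between the two retrieval styles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the sample documents, the function names data_retrieval and information_retrieval, and the use of simple token overlap as a stand-in for relevance. It is not the method of any particular geocoder.

```python
import re

# Hypothetical sample corpus, invented for illustration.
DOCUMENTS = [
    "1 Main Street, Springfield",
    "Main St near Springfield town hall",
    "Elm Street, Shelbyville",
]


def data_retrieval(pattern, documents):
    """Return every document that satisfies a clearly defined condition.

    A document either matches the regular expression or it does not;
    there is no notion of a partial or approximate match.
    """
    regex = re.compile(pattern)
    return [doc for doc in documents if regex.search(doc)]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def information_retrieval(query, documents):
    """Rank documents by a crude relevance score for a free-text query.

    Relevance is approximated (as an assumption) by the fraction of
    query tokens that also appear in the document.
    """
    query_tokens = tokenize(query)
    if not query_tokens:
        return []
    scored = [
        (len(query_tokens & tokenize(doc)) / len(query_tokens), doc)
        for doc in documents
    ]
    # Stable sort keeps the corpus order among equally scored documents.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]
```

In this sketch, the exact pattern "Main Street" misses the second document because it abbreviates the street type, whereas the free-text query "main springfield" returns both Springfield documents and omits the unrelated one.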
The user interface for geocoding usually consists of separate fields of an address (e.g., address line, city, state, postal code, country, etc.). A geocoder usually uses the values in these fields to perform hierarchical data lookups in order to obtain the potential matches. However, when given a single-line address that has the fields concatenated together, conventional geocoders perform poorly because they parse the string into individual fields inaccurately. Some of the key problems include incorrect or optional use of separators, various permutations of the individual fields, and missing fields. Parsing generally tends to be code-driven, rather than data-driven based on real-world logs, and follows a single code path to extract the individual fields. These fields are then sent to a geocoding data lookup engine, which obtains the matches and returns them to the caller. The success of this approach is limited to well-formatted addresses that have all the individual fields; it tends to break down when the query is ambiguous or is missing information.
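The failure modes described above can be seen in a minimal sketch of a code-driven, single-path parser. The function name parse_single_line, the assumed field order (address line, city, state and postal code, country), the comma separator, and the sample addresses are all hypothetical and chosen only to illustrate the problem; real geocoders differ.

```python
def parse_single_line(address):
    """Parse a single-line address along one fixed code path.

    Assumes (for illustration) exactly four comma-separated parts in the
    order: address line, city, "state postal_code", country.
    Returns a dict of fields, or None when the input does not fit.
    """
    parts = [part.strip() for part in address.split(",")]
    if len(parts) != 4:
        # Missing separators or missing fields: the single path gives up.
        return None

    address_line, city, state_and_postal, country = parts
    state_postal = state_and_postal.split()
    if len(state_postal) != 2:
        return None

    # Fields are assigned purely by position, never by content.
    return {
        "address_line": address_line,
        "city": city,
        "state": state_postal[0],
        "postal_code": state_postal[1],
        "country": country,
    }


# Well-formatted input with every field present parses as expected.
well_formed = parse_single_line("1 Main Street, Springfield, IL 62701, USA")

# Optional separators omitted and country missing: no result at all.
no_separators = parse_single_line("1 Main Street Springfield IL 62701")

# Fields permuted: parsing "succeeds" but city and address line are swapped.
permuted = parse_single_line("Springfield, 1 Main Street, IL 62701, USA")
```

Here the first call yields the intended fields, the second returns None, and the third silently labels "Springfield" as the address line, so the downstream lookup engine would receive wrong field values without any indication of the error.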