Geocoding is generally understood as the act, method, or process of programmatically assigning x and y coordinates (usually, but not limited to, latitude and longitude) to records, lists, and files containing location information (full addresses, partial addresses, ZIP codes, census FIPS codes, etc.) for cartographic or any other form of spatial analysis or reference. Currently, geocoding is used to convert a street address or other textually specific geographic location information into a physical location. Geocoding is currently performed by running ungeocoded information (“raw data”) through proprietary software that performs table lookup, fuzzy logic, and address matching against an entire “library” of all known or available addresses (referred to hereinafter as the “georeferenced library”) with associated x,y locating coordinates. Raw data records that match records in the georeferenced library are then assigned the x,y coordinates associated with the matched library record.
A “centroid” is defined as the geographic center of an entire area, region, boundary, etc.
Street vectors are defined as address ranges that are assigned to segments of individual streets. Street vectors are used in displays of digitized computer-based street maps, and usually appear as left-side and right-side address ranges. They are also used for geocoding a particular address to a particular street segment based on where the address falls along the line segment. FIG. 1 contains a table showing the address range on both sides of the street for one particular street segment of Elm St.
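The use of street vectors to place an address along a segment can be illustrated with a minimal sketch. The `StreetSegment` type, the `interpolate` function, the coordinates, and the address ranges below are all hypothetical and are not taken from FIG. 1; the sketch also assumes the common convention that odd and even house numbers lie on opposite sides of the street, and that house numbers are spaced evenly along the segment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreetSegment:
    """Hypothetical street vector: one segment with left/right address ranges."""
    name: str
    start: Tuple[float, float]    # (x, y) at the low-address end
    end: Tuple[float, float]      # (x, y) at the high-address end
    left_range: Tuple[int, int]   # inclusive (low, high) house numbers, left side
    right_range: Tuple[int, int]  # inclusive (low, high) house numbers, right side


def interpolate(segment: StreetSegment,
                house_number: int) -> Optional[Tuple[float, float, str]]:
    """Return (x, y, side) for a house number, or None if no range contains it."""
    sides = (("left", segment.left_range), ("right", segment.right_range))
    for side, (low, high) in sides:
        # Assumption: a range only holds numbers with the same parity as its low end.
        if low <= house_number <= high and house_number % 2 == low % 2:
            # Fraction of the way along the segment, assuming even spacing.
            fraction = 0.0 if high == low else (house_number - low) / (high - low)
            x = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
            y = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
            return (x, y, side)
    return None


# Illustrative values only; they are not the ranges shown in FIG. 1.
elm = StreetSegment("Elm St", (0.0, 0.0), (100.0, 0.0), (101, 199), (100, 198))
location = interpolate(elm, 149)
```

Because the position is interpolated rather than surveyed, the returned point is only an estimate of where the address lies on the segment.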
Geographic information systems (GIS) store, retrieve, and display topological information. The topological information is obtained from a topology, that is, a topographic study of a geographic region. The topographic study may be a map showing features of the geographic region, including rivers, lakes, etc., as well as bridges and roads.
A georeferenced library can be compiled from a number of sources, including US Census address information and US Postal address information, as well as ZIP Code boundaries and other sources of data containing geographic information and/or location geometry. In the prior art, if a raw data address could not be matched exactly to a specific library street address, then an attempt was made to match the raw data address to an ever-decreasing precision geographic hierarchy of point, line, or region geography until a predetermined tolerance for an acceptable match was met. Current geocoding technology generally provides for two main types of precision: street level and postal ZIP centroid. Street level precision is the placement of geocoded records at the street address (as shown in record 10 of FIG. 2). Street level precision attempts to geocode all records to the actual street address. In all likelihood, however, some matches will end up at a less precise location, such as a ZIP centroid (e.g., ZIP+4, ZIP+2, or ZIP). One of the disadvantages of ZIP code matching alone is that current geocoding technology examines only the ZIP code field when matching. If the ZIP codes in the raw data records do not already have ZIP+4 values, then current geocoding technology will match only to the larger-area 5-digit ZIP code centroids. Conversely, if only street level precision is used, current geocoding technology will attempt to return street-level coordinates and will optionally fall back to the slightly less precise ZIP+4 coordinates.
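The decreasing-precision fallback described above can be sketched roughly as follows. This is an illustration only, not the matching logic of any particular geocoding product: the lookup tables, key formats, coordinates, and the `geocode` function are hypothetical, and exact dictionary lookup stands in for the fuzzy address matching that real software performs.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (x, y), here longitude and latitude

# Hypothetical georeferenced library tables; every value is made up.
STREET_POINTS: Dict[str, Coord] = {"12 ELM ST|12345": (-73.9501, 40.7001)}
ZIP4_CENTROIDS: Dict[str, Coord] = {"12345-6789": (-73.9510, 40.7010)}
ZIP5_CENTROIDS: Dict[str, Coord] = {"12345": (-73.9600, 40.7100)}


def geocode(street: Optional[str],
            zip5: Optional[str],
            plus4: Optional[str] = None) -> Optional[Tuple[Coord, str]]:
    """Try the most precise level first, then fall back to coarser centroids.

    Returns (coordinates, precision_label), or None if nothing matches.
    """
    # 1. Street level: exact match on a normalized street address plus ZIP.
    if street and zip5:
        key = f"{street.strip().upper()}|{zip5}"
        if key in STREET_POINTS:
            return STREET_POINTS[key], "street"
    # 2. ZIP+4 centroid: only possible when the record carries a +4 value.
    if zip5 and plus4:
        key = f"{zip5}-{plus4}"
        if key in ZIP4_CENTROIDS:
            return ZIP4_CENTROIDS[key], "zip+4"
    # 3. 5-digit ZIP centroid: the coarsest level in this sketch.
    if zip5 and zip5 in ZIP5_CENTROIDS:
        return ZIP5_CENTROIDS[zip5], "zip5"
    return None
```

Returning the precision label alongside the coordinates lets a caller distinguish a street-level match from a centroid match, which the coordinates alone do not reveal.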
The typical output from a geocoding process (a “match”) is a longitude/latitude coordinate pair specifying a point on the earth's surface. Current geocoding technology is considered to be imprecise, and works well only when the input is a well-formed, existing street address and the desired output is a physical point location. Performance has been sub-optimal when one or more of the following conditions is involved in the geocoding process: (1) the input element is an incomplete street address; (2) the input postal address is valid, but has a large interpolation error, or is located via a ZIP code centroid or other imprecise method; (3) the address is ambiguous (i.e., multiple “hits” are returned for the input address); (4) the input element is not a point location, but a set of locations or a polygon; (5) the geocoding system has multiple data sets for a single locator type; or (6) the desired result is not a point longitude/latitude location, but a bounding geometry (a minimal bounding rectangle, or MBR) in which the input element must definitely lie.
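To make the notion of a minimal bounding rectangle in item (6) concrete, the following sketch computes an MBR for a set of candidate longitude/latitude points, such as the multiple “hits” of item (3). The function name and tuple layouts are hypothetical, and the sketch assumes the points do not straddle the 180° meridian, where a simple minimum/maximum over longitudes would give the wrong box.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]              # (longitude, latitude)
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def minimal_bounding_rectangle(points: Iterable[Point]) -> Box:
    """Return the smallest axis-aligned rectangle containing every point."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    lons = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    # Assumption: no antimeridian crossing, so min/max of longitude is valid.
    return (min(lons), min(lats), max(lons), max(lats))


# Illustrative candidate locations; the coordinates are made up.
candidates = [(-73.95, 40.70), (-73.90, 40.72), (-73.97, 40.69)]
box = minimal_bounding_rectangle(candidates)
```

If the true location is one of the candidates, it necessarily lies inside the returned rectangle, which is the kind of guarantee a single point result cannot offer.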
Thus, a need remains in the geocoding art for a method that is able to return a more precise result from any of a variety of incomplete input data.