Geocoding involves programmatically assigning x and y coordinates (usually, but not limited to, earth coordinates, i.e., latitude and longitude) to records, lists and files containing location information (full addresses, partial addresses, zip codes, census FIPS codes, etc.) for cartographic or any other form of spatial analysis or reference. Geocoding is described even more broadly as “mapping your data” in order to visualize information and explore relationships that are unavailable in strict database or spreadsheet analysis.
A centroid is the geographic center of an area, region, boundary, etc.; that is, a single point that represents the entire geographic area it covers.
Street vectors are address segments of individual streets, which may contain attributes such as address ranges. Street vectors are used in displays of digitized computer-based street maps. Range information on street vectors is typically specified for the left and right side of each vector. Street vectors are also used for geocoding a particular address to a particular street segment based on its point along the line segment.
Geocoding is currently performed by running non-geocoded information (referred to hereafter as “raw data”), such as a list of customers, through proprietary software and/or data, which performs table lookup, fuzzy logic and address matching against an entire “library” of all known or available address points or street vectors (referred to hereafter as a “georeferenced library”) with associated x, y location coordinates. If the raw data matches a point record from the georeferenced library, then the raw data is assigned the same x, y coordinates associated with the matching record from the georeferenced library. If the raw data instead matches a street vector, then the raw data is assigned an interpolated x, y coordinate pair based on the x, y coordinates of the high and low addresses for the matched street vector in the georeferenced library.
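The two-stage matching described above (point record first, street vector second) can be sketched as follows. This is a minimal illustration only: the names `StreetVector`, `POINT_LIBRARY`, `VECTOR_LIBRARY` and `geocode` are hypothetical, the library contents are made-up toy data, and an exact match on an upper-cased street name stands in for the table lookup and fuzzy logic that real geocoding software would apply.

```python
from dataclasses import dataclass


@dataclass
class StreetVector:
    """One side of a street segment with its address range (hypothetical layout)."""
    street: str
    low: int       # lowest address number on this side of the segment
    high: int      # highest address number on this side of the segment
    start: tuple   # (x, y) of the endpoint holding the low address
    end: tuple     # (x, y) of the endpoint holding the high address


# Toy georeferenced library; all values are invented for illustration.
POINT_LIBRARY = {(10, "MAIN ST"): (9.0, 2.0)}
VECTOR_LIBRARY = [StreetVector("MAIN ST", 1, 99, (0.0, 0.0), (98.0, 0.0))]


def geocode(number, street):
    """Return (match_type, (x, y)) for a raw-data address, or None if unmatched."""
    name = street.strip().upper()  # stand-in for real address normalization

    # Stage 1: a matching point record supplies its own coordinates.
    point = POINT_LIBRARY.get((number, name))
    if point is not None:
        return "point", point

    # Stage 2: a matching street vector supplies interpolated coordinates.
    for vec in VECTOR_LIBRARY:
        same_side = number % 2 == vec.low % 2  # odd/even side of the street
        if vec.street == name and same_side and vec.low <= number <= vec.high:
            span = vec.high - vec.low
            t = (number - vec.low) / span if span else 0.0
            x = vec.start[0] + t * (vec.end[0] - vec.start[0])
            y = vec.start[1] + t * (vec.end[1] - vec.start[1])
            return "street vector", (x, y)

    return None
```

With this toy data, `geocode(10, "main st")` returns the stored point coordinates, `geocode(33, "Main St")` falls through to the street vector and is interpolated, and `geocode(34, "Main St")` returns `None` because the only vector in the library covers the odd side of the street.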
The georeferenced library is compiled from a number of varied sources, depending on the territory, including census information, postal address information, street vectors with associated address ranges, postcode centroids and various other sources of data containing geographic information and/or location geometry. If a raw data address cannot be matched exactly to a specific library street address (known as a “street level hit”), then an attempt is made to match the raw data address against a geographic hierarchy of point, line or region geography of ever-decreasing precision until a predetermined tolerance for an acceptable match is met. The level of the geographic hierarchy to which a raw data record is finally assigned is also known as the “geocoding precision.” Geocoding precision indicates how closely the location assigned by the geocoding software matches the true location of the raw data.
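The fallback through a hierarchy of decreasing precision might be sketched as below. The level names in `PRECISION_ORDER`, the function `geocode_with_fallback` and the `coarsest_allowed` parameter are assumptions made for illustration (the last one is a simple stand-in for the predetermined tolerance); actual geocoding products define their own hierarchies and tolerances.

```python
# Hypothetical precision levels, ordered from most to least precise.
PRECISION_ORDER = ["street", "postcode_centroid", "region_centroid"]


def geocode_with_fallback(record, lookups, coarsest_allowed="region_centroid"):
    """Try each precision level in turn and report which one matched.

    `lookups` maps a level name to a function taking the raw-data record and
    returning (x, y) or None. Returns (geocoding_precision, (x, y)), or None
    when nothing at or above `coarsest_allowed` matches.
    """
    for level in PRECISION_ORDER:
        lookup = lookups.get(level)
        coords = lookup(record) if lookup else None
        if coords is not None:
            return level, coords
        if level == coarsest_allowed:
            break  # do not accept anything less precise than this level
    return None


# Toy lookups with invented data: no street-level hit, one postcode centroid.
toy_postcode_centroids = {"00000": (1.0, 2.0)}
toy_lookups = {
    "street": lambda rec: None,
    "postcode_centroid": lambda rec: toy_postcode_centroids.get(rec.get("postcode")),
}

raw_record = {"address": "1 Nowhere Rd", "postcode": "00000"}
result = geocode_with_fallback(raw_record, toy_lookups)
strict = geocode_with_fallback(raw_record, toy_lookups, coarsest_allowed="street")
```

Here `result` carries the precision label `"postcode_centroid"` alongside the coordinates, while `strict` is `None` because only a street level hit was acceptable and none was found.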
FIG. 1 illustrates a street segment called Main Street. The illustrated Main Street segment is for the odd side of Main Street and has an address range of 1 to 99 (odd numbers only) spanning between segment endpoints A and B. The coordinates of endpoint A are (X,Y) while the coordinates of endpoint B are (X1, Y1). Heretofore, interpolation of input addresses in a geocoder was accomplished by considering the available high and low address range data in a georeferenced library for the given street segment and calculating where on that segment an input address from the raw data ought to reside based upon the latitude/longitude pairs of those two endpoints.
For example, as illustrated in FIG. 1, given the Main Street segment, current interpolation methods assume that addresses exist at points equidistant from each other, and the location of an input address from the raw data on a given segment is calculated using the coordinates of the segment endpoints A, B (or nodes) from the georeferenced library. Current interpolation will place an input address of 33 Main Street approximately one third of the way along the segment (point C).
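The conventional interpolation just described amounts to a linear blend between the two endpoints. The sketch below assumes the low address sits at endpoint A and the high address at endpoint B; the function name `interpolate_address` and the numeric endpoint coordinates are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def interpolate_address(house_number, low, high, start, end):
    """Place an address on the segment start -> end, assuming that addresses
    are spaced evenly between the low and high ends of the range."""
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    if high == low:
        return start  # a single-address segment has nothing to interpolate
    t = (house_number - low) / (high - low)  # fraction of the way from A to B
    x = start[0] + t * (end[0] - start[0])
    y = start[1] + t * (end[1] - start[1])
    return (x, y)


# Main Street, odd side, range 1 to 99, with invented endpoint coordinates
# A = (0, 0) and B = (98, 0).
point_c = interpolate_address(33, 1, 99, (0.0, 0.0), (98.0, 0.0))
```

For 33 Main Street the fraction is (33 - 1) / (99 - 1) = 32/98, or about 0.33, which is why point C falls roughly one third of the way from A to B.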
The disadvantage of the prior art methods is that they fail to consider that houses, buildings, etc. are typically not located at regular intervals along a street vector, and that a street sometimes does not use the full range of possible address numbers assigned by the postal authority for the street vector. As such, these methods are not as accurate as they should be, which is undesirable. Users of such geocoding methods may assign locations to addresses on a street vector that are incorrect when compared to the actual ground truth positions of addresses on that street vector. For example, traditional interpolation can result in clustering addresses in close proximity at one end of a street vector when the actual addresses are distributed along a greater length of the street vector. In FIG. 6A, the pushpins on the image depict the results of using a traditional interpolation technique to geocode addresses 2, 14 and 22 on Bieniek Ave, Adams, Mass., for which the postal authority has assigned the possible addresses of 1 through 99. In reality, the even-numbered addresses along the full length of the street, as indicated by the numbered stars, range only up to 22. Traditional interpolation methods assume the existence of addresses 2 through 98 on the even side of the street and therefore locate 2, 14 and 22 as being clustered at one end of the street, which in this instance is incorrect. Thus, use of the existing geocoding methods can result in errors in analysis and/or logistics where location is a key component. Accordingly, there is a desire and need for a more accurate geocoding technique.
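The clustering in the Bieniek Ave example can be checked with a few lines of arithmetic. The sketch below computes only how far along the street vector traditional interpolation places each address, using the assumed even-side range of 2 through 98; the helper name `fraction_along` is hypothetical, and no real coordinates for the street are used.

```python
def fraction_along(house_number, low, high):
    """Fraction of the street vector's length at which evenly spaced
    interpolation places the given address (0.0 = low end, 1.0 = high end)."""
    return (house_number - low) / (high - low)


# Even side of the street, assumed by traditional interpolation to run 2..98.
assumed_low, assumed_high = 2, 98
placements = {
    number: fraction_along(number, assumed_low, assumed_high)
    for number in (2, 14, 22)
}

for number, fraction in placements.items():
    print(f"address {number}: {fraction:.1%} of the way along the vector")
```

Under this assumption, address 22 lands only about a fifth of the way along the vector (20/96), even though it is the highest even-numbered address actually present on the street, so all three addresses bunch up at the low end.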