Systems have been developed that employ two geocoding data sets in an attempt to provide the correct latitude and longitude, as well as the street address, for incomplete or inaccurate input data for a specific location. These systems are employed where the geographic location of an address is needed, for example, to determine whether the address is in a flood plain for insurance rating purposes, to provide directions to the address, to map the address into a locale and, at times, for address hygiene purposes.
The geocoding data sets are processed by a geocoding engine. This is a specialized matching engine that utilizes a textual representation of an address as input. The engine matches the address against a data set of geographic data and uses algorithms to determine the location of the input address. The engine returns the location as a coordinate (longitude-latitude pair) referred to as a geocode and, depending on the system, may also return a more complete and accurate address based on an address hygiene function.
Geocoding data sets used for the above purposes include point level or parcel level geographic data sets and centerline geographic data sets. Point level or parcel level data sets (hereinafter referred to as point level data sets) are data sets where a single latitude and longitude is provided for a specific address. Centerline data sets are data sets where a centerline is provided, such as for a street, and interpolation is employed to relate the centerline to a specific address to establish a single latitude and longitude, from a range of latitude and longitude, for the address.
A street centerline data set usually contains coordinates that describe the shape of each street and the range of house numbers found on each side of the street. The geocoding engine may compute the geocode by first interpolating where the input house number exists within the street address range. The geocoding engine then applies this interpolated percentage to the street centerline coordinates to calculate the location. Finally, the engine offsets this location from the centerline to give an approximate structure location for the input address. Data sets are now also available that consist of point locations for addresses. These point level data sets result in higher quality geocodes than those requiring the interpolation technique. However, these point level data sets often do not contain every address and are therefore incomplete.
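The interpolation steps described above can be illustrated with a minimal sketch. It is not taken from any particular geocoding engine: the function name `interpolate_geocode`, its parameters, and the use of planar (already projected) coordinates are assumptions made for illustration, and the address range passed in is assumed to be the range for one side of the street only.

```python
import math


def interpolate_geocode(house_number, low, high, centerline, offset=0.0):
    """Approximate a structure location from a street centerline.

    Hypothetical helper. `centerline` is a list of (x, y) points in planar
    units; `low` and `high` bound the house numbers on one side of the street;
    `offset` is the setback from the centerline in the same units, measured to
    the right of the direction in which the centerline is digitized.
    """
    # Step 1: where the house number falls within the address range (0..1).
    fraction = 0.5 if high == low else (house_number - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)

    # Step 2: apply that fraction to the length of the centerline.
    segments = [(a, b, math.dist(a, b)) for a, b in zip(centerline, centerline[1:])]
    segments = [s for s in segments if s[2] > 0]  # ignore duplicate points
    if not segments:
        raise ValueError("centerline needs at least two distinct points")

    remaining = fraction * sum(length for _, _, length in segments)
    for a, b, length in segments:
        if remaining <= length:
            break
        remaining -= length
    else:
        remaining = length  # rounding overshoot: clamp to the end of the last segment

    ux, uy = (b[0] - a[0]) / length, (b[1] - a[1]) / length  # unit direction
    x, y = a[0] + ux * remaining, a[1] + uy * remaining

    # Step 3: offset perpendicular to the street to approximate the structure.
    return (x + offset * uy, y - offset * ux)
```

For a straight east-west segment from (0, 0) to (100, 0) with an address range of 100 to 198, house number 149 lies halfway along the range, so the sketch places it at the midpoint of the segment and then shifts it by the offset. Real latitude and longitude values would first need to be projected, or the distances computed geodesically, which this sketch omits.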
The sources of data for centerline and point level address matching have historically been postal services and digital map vendors, including census bureaus. The centerline data sets for address matching are largely complete due to their maturity and because they contain ranges of addresses rather than individual addresses. Newer point level data sets contain only one address per record and, as noted above, may not contain all addresses. Point level data sets are generally provided by the same sources as centerline data sets.
A centerline data set generally contains all known addresses within a locale. Because the data set is considered to be complete, the software can determine the best match based on the available records in the data. For example, 1 Elmcroft St. in Stamford, Conn., can be safely matched to 1 Elmcroft Rd. in Stamford, Conn., because the only viable address candidate for Elmcroft St. is Elmcroft Rd. The completeness of the reference data set provides a high level of confidence that Elmcroft St. does not exist. Conversely, if the input address is 838 Mesa, Palo Alto, Calif., a match cannot be made when there are two viable candidates, Mesa Ct. and Mesa Ave., because neither address candidate can be determined to be the better match.
As noted above, point level geocoding data sets are currently generally incomplete. If only Mesa Ct is present in the point level data set, software that uses only the point level data could erroneously match the input address to that record, resulting in a false positive match. This type of match generally has a lower level of confidence in its accuracy and usually should not be made with the point level data set alone.
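The effect of data set completeness on matching can be sketched as follows. This is a deliberately simplified illustration, not the logic of any actual geocoding engine: the function `match_street`, the status labels, and the representation of a street record as a (name, suffix) pair are all assumptions, and the sample records simply mirror the Elmcroft and Mesa examples above.

```python
def match_street(name, suffix, reference):
    """Match an input street against reference records (hypothetical helper).

    `reference` is a list of (name, suffix) pairs; `suffix` may be None when
    the input address omits it. Returns a (status, record) pair.
    """
    candidates = [rec for rec in reference if rec[0].lower() == name.lower()]
    exact = [rec for rec in candidates
             if suffix and rec[1].lower() == suffix.lower()]
    if exact:
        return "exact", exact[0]
    if len(candidates) == 1:
        # A single candidate is only trustworthy if the data set is complete.
        return "corrected", candidates[0]
    if candidates:
        return "ambiguous", None
    return "none", None


# Complete centerline data: the lone candidate can be accepted.
centerline_stamford = [("Elmcroft", "Rd")]
print(match_street("Elmcroft", "St", centerline_stamford))

# Complete centerline data with two candidates: no match can be made.
centerline_palo_alto = [("Mesa", "Ct"), ("Mesa", "Ave")]
print(match_street("Mesa", None, centerline_palo_alto))

# Incomplete point level data: the same rule now picks Mesa Ct,
# which may be a false positive because Mesa Ave is simply missing.
point_palo_alto = [("Mesa", "Ct")]
print(match_street("Mesa", None, point_palo_alto))
```

The same single-candidate rule that is safe against the complete centerline records returns a match against the incomplete point level records, which is why a match of this kind made from the point level data alone carries less confidence.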
Prior efforts have combined centerline data sets and point level data sets in an attempt to provide a higher level of confidence in an accurate address match than is provided by either data set alone. Examples of such systems are previous versions of the Centrus GeoStan product and its competitors. In such systems, the candidate data records are scored by separate geocoding engines or by separate geocoding operations in each data set. Decisions as to the relative likelihood of a correct match for the input data are made without regard to results from the other data set. This results in an increased number of false positive results, since the point level data is assumed to be complete. Alternatively, it may result in fewer matches to the point level data than might otherwise be possible if the match logic employed for the point level data is too restrictive.