The background and the invention are best understood by defining certain terms including: geocoding, centroids, and street vectors/segments.
Geocoding is the act, method or process of programmatically assigning x and y coordinates (usually, but not limited to, earth coordinates, i.e., latitude and longitude) to records, lists and files containing location information (full addresses, partial addresses, zip codes, census FIPS codes, etc.) for cartographic or any other form of spatial analysis or reference. Geocoding is described even more broadly as "mapping your data" in order to visualize information and explore relationships previously unavailable in strict database or spreadsheet analysis.
A centroid is the geographic center of an entire area, region, boundary, etc., that is, the center point of the specific geographic area that the boundary covers.
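As an illustration only, the centroid of a region whose boundary is given as a simple polygon may be computed with the standard area-weighted (shoelace) formula. The sketch below assumes planar x, y coordinates and a non-self-intersecting boundary; the function name polygon_centroid is hypothetical and is not taken from any particular geocoding product.

```python
def polygon_centroid(vertices):
    """Return the (x, y) centroid of a simple polygon.

    Assumption: vertices are planar (x, y) pairs listed in order around
    the boundary, and the polygon does not intersect itself.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0  # signed value equal to two times the polygon area
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")

    # The centroid formula divides by 6 * area, i.e. 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)
```

For latitude and longitude data covering a large region, a planar computation of this kind is only an approximation.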
Street vectors are address ranges that are assigned to segments of individual streets. Street vectors are used in displays of digitized computer based street maps. Street vectors usually appear as left and right side address ranges. They are also used for geocoding a particular address to a particular street segment based on its point along the line segment. For example, the table below shows the address range on both sides of the street for one particular street segment of Main St.:
______________________________________
Street    FromLeft  ToLeft  FromRight  ToRight
______________________________________
Main St   2500      2536    2501       2549
______________________________________
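The placement of an address along a street segment can be sketched as a linear interpolation over the address range on the matching side of the street. The sketch below is illustrative only: the StreetSegment type, its start_xy and end_xy endpoint coordinates, and the rule that a house number belongs to the side whose range shares its odd/even parity are assumptions made for the example and are not part of the table above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreetSegment:
    street: str
    from_left: int
    to_left: int
    from_right: int
    to_right: int
    start_xy: Tuple[float, float]  # hypothetical coordinates of the "from" end
    end_xy: Tuple[float, float]    # hypothetical coordinates of the "to" end


def interpolate_address(
    seg: StreetSegment, house_number: int
) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Return (side, (x, y)) for a house number, or None if out of range."""
    sides = (
        ("left", seg.from_left, seg.to_left),
        ("right", seg.from_right, seg.to_right),
    )
    for side, lo, hi in sides:
        # Assumption: each side carries numbers of a single parity.
        same_parity = house_number % 2 == lo % 2
        in_range = min(lo, hi) <= house_number <= max(lo, hi)
        if same_parity and in_range:
            # Fraction of the way along the segment; midpoint if the
            # range holds a single number.
            t = 0.5 if hi == lo else (house_number - lo) / (hi - lo)
            x = seg.start_xy[0] + t * (seg.end_xy[0] - seg.start_xy[0])
            y = seg.start_xy[1] + t * (seg.end_xy[1] - seg.start_xy[1])
            return side, (x, y)
    return None
```

With the Main St segment above and assumed endpoints (0, 0) and (100, 0), house number 2518 falls on the left side halfway along the segment, since (2518 - 2500) / (2536 - 2500) = 0.5.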
Geocoding
Geocoding is currently performed by running ungeocoded (referred to hereafter as "raw data") information such as a list of customers through proprietary software and/or data which performs table lookup, fuzzy logic and address matching against an entire "library" of all known or available addresses (referred to hereafter as "georeferenced library") with associated x,y location coordinates. The raw data that match the records from the georeferenced library are then assigned the same x, y coordinates associated with the matched record in the georeferenced library.
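A much simplified sketch of this matching step is shown below. It assumes the georeferenced library is available as a mapping from normalized address strings to x, y coordinates, and it substitutes the approximate string matching of the Python standard library (difflib) for the proprietary fuzzy logic referred to above. The function names, the normalization rule and the 0.85 cutoff are illustrative assumptions.

```python
import difflib


def normalize(address):
    """Upper-case the address, drop periods and commas, collapse spaces."""
    cleaned = address.upper().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())


def match_address(raw_address, library, cutoff=0.85):
    """Return (matched_key, (x, y)) from the library, or None.

    library: dict mapping normalized address strings to (x, y) tuples
    (a stand-in for the georeferenced library).
    """
    key = normalize(raw_address)

    # Table lookup: exact match on the normalized address.
    if key in library:
        return key, library[key]

    # Approximate match: accept the closest library address only if its
    # similarity ratio reaches the cutoff.
    close = difflib.get_close_matches(key, list(library), n=1, cutoff=cutoff)
    if close:
        return close[0], library[close[0]]
    return None
```

A raw data record that matches in either way is assigned the coordinates of the matched library record; a record that returns None would proceed to a less precise level of matching or remain ungeocoded.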
The georeferenced library is compiled from a number of varied sources including US Census address information and US Postal address information, along with ZIP Code boundaries and various other sources of data containing geographic information and/or location geometry. If a raw data address cannot be matched exactly to a specific library street address (known as a "street level hit"), then an attempt is made to match the raw data address against a geographic hierarchy of point, line or region geography of ever decreasing precision until a predetermined tolerance for an acceptable match is met. The level of the geographic hierarchy to which a raw data record is finally assigned is also known as the "geocoding precision." Geocoding precision tells how closely the location assigned by the geocoding software matches the true location of the raw data. Current geocoding technology generally provides for two main types of precision: Street Level and Postal ZIP Centroid. Street Level precision is the placement of geocoded records at the street address. (See FIG. 1, record no. 1.) Street level precision attempts to geocode all records to the actual street address. In practice, some matches may end up at a less precise location such as a ZIP centroid (ZIP+4, ZIP+2, or ZIP Code) or shape path (the shape of a street as defined by the points that make up each segment of the street). A record is assigned or geocoded to the centroid of the shape path (S4; not listed in FIG. 1, as this is a rare occurrence) if the matching street address does not contain address ranges.
ZIP centroid precision places geocoded records at a postal record ZIP Code centroid. ZIP centroid precision matches a raw data record to the most precise ZIP Code it finds. The most precise postal match is one made to a ZIP+4 centroid. (See FIG. 1, record no. 2.) ZIP+4 is nearly as precise as a street level hit (street address). If a ZIP+4 centroid cannot be matched or does not exist, a match may then fall back to a ZIP+2 centroid (record no. 3) if available. The least accurate postal match is one made to a 5 digit ZIP centroid (record nos. 4, 5, 6). If no street level or postal match can be found in the georeferenced library, then a record remains ungeocoded (record nos. 7, 8, 9, 10). This can be the result of a lack of information in the georeferenced library (new building/development, address overlooked/not included, etc.) or a lack of information (missing address information, etc.) in the raw data records which are being geocoded.
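The fallback from street level to progressively less precise ZIP centroids can be sketched as an ordered series of lookups. The sketch below assumes a georeferenced library held as four in-memory mappings keyed by precision level, with ZIP+4 keys of 9 digits and ZIP+2 keys of 7 digits; the level labels, the record fields "address" and "zip", and the function name geocode are hypothetical and chosen only for the example.

```python
def geocode(record, library):
    """Return (precision_level, (x, y)), or ("ungeocoded", None).

    record:  dict with optional "address" and "zip" fields (assumed layout).
    library: dict with keys "street", "zip+4", "zip+2" and "zip5", each a
             dict mapping a lookup key to an (x, y) tuple.
    """
    # Keep only the digits of the ZIP field, so "12345-6789" -> "123456789".
    digits = "".join(ch for ch in record.get("zip", "") if ch.isdigit())
    address = " ".join(record.get("address", "").upper().split())

    # Lookups ordered from most precise to least precise.
    candidates = [
        ("street", (address, digits[:5])),
        ("zip+4", digits[:9] if len(digits) >= 9 else None),
        ("zip+2", digits[:7] if len(digits) >= 7 else None),
        ("zip5", digits[:5] if len(digits) >= 5 else None),
    ]
    for level, key in candidates:
        if key is not None and key in library[level]:
            return level, library[level][key]

    # No street level or postal match: the record remains ungeocoded.
    return "ungeocoded", None
```

The level returned alongside the coordinates corresponds to the geocoding precision of the record, so that records placed at a 5 digit ZIP centroid can be distinguished from street level hits.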
One of the disadvantages of ZIP Code matching alone (without street address) is that current geocoding technology only examines the ZIP Code field when matching. If the ZIP Codes in the raw data records do not already have ZIP+4 values, then current geocoding technology will only match to the much larger area 5-digit ZIP Code centroids. Conversely, if Street Level precision is used, current geocoding technology will attempt to return street-level coordinates and will optionally fall back to the slightly less precise ZIP+4 coordinates. If the georeferenced library does not contain a full 9 digit ZIP Code (ZIP+4) x,y location for the raw data address, current geocoding technology will fall back on the less precise 5 digit ZIP coordinates.
As described above, another disadvantage of ZIP code matching is that ZIP+4 centroids may not exist at all and the only option is a fallback to the much larger area 5-digit ZIP Code centroid. An examination of current (January, 1998) ZIP+4 centroid availability bears out the problem of relying solely on ZIP+4 centroid placement when a specific street level address cannot be found for a raw data record. FIG. 8 shows the breakdown of the ZIP+4 file for New York State. Fully two-thirds of the centroids found in the file are not actually ZIP+4 centroids at all, but merely the less precise 5 digit ZIP or ZIP+2 centroids.
The geocoding process assigns ever larger geographic aggregations (that is, ever less precision) to raw data records until most, if not all, of the raw data records have been geocoded. As a result, some form of location coordinate or spatial attribute is assigned to as many raw data records as possible. Those records which cannot be geocoded due to missing address information or unknown address information are separated from the rest of the records and classified as "ungeocoded" records.
Current technology does not allow for geocoding in most geographic locations throughout the world. Other than North America and, in particular, the United States, very little if any geocoding technology exists due to:
a lack of technological resources;
no infrastructure for systematic and current upkeep of street addresses;
absence of government sponsored programs or agencies for inclusion of meaningful amount of total existing addresses;
addresses not available in digital form;
such information withheld from public usage as an eminent domain of government alone.
Current geocoding technology cannot be implemented under the above conditions. Those conditions hinder the proliferation of geocoding technologies to many nations which would benefit from such technologies.
Street Vectors
In the United States, the U.S. Census Bureau assigns street vectors. They are assigned during the decennial census by enumerators or "street canvassers" who do the actual census taking. Those address ranges are then compiled, digitized and otherwise made into street segments that contain address ranges or street vectors as described above. A compilation of those computer mapped streets for the entire U.S. is then made available for purchase through the Topologically Integrated Geographic Encoding and Referencing (TIGER) digital database.
Many companies resell TIGER information in a more specific, user friendly and/or proprietary format. The resold information often adds value to the original TIGER data by using various proprietary algorithms and methods. Some resellers create new street segments not included by the Census Bureau at the time of enumeration. Many use digitizing and data transfer to propagate street segments.