1. Field of the Invention
The present invention relates to spatially-enabled computer databases, and deals more particularly with techniques for programmatically deriving street geometry from address data supplied in textual format.
2. Description of the Related Art
Geographic information systems are known in the art, and store geographic or cartographic (i.e. map-oriented) data. Systems are also known in the art for using relational databases to process (e.g. store and access) this type of geographic data. When a relational database is adapted for use with geographic information system ("GIS") data, the database is often referred to as "spatially-enabled".
Geographic data pertains to physical locations, and when using 2 dimensions, is typically expressed in terms of latitude and longitude. The latitude and longitude values for a particular location are given relative to fixed points of reference, using a coordinate system in which a latitude value represents an offset from the equator and a longitude value represents an offset from the prime meridian.
Geographic data may describe the physical location or area of a place or thing, or even the location of a person. When geographic data is stored in a spatially-enabled database, it is stored using a geometric model in which locations/areas are expressed in terms of geometric shapes or objects. The geometric data stored according to this model may also be referred to as "spatial data". In addition to locations or areas of geographic objects, spatial data may also represent relationships among objects, as well as measurements or distances pertaining to objects. As an example of relationships among objects, spatial data may be used to determine whether a geometric shape corresponding to the location of a particular bridge intersects a geometric shape corresponding to the location of a river (thus determining whether the bridge crosses the river). As an example of using spatial data for measurements or distances, the length of a road passing through a particular county could be determined using the geometric object representing the road and a geometric object which specifies the boundaries of the county.
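As a minimal sketch of the bridge-and-river relationship test described above, the following Python code checks whether two line strings share at least one point. It uses a standard orientation-based segment intersection test; the function names and the sample coordinates are hypothetical and are not taken from any particular spatially-enabled database product.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a); the sign gives the turn direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_box(a: Point, b: Point, c: Point) -> bool:
    """True if c lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    # Proper crossing: each segment's endpoints lie on opposite sides of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Touching or collinear cases: an endpoint lies on the other segment.
    return ((d1 == 0 and _in_box(q1, q2, p1)) or (d2 == 0 and _in_box(q1, q2, p2))
            or (d3 == 0 and _in_box(p1, p2, q1)) or (d4 == 0 and _in_box(p1, p2, q2)))


def linestrings_intersect(a: List[Point], b: List[Point]) -> bool:
    """True if any segment of line string a meets any segment of line string b."""
    return any(segments_intersect(a[i], a[i + 1], b[j], b[j + 1])
               for i in range(len(a) - 1) for j in range(len(b) - 1))


# Illustrative coordinates only.
river = [(0, 0), (2, 1), (4, 0)]
bridge = [(1, 2), (1, -1)]
footpath = [(5, 5), (6, 6)]
```

With these sample shapes, `linestrings_intersect(bridge, river)` reports a crossing, while `linestrings_intersect(footpath, river)` does not. A spatially-enabled database would typically expose a comparable test as a built-in function rather than requiring application code.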
Spatial data values are expressed in terms of "geometry" or "geometric" data types. Thus, the location of a landmark might be expressed as a point having (x,y) coordinates, and the perimeter of a lake might be defined using a polygon. Typical spatially-enabled database systems support a set of basic geometry data types and a set of more complex geometry data types, where the basic types comprise points, line strings, and polygons, and the complex types comprise collections of points, collections of line strings, and collections of polygons.
A common geometric model used by spatially-enabled database systems is shown in FIG. 1. As shown therein, the model is structured as a hierarchy or tree 100 having geometry 105 as its root, and having a number of subclasses. Point 110, linestring 120, and polygon 130 represent the basic geometry data types. In this model 100, linestring 120 is a subclass of curve 115, and polygon 130 is a subclass of surface 125. Geometry collection class 135 is the root of a subtree representing the more complex geometric data types, and each subclass thereof is a homogeneous collection. Multipolygon 145, multistring 155, and multipoint 160 represent the collections of polygons, line strings, and points, respectively. Multipolygon 145 is a subclass of multisurface 140 in this model, and multistring 155 is a subclass of multicurve 150. Only the classes which are leaves of this tree 100 are instantiable in typical spatially-enabled database systems; the other nodes correspond to abstract classes. (Each of these entities is an actual data type.)
Referring now to the basic data types in particular, geometric data according to the model 100 of FIG. 1 may be expressed in terms of a single point having (x,y) coordinates, or may be described as a line string or a polygon. A line string may be considered as one or more line segments which are joined together, and is defined using an ordered collection of (x,y) coordinates (i.e. points) that correspond to the endpoints of the connected segments. A polygon is defined using an ordered collection of points at which a plurality of line segments end, where those line segments join to form a boundary of an area.
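The definitions above can be illustrated with a short Python sketch in which a line string and a polygon boundary are both held as ordered lists of (x,y) points. The helper names are hypothetical; the length is computed by summing the connected segments, and the area uses the well-known shoelace formula, which is one possible choice and is not prescribed by the model of FIG. 1.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def linestring_length(points: List[Point]) -> float:
    """Sum the lengths of the segments joining consecutive points."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def polygon_area(ring: List[Point]) -> float:
    """Area enclosed by a closed ring (first point repeated as the last point)."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


# A two-segment line string and a unit-square boundary, for illustration only.
street = [(0, 0), (3, 4), (3, 10)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

Here `linestring_length(street)` adds a segment of length 5 and a segment of length 6, and `polygon_area(square)` yields the area of the unit square.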
Many different examples may be imagined where points, line strings, and polygons can be used for describing locations or areas. A point might represent the location of a landmark such as a house or a building, or the intersection of two streets. A line string might be used to describe a street, or the path of a river or power line, or perhaps a set of driving directions from one location to another. A polygon might be used to describe the shape of a state or city, a voting district, a lake, or any parcel of land or body of water.
Once spatial information has been stored in a database, the database can be queried to obtain many different types of information, such as the distance between two cities, whether a national park is wholly within a particular state, and so forth.
Early geographic information systems relied on proprietary data formats. A widely popular example is the ".shp" shape format. These shape files contain binary data that may represent points, line strings, or polygons relating to geographic locations or areas. Another commonly-used proprietary data format is known as ".EDG". Files using EDG format contain binary data that provides a mapping between an address and its 2-dimensional geographic location. Efforts have been made in recent years to define open, standardized data formats for GIS data, in order to facilitate exchange of data between systems. This work is characterized by two data formats known as "well known text" and "well known binary", or simply "WKT" and "WKB". The Open GIS Consortium, Inc. ("OGC") is an industry consortium which promulgates standardized specifications including these data formats. The data formats are termed "well known" because they are standardized and therefore non-proprietary. Typical spatially-enabled database systems support one or more of these four data formats.
As one example of a spatially-enabled database, a feature known as "Spatial Extender" can be added to IBM's DB2® relational database product to provide GIS support. Spatial Extender provides support for the geometric data types shown in FIG. 1, and provides a number of built-in functions for operating on those data types. When using Spatial Extender, spatial data can be stored in columns of spatially-enabled database tables by importing the data or deriving it. The import process uses one of the WKT, WKB, or ".shp" shape formats described above as source data, and processes that data using built-in functions to convert it to geometric data. For example, WKT format data may be imported using "geometryFromText" functions; similar functions are provided for WKB format data ("geometryFromWKB") and ".shp" shape data ("geometryFromShape"). Spatial data may be derived either by operating on existing geometric data (for example, by defining a new polygon as a function of an existing polygon) or by using a process known as "geocoding". A geocoder is provided with Spatial Extender that takes as input an address in the United States, and derives a geometric point representation. Other geocoders can be substituted to provide other types of conversions.
Refer to "IBM® DB2® Spatial Extender User's Guide and Reference", Version 7.2, published by IBM in July 2001 as IBM publication SC27-0701-01, for more information on Spatial Extender. This User's Guide is hereby incorporated herein as if set forth fully, and is hereinafter referred to as the "Spatial Extender User's Guide". ("IBM" and "DB2" are registered trademarks of IBM.)
Another example of a spatially-enabled database is the IBM Informix® Spatial DataBlade® product. This database is described in "SDE Version 3.0.2 for Informix Dynamic Server, Spatial DataBlade Reference Manual", published on the Internet at location http://www.esri.com/software/sde/pdfs/datablade.pdf. Spatial DataBlade also supports the geometric types shown in FIG. 1, and the WKT, WKB, and ".shp" shape formats. This Reference Manual is referred to hereinafter as the "Spatial DataBlade® Reference Manual". ("Informix" and "DataBlade" are registered trademarks of IBM.)
While WKT is an open, interchangeable data format, it may be considered as a relatively "artificial" or "contrived" format for source data. That is, all geometric data that is expressed in WKT format must be specified using particular syntax conventions. To represent the point having an x-coordinate of 12 and y-coordinate of 25, commonly denoted as (12,25), for example, the following WKT syntax is used:
'point (12 25)'
Extensions have been defined to WKT and WKB formats for supporting 3-dimensional data; that is, allowing points to be expressed with a z-coordinate as well as x- and y-coordinates. (An extension is also defined for a fourth dimension, whereby measurement information can be added to a data value.) To express a 3-dimensional point in WKT format, a syntax that differs slightly from the 2-dimensional syntax is used. Suppose this 3-dimensional point has coordinates (12,25,55). The WKT representation of this point is then:
'point z (12 25 55)'
The syntax for line strings and polygons is similar to that used for points, yet is different in some respects. Given a square polygon having vertices at (0,0), (1,0), (1,1), and (0,1), the WKT representation is:
'polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'
A detailed discussion of the WKT syntax, including syntax examples for each possible permutation of geometry type, may be found in "Appendix C, The well-known text representation for OGIS geometry", of the Spatial DataBlade® Reference Manual.
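To make the syntax conventions concrete, the following Python sketch builds WKT strings for the point and polygon examples given above. The helper names `point_wkt` and `polygon_wkt` are hypothetical and are not the built-in functions of any database product; the sketch covers only the forms shown in this description (coordinates separated by spaces, vertices separated by commas, and a polygon ring that repeats its first vertex at the end).

```python
from typing import Sequence


def _coords(values: Sequence[float]) -> str:
    """Join the coordinates of one point with single spaces, e.g. '12 25'."""
    return " ".join(format(v, "g") for v in values)


def point_wkt(*coords: float) -> str:
    """WKT for a 2-dimensional point, or a 3-dimensional point when z is given."""
    if len(coords) not in (2, 3):
        raise ValueError("expected (x, y) or (x, y, z)")
    tag = "point z" if len(coords) == 3 else "point"
    return f"{tag} ({_coords(coords)})"


def polygon_wkt(vertices: Sequence[Sequence[float]]) -> str:
    """WKT for a polygon with a single ring; the ring is closed if necessary."""
    ring = [tuple(v) for v in vertices]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # repeat the first vertex to close the boundary
    return "polygon ((" + ", ".join(_coords(v) for v in ring) + "))"
```

For instance, `point_wkt(12, 25)` gives `point (12 25)`, `point_wkt(12, 25, 55)` gives `point z (12 25 55)`, and `polygon_wkt([(0, 0), (1, 0), (1, 1), (0, 1)])` gives `polygon ((0 0, 1 0, 1 1, 0 1, 0 0))`, matching the three examples in the text.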
As will be readily apparent, this type of textual representation of geometric data does not naturally occur in textual documents; instead, geometric data must be specially adapted for, or converted to, this type of textual representation.
The present invention defines advantageous techniques whereby textual information can be used as input to populate a spatially-enabled database without requiring the textual information to be provided in (or converted to) WKT format.
An object of the present invention is to provide improved techniques for populating spatially-enabled databases.
Another object of the present invention is to provide techniques for populating spatially-enabled databases with street geometry such that retrievals can be performed therefrom without requiring such retrievals to access WKT, WKB, or ".shp" shape file formats or EDG street files.
A further object of the present invention is to define techniques for creating street geometry data from readily-available textual address information.
Still another object of the present invention is to provide improved ways for storing street geometry.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides methods, systems, and computer program products for programmatically deriving street geometry data from address data. In a preferred embodiment, this technique comprises: obtaining a street address; determining a geographic location of the obtained street address; and storing the obtained street address and the geographic location in a database for subsequent retrieval without using binary files or proprietary formats. The geographic location preferably comprises latitude and longitude values corresponding to the obtained street address. The street address preferably further comprises a street name and number, city, state, and zip code.
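The three steps of this embodiment (obtain an address, determine its location, store both) can be sketched in Python as follows. Everything named here is an assumption made for illustration: the `geocode` function is a stand-in backed by a tiny in-memory lookup rather than a real geocoder, the sample address and coordinates are fictitious, and an in-memory SQLite table with plain numeric columns stands in for a spatially-enabled database with geometric data types.

```python
import sqlite3
from typing import Dict, Optional, Tuple

# Fictitious sample data standing in for a real geocoding service.
SAMPLE_GEOCODES: Dict[str, Tuple[float, float]] = {
    "1 example st, sampletown, zz 00000": (10.0, 20.0),
}


def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a textual address, or None if unknown."""
    key = " ".join(address.lower().split())  # normalize case and whitespace
    return SAMPLE_GEOCODES.get(key)


def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical address table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE address ("
        "street_address TEXT PRIMARY KEY, latitude REAL, longitude REAL)"
    )
    return conn


def store_address(conn: sqlite3.Connection, address: str) -> bool:
    """Geocode one address and store it with its location; False if not found."""
    location = geocode(address)
    if location is None:
        return False
    conn.execute(
        "INSERT INTO address (street_address, latitude, longitude) VALUES (?, ?, ?)",
        (address, location[0], location[1]),
    )
    return True
```

Once stored, the location can be retrieved with an ordinary query against the table, without consulting any binary or proprietary file.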
In another preferred embodiment, this technique comprises: obtaining a plurality of street addresses; determining a geographic location of each of the obtained street addresses; and storing each of the obtained street addresses and the geographic locations in a relational database system, wherein the geographic locations are stored using geometric data types supported by the relational database system. The obtained street addresses are preferably provided in textual format. The storing operation preferably further comprises creating records in a street table, each of the records comprising at least a street name and number. In this case, each of the records may further comprise a starting point for a street represented by the street name and/or a geometric data type describing a path taken by a street represented by the street name. The records may also comprise a geometric data type describing a bounding box corresponding to the path taken by the street. The storing operation may also further comprise creating records in a city table, state table, and zip code table or international equivalents thereof.
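A street record of the kind described above might be assembled as in the following Python sketch. The field names and the dictionary layout are hypothetical; the sketch only illustrates how a starting point and a bounding box can be derived from the ordered points of a street's path.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bounding_box(path: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (min_x, min_y, max_x, max_y) containing the path."""
    xs = [x for x, _ in path]
    ys = [y for _, y in path]
    return (min(xs), min(ys), max(xs), max(ys))


def street_record(street_name: str, path: List[Point]) -> Dict[str, object]:
    """Build one hypothetical street-table record from a name and a path."""
    if not path:
        raise ValueError("a street path needs at least one point")
    return {
        "street_name": street_name,
        "start_point": path[0],            # first point of the ordered path
        "path": list(path),                # line string describing the street
        "bounding_box": bounding_box(path),
    }


# Illustrative path only.
record = street_record("Example St", [(2, 1), (5, 3), (4, 7)])
```

A bounding box is inexpensive to compare against a query region, so a record like this allows streets that cannot possibly match to be ruled out before the full path is examined.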
The storing operation preferably further comprises creating records in an address table, each of the records comprising a street address identification and the geographic location corresponding to that street address identification. The street address identification preferably further comprises a street name and number, and each of the records preferably further comprises an identification of a city, state, and zip code associated with that street address identification. (Alternatively, the identification of the city, state, and zip code is replaced by an international equivalent thereof.)
The present invention may also be used advantageously in methods of doing business. For example, the street geometry data created through use of the present invention can be used to support a wide variety of business services, such as targeted marketing services that identify mailing addresses within particular geographic areas or mailing addresses within proximity of particular locations.
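As one illustration of the proximity-based service mentioned above, the following Python sketch selects stored addresses within a given distance of a location. It assumes each address record carries latitude and longitude values, and it uses the haversine great-circle formula with a mean Earth radius of 6371 km; the function names and sample records are hypothetical, and a spatially-enabled database would normally perform this kind of selection with its own built-in functions.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def addresses_within(records: List[Tuple[str, Tuple[float, float]]],
                     center: Tuple[float, float],
                     radius_km: float) -> List[str]:
    """Return the addresses whose stored location lies within radius_km of center."""
    return [address for address, (lat, lon) in records
            if haversine_km(center[0], center[1], lat, lon) <= radius_km]


# Fictitious records: (address, (latitude, longitude)).
sample_records = [("Address A", (0.0, 0.5)), ("Address B", (0.0, 5.0))]
nearby = addresses_within(sample_records, center=(0.0, 0.0), radius_km=100.0)
```

With these sample records, only "Address A" (about 56 km from the center) falls inside the 100 km radius.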